# Integration Pattern: MDM Hub-and-Spoke with Operational Writeback and Analytical Replication

## Pattern Name

**MDM Hub-and-Spoke with Operational Writeback and Analytical Replication**

_Also known as: Consolidation-Coexistence MDM Pattern, Golden Record Publish Pattern_

---

## Intent

Ingest entity records from multiple operational source systems into a central MDM hub, produce a reconciled golden record through match/merge and survivorship, then publish that master record back to operational consumers via an integration broker and simultaneously replicate it to an analytical data warehouse — establishing a single authoritative source of truth for key business entities across the enterprise.

---

## Motivation

In enterprises with multiple operational systems (Billing, CRM, ERP, etc.), the same real-world entity — a customer, organization, or product — exists as independent, often inconsistent records in each system. No single system owns the complete, trusted picture. This pattern centralizes identity resolution and data quality in a dedicated MDM platform, then propagates the authoritative record outward so that every consumer — operational or analytical — works from the same master.

---

## Applicability

Use this pattern when:

- The same entity (customer, organization, product) exists in multiple source systems with overlapping but inconsistent representations
- Downstream operational systems need to consume a trusted, deduplicated master record
- Analytics requires a reliable master entity layer for cross-system joins and reporting
- The organization has distinct integration needs for operational writeback (low-latency, API-driven) vs. analytical replication (batch, bulk)
- Data stewardship workflows are required to manage match exceptions and override survivorship decisions

---

## Participants

|Participant|Role|Technology (Example Implementation)|
|---|---|---|
|Source Systems|Contribute raw entity records to the MDM hub|Billing system, CRM (e.g., Salesforce)|
|Inbound Integration Layer|Extract and stage source records into MDM|Informatica PowerCenter / IDMC pipelines|
|MDM Hub|Match, merge, survivorship, golden record production|Informatica MDM Hub (deployed as Enterworks)|
|Message Broker|Decouples golden record publish events from downstream consumers|RabbitMQ (hosted on the Enterworks platform)|
|Outbound Integration — Operational|Subscribes to golden record events, maps fields, writes back to source systems|Dell Boomi (iPaaS / ESB)|
|Outbound Replication — Analytical|Replicates golden record tables to the warehouse|Fivetran|
|Analytical Data Warehouse|Consumes and joins on master entity records|Snowflake|
|Data Stewardship UI|Human review of match exceptions and merges|Informatica MDM Data Director (or equivalent)|

---

## Structure

```
Billing System          CRM System
      │                      │
      │ Batch feed /         │ Batch feed /
      │ API extract          │ API extract
      ▼                      ▼
    Informatica Pipelines (Inbound)
                │
                │ Staged contributor records
                ▼
┌─────────────────────────────────────┐
│        MDM Hub (Enterworks)         │
│                                     │
│     Source Records (per system)     │
│               │                     │
│               ▼                     │
│            Match Engine             │
│   (probabilistic / deterministic)   │
│               │                     │
│               ▼                     │
│        Merge & Survivorship         │
│  (attribute-level source priority)  │
│               │                     │
│               ▼                     │
│         Golden Record (BVT)         │
└───────────────┬─────────────────────┘
                │
        ┌───────┴──────────────┐
        │                      │
        ▼                      ▼
    RabbitMQ               Fivetran
 (on Enterworks)          (Analytical
Exchange / Queue          Replication)
        │                      │
        │ Event subscribe      ▼
        ▼                  Snowflake
   Dell Boomi            (Master Entity
  (Operational               Layer)
   Writeback)
        │
   ┌────┴────┐
   ▼         ▼
Billing     CRM
System     System
```

---

## Flow

### Inbound — Source Contribution

1.
**Extract** — Informatica pipelines pull entity records from Billing and CRM on a scheduled basis (or event-triggered for near-real-time coexistence)
2. **Stage** — Records land in MDM Hub staging tables as **source contributor records**, tagged by source system and load timestamp; raw source values are preserved
3. **Tokenize / pre-process** — Standardization routines are applied (name parsing, address normalization, phone formatting) to improve match accuracy

### Match & Merge

4. **Match** — MDM Hub runs configured match rules (deterministic on key identifiers, probabilistic on name/address/contact attributes) to identify candidate duplicate pairs across and within source systems
5. **Review** — High-confidence matches auto-merge; low-confidence pairs are routed to the data stewardship queue for human review via the MDM Data Director UI
6. **Merge** — Matched records are consolidated into a single **golden record (Best Version of Truth)**
7. **Survivorship** — Attribute-level survivorship rules determine which source system's value populates each golden record field (e.g., CRM wins on contact name, Billing wins on account number and payment terms)

### Outbound — Operational Writeback (RabbitMQ → Boomi)

8. **Publish event** — MDM Hub publishes a golden record create/update event to a RabbitMQ exchange hosted on the Enterworks platform; MDM fires and forgets, with no direct coupling to downstream consumers
9. **Queue routing** — RabbitMQ routes the event to the appropriate queue(s); exchange/binding configuration determines which queues receive which event types (e.g., customer create vs. update vs. merge)
10. **Boomi consumption** — Dell Boomi listens as a queue subscriber, consuming events as they arrive; it applies per-target field mapping and transformation, and writes updated master records back to Billing and CRM via their respective APIs or file interfaces
11. **Cross-reference sync** — MDM Hub's cross-reference table (source key ↔ golden record ID) is updated to maintain bidirectional traceability

### Outbound — Analytical Replication (Fivetran)

12. **Fivetran sync** — Fivetran replicates MDM Hub golden record tables to Snowflake on a scheduled interval (typically daily or near-real-time, depending on connector config)
13. **Snowflake master layer** — Golden records land in a dedicated master entity schema in Snowflake, serving as a trusted join key for cross-system analytical queries (e.g., unifying Billing revenue with CRM activity at the customer level)

---

## Tradeoffs

### Advantages

- **Event-driven decoupling** — RabbitMQ between MDM and Boomi means MDM fires and forgets; Boomi consumers can be added, removed, or restarted without touching MDM publish logic
- **Single source of truth** — All consumers — operational and analytical — work from the same reconciled master entity
- **Decoupled identity resolution** — Match/merge logic lives in MDM, not scattered across individual application integrations
- **Bidirectional traceability** — The cross-reference table preserves the link between the golden record and every contributing source record
- **Separation of integration concerns** — Boomi handles operational writeback (transformation, routing, API complexity); Fivetran handles bulk analytical replication — each tool doing what it's optimized for
- **Stewardship workflow** — Human oversight of ambiguous matches prevents silent data quality degradation

### Limitations

- **Operational complexity** — Four integration layers (Informatica inbound, RabbitMQ broker, Boomi outbound, Fivetran replication) means four failure surfaces to monitor and maintain
- **Latency in coexistence** — Source systems receive updated master records on a delay; real-time coexistence requires additional architectural investment (event streaming, API-first MDM)
- **Survivorship rigidity** — Static attribute-priority rules can produce counterintuitive golden records when source system data quality shifts; rules require ongoing tuning
- **Match rule fragility** — Probabilistic matching requires calibration; poorly tuned rules generate false merges (undoing them — "unmerge" — is costly) or excessive stewardship queue volume
- **Snowflake lag** — Fivetran replication cadence means analytical golden records may trail operational MDM state; not suitable for real-time analytical use cases
- **Hub dependency** — All cross-system entity resolution flows through MDM; hub unavailability blocks inbound updates and outbound publishes

---

## MDM Style Classification

This implementation combines two MDM styles:

|Style|Description|Applied Here|
|---|---|---|
|**Consolidation**|Source systems remain authoritative; MDM produces a read-only golden record for analytics|Fivetran → Snowflake path|
|**Coexistence**|MDM golden record is published back to source systems, which may adopt it selectively|Boomi → Billing / CRM path|

A full **Registry** or **Transactional** MDM style would require source systems to route all creates/updates through MDM first — a higher maturity commitment not present in this pattern.
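The attribute-level survivorship described in the Flow section can be sketched in a few lines of Python. This is a minimal illustration, not the MDM Hub's actual rule engine: the rule table, attribute names, and sample records are all hypothetical, and production survivorship engines also weigh factors such as recency, source trust scores, and steward overrides.

```python
# Sketch of attribute-level survivorship: for each golden record field,
# source systems are ranked and the highest-priority non-null value
# survives. Rules mirror the example in the Flow section: CRM wins on
# contact name, Billing wins on account number and payment terms.

# Per-attribute source priority (first source with a non-null value wins).
SURVIVORSHIP_RULES = {
    "contact_name":   ["CRM", "Billing"],
    "account_number": ["Billing", "CRM"],
    "payment_terms":  ["Billing", "CRM"],
}

def build_golden_record(contributors: dict[str, dict]) -> dict:
    """contributors maps source system name -> its contributor record."""
    golden = {}
    for attribute, priority in SURVIVORSHIP_RULES.items():
        for source in priority:
            value = contributors.get(source, {}).get(attribute)
            if value is not None:  # survivorship: first non-null value wins
                golden[attribute] = value
                break
    return golden

# Example: CRM holds the fresher contact name; Billing owns the
# financial attributes (and CRM has no account number at all).
contributors = {
    "CRM":     {"contact_name": "Jane A. Doe", "account_number": None,
                "payment_terms": "NET45"},
    "Billing": {"contact_name": "J. Doe", "account_number": "ACCT-0042",
                "payment_terms": "NET30"},
}
golden = build_golden_record(contributors)
# golden == {"contact_name": "Jane A. Doe",
#            "account_number": "ACCT-0042",
#            "payment_terms": "NET30"}
```

Making the priority list explicit per attribute, rather than ranking whole records, is what lets the golden record blend fields from several contributors — which is also why the rules deserve the per-attribute documentation urged in the Implementation Notes.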
---

## Related Patterns

- **Batch API-to-Warehouse ELT** — Simpler unidirectional pattern without identity resolution; often feeds raw source data that MDM then reconciles
- **Event-Driven MDM Publish** — Streaming alternative to batch Boomi writeback, using Kafka or similar for lower-latency golden record propagation
- **Canonical Data Model Integration** — Complementary pattern; defines the shared entity schema that MDM golden records and Boomi mappings conform to
- **Reverse ETL** — Related concept for pushing warehouse-derived insights back to operational systems; distinct from MDM writeback in that it carries analytical outputs, not master records

---

## Implementation Notes

- Maintain a **cross-reference table** (source system + source key ↔ golden record ID) as a first-class artifact — it is the connective tissue of the entire pattern
- Version golden records with an effective date range to support slowly changing dimension (SCD) semantics in Snowflake
- Instrument Boomi flows with dead-letter queues and alerting — silent writeback failures will cause source systems to drift from MDM truth without anyone noticing
- Monitor RabbitMQ queue depth as an operational health signal — a growing queue means Boomi consumers are falling behind or have stopped processing
- Design the RabbitMQ exchange/binding topology to support multiple independent consumers per event type — this allows future consumers (additional target systems) to be added without modifying MDM publish logic
- Design survivorship rules explicitly per attribute, per source — and document them; they encode business decisions that will otherwise be rediscovered painfully during audits
- Establish a **data stewardship SLA** for the match review queue — an unworked queue is a backlog of unresolved duplicates accumulating in production
- In Snowflake, expose golden records as views that join the master entity table with the cross-reference table — downstream analysts can then trace any golden record back to its contributing source keys
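The last implementation note, exposing golden records as views joined to the cross-reference table, can be illustrated with an in-memory SQLite database standing in for Snowflake. All table, column, and key names here are hypothetical (they are not the MDM Hub's real schema), but the view shape is the same one an analyst would query in the warehouse.

```python
import sqlite3

# In-memory SQLite stands in for Snowflake; names are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE master_customer (          -- golden records
        golden_id     TEXT PRIMARY KEY,
        contact_name  TEXT,
        payment_terms TEXT
    );
    CREATE TABLE xref (                     -- cross-reference: source key <-> golden ID
        source_system TEXT,
        source_key    TEXT,
        golden_id     TEXT REFERENCES master_customer(golden_id),
        PRIMARY KEY (source_system, source_key)
    );
    -- Expose golden records joined to their contributing source keys,
    -- so any golden record can be traced back to every source record.
    CREATE VIEW master_customer_traced AS
    SELECT m.golden_id, m.contact_name, x.source_system, x.source_key
    FROM master_customer m
    JOIN xref x ON x.golden_id = m.golden_id;
""")

con.executemany("INSERT INTO master_customer VALUES (?, ?, ?)",
                [("G-1", "Jane A. Doe", "NET30")])
con.executemany("INSERT INTO xref VALUES (?, ?, ?)",
                [("CRM", "003XYZ", "G-1"), ("Billing", "ACCT-0042", "G-1")])

rows = con.execute(
    "SELECT source_system, source_key FROM master_customer_traced "
    "WHERE golden_id = 'G-1' ORDER BY source_system").fetchall()
# rows -> [('Billing', 'ACCT-0042'), ('CRM', '003XYZ')]
```

One golden record traces to two contributor keys here, which is exactly the lineage question auditors and analysts ask of the master layer; in Snowflake the same view also gives a natural place to apply row-level access policies without touching the underlying tables.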