There are essentially five types of data in corporations:
Unstructured data is a generic label for describing any corporate information that is not in a database. Unstructured data can be textual or non-textual. Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages. Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.
Transactional data, in the context of data management, is the information recorded from transactions. A transaction, in this context, is a sequence of information exchange and related work (such as database updating) that is treated as a unit for the purposes of satisfying a request. Transactional data can be financial, logistical or work-related, involving everything from a purchase order to shipping status to employee hours worked to insurance costs and claims.
Metadata describes other data. It provides information about a certain item's content. For example, an image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document.
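The idea of metadata as "data about data" can be made concrete with a small sketch. The function and field names below are illustrative, as is the fixed creation date:

```python
import datetime

def document_metadata(text, author):
    """Derive simple metadata (data about the document, not the document itself)."""
    return {
        "author": author,
        "length_chars": len(text),
        "word_count": len(text.split()),
        "created": datetime.date(2024, 1, 15).isoformat(),  # illustrative fixed date
        "summary": text[:40] + "...",  # short summary: first 40 characters
    }

meta = document_metadata(
    "Master data is the core information used across the enterprise.",
    "J. Smith",
)
print(meta["word_count"])  # prints 10
```

The document's content is the text itself; everything in `meta` is metadata describing it.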
Hierarchical data is data that is grouped into a tree-like structure, with repeating parent/child relationships. It stores the relationships between other data. It may be stored as part of an accounting system or separately as descriptions of real-world relationships, such as company organizational structures or product lines. Hierarchical data is sometimes considered a super MDM domain, because it is critical to understanding and sometimes discovering the relationships between master data.
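A minimal sketch of hierarchical data: a company organizational structure stored as parent/child relationships, with a recursive traversal to discover everything below a given node. The org chart itself is invented for illustration:

```python
# Parent -> children mapping; the names are purely illustrative.
org = {
    "CEO": ["VP Sales", "VP Engineering"],
    "VP Sales": ["Sales Rep A", "Sales Rep B"],
    "VP Engineering": ["Engineer A"],
}

def descendants(node, tree):
    """Recursively collect every node below `node` in the hierarchy."""
    result = []
    for child in tree.get(node, []):
        result.append(child)
        result.extend(descendants(child, tree))
    return result

print(descendants("CEO", org))
# prints ['VP Sales', 'Sales Rep A', 'Sales Rep B', 'VP Engineering', 'Engineer A']
```

The same repeating parent/child pattern applies to product lines, account rollups, or any other real-world relationship between master data records.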
Master Data is common data about customers, suppliers, partners, products, materials, accounts and other critical “entities” that is commonly stored and replicated across IT systems. Master Data is the high-value, core information used to support critical business processes across the enterprise.
Most businesses rely on key pieces of data for their operational and reporting needs. This data represents the primary objects involved in most of the business transactions and planning that these businesses conduct: things like customers, products and suppliers. The objects differ across vertical markets, such as patients, providers and hospitals in healthcare, or customers, locations and financial issues in financial services. But most businesses rely on a few very key objects to run their business and plan for the future.
The challenge is that the data describing these key objects is often stored across many different databases and systems throughout an enterprise. And since the identities of these objects are scattered across different systems and maintained differently, there is often a large degree of variation in the data that describes each object. If, for example, customers are known differently across the organization, it's difficult to get a clear view of the customer and how they interact with the business. Who are they, and what do they buy? How often? What is the best way to market to this person and increase sales? What does his or her household look like? Are members of the household also customers?
There is a wealth of information locked away in the data that is currently scattered across the disparate systems in the organization. Informatica MDM helps to unlock the potential of this data by aggregating, consolidating and rationalizing it into master data that the organization can leverage to solve a whole host of business problems and improve operational efficiency.
Let's look at how data flows in and out of the MDM solution. Starting at the top right, this explains how data flows into the Hub and becomes the single source of truth in BATCH MODE.
The landing process is where data enters the Informatica Hub from external sources that contain information (e.g. call center databases, billing applications). Data can be inserted into landing tables using any type of data movement tool (e.g. ETL, EAI, web services).
In the Staging process, data moves from the landing tables to staging tables after operations like delta detection, data cleansing, transformation and reject management.
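Delta detection, one of the staging operations mentioned above, can be sketched as comparing a fingerprint of each incoming landing row against the fingerprint recorded on the previous run, so only new or changed rows move forward. This is a generic hash-based approach, not Informatica's internal implementation; table layouts and keys are illustrative:

```python
import hashlib

def row_hash(row):
    """Fingerprint a row so unchanged records can be skipped during staging."""
    joined = "|".join(str(v) for v in row.values())
    return hashlib.sha256(joined.encode()).hexdigest()

def detect_deltas(landing_rows, staged_hashes):
    """Return only the rows that are new or changed since the last staging run."""
    return [row for row in landing_rows
            if staged_hashes.get(row["id"]) != row_hash(row)]

landing = [{"id": 1, "name": "ACME Corp"}, {"id": 2, "name": "Globex"}]
previous = {1: row_hash({"id": 1, "name": "ACME Corp"})}  # row 1 unchanged since last run
print([r["id"] for r in detect_deltas(landing, previous)])  # prints [2]
```

Rejected rows (those failing cleansing or transformation rules) would be diverted to reject tables by a similar per-row check.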
Next, the load process applies trust and validation rules to the data in a staging table and loads the resulting data into a table in the target data model. The load process updates existing records in the target data model and inserts any new records provided by the source system.
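The update-or-insert behavior of the load step can be sketched as a simple upsert keyed on the source system's identifier. This is a schematic illustration, not the Hub's actual load logic, and the table and key names are invented:

```python
def load(target, staged_rows):
    """Upsert staged rows into the target data model (keyed by source_key)."""
    for row in staged_rows:
        key = row["source_key"]
        if key in target:
            target[key].update(row)   # update the existing record
        else:
            target[key] = dict(row)   # insert the new record
    return target

target = {"S1": {"source_key": "S1", "name": "ACME"}}
staged = [{"source_key": "S1", "name": "ACME Corp"},   # existing -> update
          {"source_key": "S2", "name": "Globex"}]      # new -> insert
load(target, staged)
print(sorted(target))  # prints ['S1', 'S2']
```

In the real load process, trust and validation rules (covered below) decide whether an incoming value actually replaces the existing one.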
Moving up in the Target Data Model part of the diagram, the Match Process applies a set of user-defined match rules that results in merge (or link) candidates.
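A user-defined match rule might combine an exact condition with a fuzzy one. The toy rule below (exact match on normalized city plus a name-similarity threshold) is a generic sketch using Python's standard library, not Informatica's match engine; the threshold and fields are illustrative:

```python
from difflib import SequenceMatcher

def normalize(s):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(s.lower().split())

def is_match(rec_a, rec_b, threshold=0.85):
    """Toy match rule: exact normalized city plus fuzzy name similarity."""
    if normalize(rec_a["city"]) != normalize(rec_b["city"]):
        return False
    score = SequenceMatcher(None, normalize(rec_a["name"]),
                            normalize(rec_b["name"])).ratio()
    return score >= threshold

a = {"name": "Jon Smith", "city": "Boston"}
b = {"name": "John Smith", "city": "Boston"}
print(is_match(a, b))  # prints True
```

Record pairs that satisfy a match rule become the merge (or link) candidates passed to the consolidation step.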
Merge candidates are consolidated either automatically or with user input using Informatica Data Director.
One of the most powerful capabilities of the Informatica Hub is the way it dynamically computes, at merge time and at update time, the best information at the cell level from multiple sources, and ensures survivorship of this information over time.
When two matched records are merged to create a consolidated record, only one of the two values from the source records survives in each corresponding cell of the consolidated record. To ensure the most reliable information in every cell of the consolidated record, the Informatica Trust Framework™ uses a concept of “Trust” that can be assigned to each column of each source system.
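Cell-level survivorship can be sketched as follows: for each column, the value from the source system with the highest trust score wins. The trust scores, source names and records below are invented for illustration; the actual Trust Framework also handles decay over time and validation rules:

```python
# Illustrative per-column trust scores assigned to each source system.
TRUST = {
    "CRM":     {"name": 0.9, "phone": 0.5},
    "Billing": {"name": 0.6, "phone": 0.8},
}

def consolidate(records):
    """records: list of (source_system, row) pairs for one matched entity.
    Returns a consolidated record where the highest-trust value survives per cell."""
    merged = {}
    columns = {c for _, row in records for c in row}
    for column in columns:
        candidates = [(TRUST[src][column], row[column])
                      for src, row in records if column in row]
        merged[column] = max(candidates)[1]  # highest-trust value wins
    return merged

golden = consolidate([("CRM",     {"name": "John Smith", "phone": "555-0100"}),
                      ("Billing", {"name": "J. Smith",   "phone": "555-0199"})])
print(golden)  # name survives from CRM, phone from Billing
```

Note how the consolidated record mixes cells from different sources: CRM is trusted more for names, Billing more for phone numbers.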
The lower right section of the slide explains how data flows into the Hub in REAL TIME.
For the most part, the REAL TIME data flow process and the BATCH data flow process are similar, with a few exceptions.
Finally, once these consolidated records are processed by the Hub, they are considered “master records”. These master records can then be consumed by analytical systems such as data warehouses, or serve as a data source for operational systems (applications). The master records are delivered to these different systems via web services, message queues, or even ETL tools.