BASICS: The Information Supply Chain > Content Factory (Data Warehouse)
The Data Warehouse (DW) represents the second major node in the information supply chain. The DW, in conjunction with various satellite systems, has been likened to a factory before, most notably by Bill Inmon, one of the field’s thought leaders, who wrote a book entitled The Corporate Information Factory. (Using the terms “warehouse” and “factory” at the same time may seem a little inconsistent: pick the best metaphor and stick with it, right? Nevertheless, “warehouse” enjoys broader use in the marketplace, even if the factory analogy is richer and more descriptive of what data warehousing entails.)
The goal of the factory in the information supply chain is to transform raw operational data into enterprise information content. The factory takes in transactional data from the company’s various operational systems, subjects that data to a complicated series of processes in which it is cleansed, refined, and integrated, and finally delivers a finished good, published content, on the other end. Along the way, the DW interacts with various satellite tools and systems that read data from the DW, write it back into the DW, or often both, so as to facilitate its processing. These satellite systems help structure the data, score it, stratify it, order it, stage it, or otherwise enrich it beyond what can be done with the raw goods available from the operational systems alone. Some examples of DW satellite systems are depicted above, and include:
• CRM (customer relationship management) tools which can enrich the DW with valuable data about customer interactions
• MDM (master data management) tools that better order and track master data about customers, products, and suppliers than do typical operational systems
• OLAP (online analytical processing) tools that stage, summarize, and present data to end users in a multi-dimensional format that is much speedier to analyze than what is possible with a standard relational database
• Allocation tools, which attempt to apportion corporate expenses to operational data so as to better understand corporate performance
• Data mining tools, which sort through large sets of data and identify statistical patterns so as to score, stratify, or segment the data
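To make the factory flow more concrete, here is a minimal Python sketch of raw operational rows being cleansed, integrated, and then enriched by an allocation step before being published. Everything in it is an illustrative assumption rather than a description of any particular product: the sample rows, the function names (cleanse, integrate, allocate, run_factory), and the choice to apportion a shared expense in proportion to revenue are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw rows from two operational systems (illustrative data only).
ORDERS = [
    {"customer": " acme corp ", "amount": "1200.00"},
    {"customer": "Globex", "amount": "800.00"},
    {"customer": "ACME CORP", "amount": "bad-value"},  # fails cleansing
]
BILLING = [
    {"customer": "Acme Corp", "amount": "300.00"},
]


def cleanse(rows):
    """Normalize customer names and drop rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would log or quarantine rejected rows
        clean.append({"customer": row["customer"].strip().title(), "amount": amount})
    return clean


def integrate(*sources):
    """Combine cleansed sources into one revenue total per customer."""
    totals = defaultdict(float)
    for rows in sources:
        for row in rows:
            totals[row["customer"]] += row["amount"]
    return dict(totals)


def allocate(revenue_by_customer, shared_expense):
    """Apportion a shared expense to customers in proportion to revenue.

    Proportional-to-revenue is an assumed rule chosen for illustration.
    """
    total = sum(revenue_by_customer.values())
    if total == 0:
        raise ValueError("cannot allocate against zero total revenue")
    return {
        customer: {
            "revenue": revenue,
            "allocated_expense": shared_expense * revenue / total,
        }
        for customer, revenue in revenue_by_customer.items()
    }


def run_factory(shared_expense=230.0):
    """Run the whole sketch: cleanse, integrate, enrich, and return the content."""
    revenue = integrate(cleanse(ORDERS), cleanse(BILLING))
    return allocate(revenue, shared_expense)


if __name__ == "__main__":
    for customer, facts in run_factory().items():
        print(customer, facts)
```

The allocation step stands in for a satellite system here: it reads integrated data from the warehouse and writes back an enriched version, which is the read-and-write relationship described above.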
Some of the distinguishing characteristics of the content factory are that it operates on large volumes of data, via complicated processing that typically takes a long time to run (minutes to hours to even days), and in batch rather than online mode, meaning these processes typically don’t involve much human interaction. (Not much interaction in running them, anyway; it takes a lot of human effort to build them!) The factory typically runs on big database servers that are equipped to handle the required data volume and processing complexity.
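As a rough illustration of what batch mode means in practice, the Python sketch below works through a stream of records in fixed-size chunks, start to finish, without prompting anyone. The helper names (batches, run_batch_job) and the default chunk size are assumptions made for the example, and a simple sum stands in for the far more complicated processing a real warehouse job would do.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run_batch_job(records, size=1000):
    """Aggregate all records chunk by chunk, with no user interaction."""
    total = 0.0
    batch_count = 0
    for chunk in batches(records, size):
        total += sum(chunk)  # placeholder for the real per-batch processing
        batch_count += 1
    return total, batch_count


if __name__ == "__main__":
    total, batch_count = run_batch_job(range(10), size=4)
    print(f"processed {batch_count} batches, total = {total}")
```

Working chunk by chunk keeps memory use bounded no matter how large the input grows, which is one reason long-running warehouse jobs are usually scheduled as batches rather than run interactively.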
Finally, the end goal of this factory, its purpose for existence, is the production of useful finished goods – corporate information content.