This article introduces the concept of the Semantic Enterprise and outlines the connection between the Semantic Enterprise and Master Data Management (MDM) concepts. The article also shows that a successful transition to the Semantic Enterprise requires significant improvements in enterprise metadata management, and especially in business metadata management. It explains why both the business and the information technology (IT) communities need to support an enterprise-level semantic continuum by committing to the development of Enterprise Architecture tenets that bring the two communities together in a more synergetic environment.
Once broadly realized, the critical and indispensable nature of the relationship between Business, Information and Technology architectures will generate demand for improvements to modeling tools that vendors will have to meet in order to remain relevant.
Master Data Management and Metadata
Currently, most tool vendors define Master Data Management as the capability to create and maintain a single, authoritative physical source of “master” data. The purpose of this source is to make shared data – data that has a single content and format – available to all the enterprise systems that need to reference it. As such, Master Data is typically called Reference Data.
While MDM, by this definition, is an important technical pursuit in its own right, there is a larger phenomenon behind it.
The broader issue is the semantic integrity (or rather the lack of it) of shared data, particularly at the enterprise level. As Vickie Farrell, the former VP of marketing for Cerebra, puts it: “…lack of what Gartner calls "semantic reconciliation" among data from different sources is inherent in a diverse, dynamic and autonomous organization. … Resolving discrepancies in metadata descriptions from multiple tools, not to mention cultural and historical differences, involves more than physically consolidating metadata into a common repository.”
In other words, it is always possible, and arguably quite easy, to misinterpret any shared data in the absence of rich contextual information that unambiguously distinguishes between different possible meanings. A substantial portion of this rich metadata context should come from information about the business processes that generate and use the shared data. While this metadata continuum starts in the business function model layer (more on architectural layering later in the “Three-Layered Architectural Model” section), it should support a consistent interpretation of shared metadata that continues through the complete business-IT space, all the way through to the implementation and maintenance of the deployed applications and services.
 Some authors differentiate between reference and master data. See “Master Data versus Reference Data” by Malcolm Chisholm, published in DM Review Exclusive Online Content, April 2006.
 “The Need for Active Metadata Integration: The Hard-Boiled Truth”, Vickie Farrell; DM Direct, September 2005; http://www.dmreview.com/dmdirect/20050909/1036703-1.html
Consider, for example, a scenario in which the Marketing, Sales, and Customer Service departments all use the enterprise Current Customers set. For any enterprise to produce reconcilable financial and managerial reports, it is imperative that when systems from different departments access the same data from a single source, their interpretation of what constitutes “current customers” is also identical. Where historically different definitions are embedded in legacy systems, each department should know exactly which definition has been used for the enterprise master data and how to correlate that definition with its own departmental definition of “current customer”.
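The scenario above can be made concrete with a minimal sketch. Everything specific in it is an assumption made for illustration: the record fields, the 365-day enterprise rule, and the 90-day-or-open-contract Sales rule are hypothetical and do not come from any real system. The sketch shows that two rules applied to the same master data select different “current customers”, and that the correlation between the two definitions can be reported explicitly.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Customer:
    customer_id: str
    last_purchase: date       # hypothetical field
    has_open_contract: bool   # hypothetical field


# Fixed reporting date so that the example is repeatable.
AS_OF = date(2007, 6, 30)


def is_current_enterprise(c: Customer) -> bool:
    """Assumed enterprise master-data rule: a purchase in the last 365 days."""
    return AS_OF - c.last_purchase <= timedelta(days=365)


def is_current_sales(c: Customer) -> bool:
    """Assumed Sales rule: a purchase in the last 90 days, or an open contract."""
    return AS_OF - c.last_purchase <= timedelta(days=90) or c.has_open_contract


def current_ids(customers, rule):
    """Identifiers of the customers that a given rule treats as current."""
    return {c.customer_id for c in customers if rule(c)}


def reconcile(customers, enterprise_rule, department_rule):
    """Correlate a departmental definition with the enterprise definition."""
    enterprise = current_ids(customers, enterprise_rule)
    department = current_ids(customers, department_rule)
    return {
        "both": enterprise & department,
        "enterprise_only": enterprise - department,
        "department_only": department - enterprise,
    }


CUSTOMERS = [
    Customer("C1", date(2007, 6, 1), False),
    Customer("C2", date(2006, 12, 15), False),
    Customer("C3", date(2005, 3, 10), True),
]

report = reconcile(CUSTOMERS, is_current_enterprise, is_current_sales)
for bucket, ids in report.items():
    print(bucket, sorted(ids))
```

With this sample data, only C1 is current under both rules. C2 is current only under the assumed enterprise rule, and C3 only under the assumed Sales rule, so a report built from one definition would not reconcile with a report built from the other.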
The following architectural model helps to minimize the probability of errors such as the one described above.
Three-Layered Architectural Model
A simple Enterprise Architecture model that supports the desired metadata layering is shown in Figure 1. This model links all three layers: Business Function, System Specification, and Physical / Implementation, into one Enterprise Metadata continuum, in order to guarantee information integrity for the whole enterprise.
All constituents that populate Enterprise Architecture models can be grouped into four architectural domains: Business, Information, Application, and Infrastructure. Notice that according to the proposed partitioning, Data Architecture does not constitute its own domain but is actually a sub-domain of Information Architecture.
Management of the metadata that describes the physical (infrastructure) layer is a challenging problem in its own right. However, this topic is well outside this article’s scope and is extensively covered by ITIL and by numerous other publications on the Configuration Management Database (CMDB). For a detailed discussion of enterprise infrastructure metadata, please see Charles Betz’s “Architecture and Patterns for IT Service Management, Resource Planning, and Governance: Making Shoes for the Cobbler's Children”.
For the purpose of this discussion, we will concentrate on the business function layer. A business function is a particular proficiency that an organization — typically an enterprise — possesses and operates to achieve a specific business goal. A business function is an abstraction of a business process that preserves the main characteristics of what is being delivered by the process, discarding most of the information about how it is done, and thus representing an externally visible view of the business process.
The foundational nature of the business-function layer definitions needs to be emphasized. Unless all applications that reference master data elements agree on the exact meaning of all shared information structures within the appropriate business process context (e.g. “customer”, “current customer”, “returning customer”, “high-value customer”, etc.), there is very little practical value in creating shared physical data stores.
Unfortunately, the generation of business-function layer metadata, its maintenance, and its integration with metadata from other layers present a number of issues, which are discussed below.
Business Function Layer and its Metadata
Despite ongoing discussion about the need to add more explicit business function-level, technology-independent contextual information to metadata repositories, there are currently very few tools that offer such capabilities (I’d be happy to learn that I am wrong here). One of the reasons that integration of business function-level metadata across all architectural layers is so difficult to achieve is that the standards for metadata representation in each of the model layers are still emerging, especially in the top-most business function layer.
Recent progress by modeling tool vendors and the user community around BPEL, XMI, CWM, and other standards positioned to answer the need for exchanging and storing metadata is very promising.
However, the integration and unification of different types of modeling tools, predicted in 2005 by Michael J. Blechar of Gartner, is far from complete: “Before disparate modeling tools from the same or different vendors can truly become integrated, standards such as UML and BPMN must evolve and be coordinated. Gartner does not expect this effort to be completed — or, at least, approach a "good enough" solution to integrate best-of breed tools in any meaningful way — until 2007 or 2008.”
While integration of modeling tools is definitely a serious problem, the main problem faced by most companies is the failure to realize that the different constituencies of the modern enterprise, both external and internal, need a common understanding of information.
The common semantic enterprise space should extend in both directions: horizontal and vertical. The horizontal dimension addresses the interactions between different departments of the same company as well as between a particular business entity and its external environment (i.e. business partners, suppliers, legal and regulatory compliance mandates, consumers, etc.). The vertical dimension addresses the information interchange between business users looking for productivity improvements and cost reductions on the one hand, and the IT community responsible for implementing the required automated information systems on the other. Because modern IT environments are extremely complex, and because this complexity drives specialization, the further specialization of skill sets and the fragmentation of the IT semantic space need to be addressed as well. One possible solution is to create an organizational role at each level (business function, logical specification, and physical implementation) that is responsible for semantic integrity both within that layer and between layers.
For example, business process models created by the business process architects and the information models created by information architects should have referential integrity. For this wonderful, if not miraculous, event to finally take place, these models need to share a common semantic space (i.e. every information element and information structure that exists in the information models should be referenced in the process models and have identical meaning in all of them). The corollary is also true: no two different information elements can have the same meaning in all business process models within the scope of a single business unit domain. In addition to that, the existence of information elements (in information models) that are not used in any of the business process models is, in general, disallowed.
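The integrity rules described above lend themselves to mechanical checking. The following is a minimal sketch under stated assumptions: models are reduced to plain dictionaries, the element and process names are invented for illustration, and “same meaning” is approximated by comparing normalized definition text, which is a crude stand-in for real semantic comparison. It is not the interface of any modeling tool.

```python
from collections import defaultdict


def check_semantic_integrity(information_model, process_models):
    """Return a list of human-readable integrity violations.

    information_model: dict mapping element name -> definition text
    process_models: dict mapping process name -> set of referenced element names
    """
    violations = []
    defined = set(information_model)
    referenced = set().union(*process_models.values())

    # Rule 1: a process model may reference only elements that are defined.
    for process, elements in sorted(process_models.items()):
        for name in sorted(elements - defined):
            violations.append(f"{process}: references undefined element '{name}'")

    # Rule 2: every defined element must be used by at least one process model.
    for name in sorted(defined - referenced):
        violations.append(f"element '{name}' is not used by any process model")

    # Rule 3: two different elements must not carry the same meaning.
    # Assumption: equal text after case and whitespace normalization
    # counts as "the same meaning".
    by_meaning = defaultdict(list)
    for name, definition in information_model.items():
        by_meaning[" ".join(definition.lower().split())].append(name)
    for names in by_meaning.values():
        if len(names) > 1:
            violations.append(
                "same meaning under several names: " + ", ".join(sorted(names))
            )

    return violations


# Hypothetical models for one business unit.
INFORMATION_MODEL = {
    "Customer": "A party that has placed at least one order.",
    "Client": "A party that has placed at least one order.",
    "Order": "A request by a customer to purchase goods.",
    "Campaign": "A planned set of marketing activities.",
}
PROCESS_MODELS = {
    "Take Order": {"Customer", "Order"},
    "Ship Order": {"Order", "Shipment"},
    "Onboard Client": {"Client"},
}

violations = check_semantic_integrity(INFORMATION_MODEL, PROCESS_MODELS)
for v in violations:
    print(v)
```

On this sample, the check reports one violation of each kind: “Shipment” is referenced but never defined, “Campaign” is defined but never used, and “Customer” and “Client” are two names for one meaning.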
Semantic Enterprise and Semantic Web
The concept of Enterprise semantic space (or Semantic Enterprise) is closely related to that of Semantic Web. While there are significant similarities between the two, there are also some significant differences.
The main similarity between the two concepts is the notion of well-defined information meaning (or semantics), which ensures that complex information management processes are executed successfully and predictably. Both concepts require well-structured information models (a.k.a. ontologies) as well as tools that process these rich information models in order to establish and maintain a common understanding of information between the different parties involved in exchanging it. At the implementation level, it is safe to assume that UML- and XML-based technologies (MOF, XMI, RDF, OWL, etc.) will play the central role in the development of both concepts.
At the same time, the main difference is rooted in the degree of centralization, or rather decentralization. While the creators of the Semantic Web presume a highly decentralized model with a relatively high degree of inconsistency, the Semantic Enterprise is significantly more sensitive to semantic inconsistency and thus requires a higher degree of centralization and control.
While detailed discussion of the Semantic Web is beyond the scope of this article, it is important to understand that these two concepts have significant overlap and would probably develop a common set of tools and approaches.
Unifying Data- and Process-Centric Views on Metadata
From the enterprise metadata point of view, MDM, in a broad sense, is a sign of things to come. It highlights the existing need for a new approach to the integration of business function-level metadata at the Enterprise level or Enterprise Business Metadata Integration (EBMI). The term EBMI describes a consistently coordinated (unified) view of the business function-level information at the Enterprise level. It implies neither a single physical format used by all systems, nor a virtual database that provides access to the information, regardless of its physical location (similar to Enterprise Information Integration technique). In actuality, EBMI is primarily concerned with conceptual and logical level information, rather than information at the physical implementation level. EBMI implies that there is an agreed-upon (by all business and IT constituencies) definition for certain information structures, each within a particular well-defined business function context, as well as a robust translation mechanism between the different formats of metadata used by multiple business and technology counterparts. Any physical implementation that is compliant with the definition above is naturally a part of the solution.
For example, two departments of the same company, Order Management and Fulfillment on one side and Customer Service on the other, have historically had different definitions of what constitutes a “returning customer”, a “fulfilled order”, an “inventory level trigger”, etc. The company as a whole may or may not have implemented a single physical data store for this information. While it is highly desirable to factor out common enterprise information and store it in one central location to minimize redundancy, it is not critical. It is, however, absolutely critical for the long-term health of the enterprise that both departments are aware of the differences in their customer and order semantics, which are rooted in the implementations of the departmental business processes. Another critical component of the solution is a set of rules that correlates (translates) the two departmental definitions.
The EBMI concept is similar to the business modeling approach provided by Business Process Modeling (BPM) tool vendors, since it presents rich contextual information about the business functions that a company possesses in order to meet its business objectives. What makes this concept different from the traditional BPM approach is that it adds metadata that describes the information (data) elements participating in the business processes. By providing a robust information architecture model that complements the business process modeling view, EBMI brings together the BPM approach and the data-centric approach (traditionally used by the data modeling and database programmers’ community).
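The two ingredients named in the departmental example above, definitions scoped to a business function context and explicit translation rules between them, can be sketched as a small registry. This is an illustration only: the class, its method names, the definition texts, and the order fields are all hypothetical and are not part of any EBMI product or standard.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessTermRegistry:
    # (business function context, term) -> definition text
    definitions: dict = field(default_factory=dict)
    # (term, source context, target context) -> rule applied to a record
    translations: dict = field(default_factory=dict)

    def define(self, context, term, definition):
        self.definitions[(context, term)] = definition

    def add_translation(self, term, source, target, rule):
        self.translations[(term, source, target)] = rule

    def contexts_for(self, term):
        return sorted(ctx for (ctx, t) in self.definitions if t == term)

    def untranslated_pairs(self, term):
        """Context pairs that define `term` differently and have no rule yet."""
        contexts = self.contexts_for(term)
        return [
            (a, b)
            for a in contexts
            for b in contexts
            if a != b
            and self.definitions[(a, term)] != self.definitions[(b, term)]
            and (term, a, b) not in self.translations
        ]

    def translate(self, term, source, target, record):
        """Decide whether a record that satisfies `term` in `source` also does in `target`."""
        try:
            rule = self.translations[(term, source, target)]
        except KeyError:
            raise LookupError(
                f"no rule to translate '{term}' from {source} to {target}"
            ) from None
        return rule(record)


OM = "Order Management and Fulfillment"
CS = "Customer Service"

registry = BusinessTermRegistry()
# Assumed departmental definitions, invented for this sketch.
registry.define(OM, "fulfilled order", "Order has left the warehouse.")
registry.define(CS, "fulfilled order", "Order was delivered and not returned.")

gaps_before = registry.untranslated_pairs("fulfilled order")

# Assumed rule: an order fulfilled for OM is fulfilled for CS only if it
# was delivered and was not returned.
registry.add_translation(
    "fulfilled order", OM, CS,
    lambda order: order["delivered"] and not order["returned"],
)
gaps_after = registry.untranslated_pairs("fulfilled order")

print(gaps_before)
print(gaps_after)
print(registry.translate("fulfilled order", OM, CS,
                         {"delivered": True, "returned": False}))
```

The design choice worth noting is that a missing rule raises an error instead of silently passing the record through, which keeps both departments aware that their definitions differ. No single physical data store is assumed.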
The examples and issues presented above demonstrate that the effort to establish a semantic enterprise continuum should be driven by both the business and the IT communities. Cooperation of the two communities, and especially the sponsorship of the business leaders, are of primary importance. However, the goal of building the Semantic Enterprise cannot be accomplished without the necessary tools, even with the two communities working in concert. This is especially true for interlayer integration: while tool vendors have begun to provide better support for physical-layer metadata integration, support for cross-layer interactions is almost non-existent. Vendors that provide not only “horizontal” integration capabilities across physical sources (physical sources data lineage), but also “vertical” integration capabilities between physical data and business process contextual information, will emerge victorious in the critical integration race.
 Web Services Business Process Execution Language, OASIS Standard WS-BPEL 2.0.
 XML Metadata Interchange standard, Object Management Group Standard; http://www.omg.org/
 Common Warehouse Metamodel -- specification for modeling metadata for a data warehousing environment, Object Management Group Standard; http://www.omg.org/
 Michael J. Blechar, Gartner Research Publication G00129905, “BPA, Object-Oriented and Data Modeling Tools Are Converging.” 2005
 For a detailed discussion of the three-layered Enterprise Architecture model, please see “Quality Data Through Enterprise Information Architecture”, http://msdn2.microsoft.com/en-us/library/bb266338.aspx
 For a more detailed discussion of the four architectural domains, please see the Forrester Research Report “Creating the Information Architecture Function” http://www.forrester.com/Research/Document/0,7211,34649,00.html
 Charles T. Betz, “Architecture and Patterns for IT Service Management, Resource Planning, and Governance: Making Shoes for the Cobbler's Children”, Morgan Kaufmann, 2006, ISBN-10: 0123705932
 I would prefer to use the term Enterprise Information Integration (EII) but unfortunately this term is already used to describe a virtual integrated data view centered on the physical implementation layer.