In-Depth

Getting it right with meta data

Corporate executives realize that knowledge is what differentiates corporations in the information age. Meta data—data about data—is all about knowledge, and capturing it and making it accessible throughout an organization is becoming vital for success. With meta data and a meta data repository, corporations can mature from the "crawling" stage to the "walking" stage in their information technology development process. For some corporations, building a meta data repository is no longer an option—it is an absolute requirement.

Experts say that many large organizations are finding that a top-notch meta data repository can provide a significant return on investment (ROI) and become a competitive differentiator in the marketplace.

To sell the concept of a meta data repository to senior management, corporate development managers should emphasize two probable results—increasing revenues and decreasing expenses, some experts say. This article will illustrate how a repository can provide value to a company's IT department through technical meta data, and to business users via business meta data.

Table 1 lists the value that impact analysis (both data warehouse and enterprise-wide) can provide to a corporation.

Table 1: Impact analysis benefits
Business/Technical Value | ROI Measures
Reduction of IT-related problems | IT staff is much less likely to make programming errors when making system enhancements, because all impacted programs, tables/files and fields are identified.
Reduce IT development life cycles and costs | IT development life cycles are greatly reduced, because all impacted programs, tables/files and fields are identified.
Reduce redundant data | Impact analysis allows an IT department to identify redundant data in their systems. In addition, this functionality greatly reduces the likelihood of building systems containing redundant data.
Reduce redundant processes | Identifying redundant processes in systems greatly reduces the likelihood of building redundant processes.
Reduce impact of employee turnover | Documenting the knowledge currently known only to the developer who built the programs makes it available to the entire IT staff.
Improved system performance | As redundant data and processes are removed, the performance of the system is vastly improved.
Source: David Marco

In addition to improving system performance, impact analysis can reduce errors, cut life cycles and costs, eliminate redundant data and processes, and lessen the effects of employee turnover.

Data warehouse impact analysis
For years, stories about corporations whose data warehousing projects fell short or failed outright have been widespread at conferences and in both the trade and general press. The chief causes of many of these project failures were found to be poor data warehouse architecture and data quality issues.

Another factor in most of the cases, some experts say, is the absence of a meta data repository. A meta data repository can significantly reduce the costs of business intelligence and data warehouse systems development, as well as the time to market for new and modified data warehouse systems. Meta data can accomplish these two goals through the use of technical impact analysis reports. These impact analysis reports can significantly aid data warehouse developers in examining the impact of proposed changes to a data warehouse system environment. Such functionality is critical to any company for the long-term management of data warehouse systems.

Data warehousing systems collect data from the operational systems of a business. The operational systems frequently undergo changes to the business rules and the data structures that can directly impact the data warehouse systems they feed. Impact analysis reports can help the systems overcome this challenge.

For example, consider a plan to modify the table used to store customer data in the order entry system of a corporation. The meta data in the meta data repository could be used to run an impact analysis showing all of the data warehouse tables and files, programs and fields that could be impacted by such a change. The "Table Type" field on the report could equal one of three values: S, I or T. S indicates that the table is a source table/file from the operational system to the data warehouse system. I signifies that the table is an intermediate table/file between the operational system and the data warehouse system. And T indicates that the table/file is the target data warehouse table.

The data warehouse development team could then use this data to gauge the potential impact of the operational system changes to the full data warehouse system. Such information could reduce the time required for the data warehouse team to manually analyze the impact of the changes, thereby reducing the development time needed to modify the data warehouse system. In addition, the likelihood of development errors is significantly reduced as all impacted programs are identified.

For the best results, the data warehouse team will need the option to limit the amount of information on the impact analysis; therefore, they will need to be able to perform record selection on the following report attributes: source system, source system table, source system field, data warehouse system table, data warehouse system field and table type.
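The report and its record selection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout mirrors the report attributes listed above, but the class name, the function name and every system, table and field name in the sample data are hypothetical and do not come from any particular repository product.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ImpactRow:
    """One line of a data warehouse impact analysis report."""
    source_system: str
    source_table: str
    source_field: str
    dw_table: str
    dw_field: str
    table_type: str  # "S" = source, "I" = intermediate, "T" = target


# Hypothetical repository contents, invented purely for illustration.
REPOSITORY = [
    ImpactRow("ORDER_ENTRY", "CUSTOMER", "CUST_NO", "CUST_EXTRACT", "CUST_NO", "S"),
    ImpactRow("ORDER_ENTRY", "CUSTOMER", "CUST_NO", "STG_CUSTOMER", "CUST_NO", "I"),
    ImpactRow("ORDER_ENTRY", "CUSTOMER", "CUST_NO", "DIM_CUSTOMER", "CUSTOMER_NUMBER", "T"),
    ImpactRow("ORDER_ENTRY", "CUSTOMER", "CUST_NAME", "DIM_CUSTOMER", "CUSTOMER_NAME", "T"),
    ImpactRow("BILLING", "INVOICE", "INV_NO", "FACT_INVOICE", "INVOICE_NUMBER", "T"),
]


def impact_report(rows, **criteria):
    """Return the rows that match every selection criterion.

    Each keyword must be one of the report attributes, for example
    source_table="CUSTOMER" or table_type="T".
    """
    valid = {f.name for f in fields(ImpactRow)}
    unknown = set(criteria) - valid
    if unknown:
        raise ValueError(f"unknown report attribute(s): {sorted(unknown)}")
    return [
        row for row in rows
        if all(getattr(row, name) == value for name, value in criteria.items())
    ]


# Which target warehouse fields are fed by the order entry CUSTOMER table?
for row in impact_report(REPOSITORY, source_table="CUSTOMER", table_type="T"):
    print(row.dw_table, row.dw_field)
```

Passing no criteria returns the full report; each added criterion narrows it, which is the record selection the data warehouse team needs.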

Enterprise-wide impact analysis
Enterprise-wide impact analysis can expand the scope of data warehouse impact analysis to include all of a company's IT systems. Traditionally, such studies have been kept separate because it is much easier for a corporation to build a meta data repository that stores meta data on the data warehouse system. On the data warehouse side, developers are working with relatively new systems based on advanced design and technology, especially compared to the older operational systems. However, experts say, meta data is every bit as important to older systems as it is to new, technically advanced ones.

Understanding the system impact of a major IT change requires a careful analysis of the current operational and data warehouse systems. A meta data repository can significantly cut the cost and length of development by capturing the data transformation rules, data sources, data structures and the context of the data in the IT systems. Capturing the data transformation rules becomes critical with new technology because, without a meta data repository, those rules are stored only in the memory of staff members. The meta data can significantly aid analysts in examining the impact of proposed changes to the systems environment. This benefit can reduce the costs of future releases and the likelihood of new development errors.

Consider, for example, a company that needs to expand the field length of customer numbers from a 20-byte alphanumeric value to a 30-byte alphanumeric value throughout all of its systems. In this case, the appropriate tool is an enterprise-wide impact analysis report showing the systems, tables/files, fields and field domains that would be affected by changing the length of every occurrence of the customer number field. The report would clearly show which systems and fields cannot handle a 30-byte alphanumeric value.

Reports like this one can be more technical in nature because they will be used by the IT staff that supports the corporation's IT systems. The technical team will need to be able to limit the amount of information on the impact analysis. They will need to have record selection on the following report attributes: system, system table, system field, and table type.
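As a rough sketch of the customer number example, the check below scans a field catalog for every occurrence of a business element and lists the fields that are too short for the new length. It assumes the repository records each field's length in bytes and tags it with the business element it represents. The catalog entries and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeta:
    """Technical meta data for one field in one system."""
    system: str
    table: str
    field: str
    data_type: str
    length: int            # field length in bytes
    business_element: str  # what the field means to the business


# Hypothetical enterprise-wide catalog, invented for illustration.
CATALOG = [
    FieldMeta("ORDER_ENTRY", "CUSTOMER", "CUST_NO", "CHAR", 20, "customer number"),
    FieldMeta("BILLING", "INVOICE", "CUSTOMER_ID", "CHAR", 30, "customer number"),
    FieldMeta("CRM", "CONTACT", "CUST_NUM", "VARCHAR", 40, "customer number"),
    FieldMeta("DATA_WAREHOUSE", "DIM_CUSTOMER", "CUSTOMER_NUMBER", "CHAR", 20, "customer number"),
    FieldMeta("ORDER_ENTRY", "CUSTOMER", "CUST_NAME", "CHAR", 60, "customer name"),
]


def fields_too_short(catalog, business_element, required_length):
    """List every occurrence of a business element that cannot hold
    a value of required_length bytes."""
    return [
        f for f in catalog
        if f.business_element == business_element and f.length < required_length
    ]


# Which customer number fields cannot handle a 30-byte value?
for f in fields_too_short(CATALOG, "customer number", 30):
    print(f"{f.system}.{f.table}.{f.field}: {f.data_type}({f.length})")
```

Matching on a business element rather than a field name matters here, because the same customer number is often stored under different names in different systems.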

Meta data-driven business user interface
IT professionals exist to meet the informational needs of business users. Unfortunately, many systems installed today fall well short of meeting the needs of the business. One cause: instead of designing systems that can speak to business users in familiar terms, many organizations are building systems that communicate only in terms familiar to IT personnel. Meta data holds the key to making users comfortable with technology because it provides a semantic layer between IT systems and business users. In simple terms, meta data translates the systems' technical terminology into business terms with which the business users are familiar.

Web-enabled data warehouse systems can be designed with Web front ends that business users can use easily. Business users of these systems do not care whether the information comes from a data warehouse, data marts, an operational data store or a meta data repository. Users just want to find information they can understand quickly.

For example, many business users need to regularly access monthly product sales data. Users in companies using meta data repositories could go to a data warehouse Web site and then search for this flavor of report. Once the business user gets to the search page of the data warehouse site, the user can search for any data warehouse reports that have "monthly product sales."

Table 2: Business meta data benefits
Business/Technical Value | ROI Measures
Reduction of IT-related problems | The business users have a much easier time using the system; therefore, IT-related problems are reduced.
Increase system value to the business | The system has a great deal more meaning to the business users, which allows them to perform their functions more thoroughly.
Improved business decision making | Business users of the system will be able to locate the information they need, and they will be able to understand the data presented.
Source: David Marco

Meta data can provide a semantic layer between IT systems and business users—essentially translating the systems' technical terminology into business terms—making the system easier to use and understand, and helping users make sound business decisions based on the data.

At this point, meta data comes into play. The meta data in the meta data repository has business definitions for each data warehouse report. Thus, the user can search through meta data business report definitions for the reports that have the words "monthly product sales." The user can select from the list of reports or enter a new query. By integrating this business meta data into the data warehouse report, the business user can get answers to almost any pertinent query. The inclusion of the meta data makes the data in the data warehouse system much more valuable, so it is likely to improve the accuracy of decision making.

Meta data has taken the data warehouse system in this example and vastly improved its value to the business user by utilizing meta data-driven access. In addition, the value of the actual information in the data warehouse reports is vastly upgraded through the use of business meta data.
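The search over business report definitions can be illustrated with a small Python sketch. It assumes the repository exposes one plain-language definition per report; the report identifiers and definitions below are hypothetical, and simple case-insensitive word matching stands in for whatever search facility a real Web front end would use.

```python
# Hypothetical business definitions, keyed by a made-up report identifier.
REPORT_DEFINITIONS = {
    "RPT_0101": "Monthly product sales by region, in U.S. dollars.",
    "RPT_0102": "Monthly product sales compared with the same month of the prior year.",
    "RPT_0230": "Weekly inventory levels by warehouse.",
    "RPT_0315": "Quarterly customer churn by product line.",
}


def search_reports(definitions, phrase):
    """Return the IDs of reports whose business definition contains
    every word of the search phrase, ignoring case."""
    terms = phrase.lower().split()
    return sorted(
        report_id
        for report_id, definition in definitions.items()
        if all(term in definition.lower() for term in terms)
    )


# A business user looks for reports on monthly product sales.
for report_id in search_reports(REPORT_DEFINITIONS, "monthly product sales"):
    print(report_id, "-", REPORT_DEFINITIONS[report_id])
```

The user never has to know a table or column name; the business definition is the only thing searched.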

Data quality tracking
Data quality is a significant issue impacting many, if not all, corporations competing in today's marketplace. For most corporations, IT systems are a strategic weapon that can provide a significant competitive advantage. Conversely, if the data in these systems is redundant, inaccurate, missing or incomplete, the corporation is placed at a severe and distinct disadvantage. In addition, many companies are undertaking mission-critical initiatives like e-business, customer relationship management and data warehousing. Each of these initiatives typically requires data from a company's legacy systems. If the quality of the data in these systems is poor, it can impact the reliability, accuracy and effectiveness of the initiatives. The old IT saying "garbage in, garbage out" illustrates that data quality, or the lack thereof, is critical to any enterprise.

Table 3: Data quality tracking benefits
Business/Technical Value | ROI Measures
Improved business decision making | Data quality is improved, which provides the business users more accurate systems and reports.
Reduction of IT-related problems | Improved data quality reduces many system-related problems and IT costs.
Increase system value to the business | Business users of data warehouse systems can make better decisions if they are aware of possible errors skewing report numbers.
Improved system performance | As data quality improves, system errors are reduced, which improves system performance.
Source: David Marco

Data quality is essential for most business initiatives. Data quality tracking can create a significant ROI by improving business decision making, reducing system-related errors and IT costs, and improving data accuracy and system performance.

Meta data is a critical component to any data quality initiative. Meta data can provide a mechanism for monitoring and improving the quality of the data coming from the operational systems into a data warehouse system. It can track the number of errors that occur during each data warehouse/data mart load run, and can report to the IT staff when certain error thresholds are exceeded. For example, if transactional sales records are loaded into a data warehouse system for the marketing department to view, users may decide that if more than 2% (the threshold) of the dollar amount of all of the sales transactions is in error, they need to stop the data warehouse system load processes and investigate the problem.
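The 2% threshold in this example can be expressed as a short check that runs against each load. The sketch below is one possible formulation, not a prescribed one: it assumes each transaction arrives as a non-negative dollar amount paired with an error flag, and the function names and sample amounts are hypothetical.

```python
from decimal import Decimal

# Threshold from the example: stop if more than 2% of the dollar amount is in error.
ERROR_THRESHOLD = Decimal("0.02")


def dollar_error_rate(transactions):
    """Share of the total dollar amount that belongs to records in error.

    transactions is a list of (amount, is_error) pairs; amounts are
    assumed to be non-negative Decimal values.
    """
    total = Decimal("0")
    in_error = Decimal("0")
    for amount, is_error in transactions:
        total += amount
        if is_error:
            in_error += amount
    if total == 0:
        return Decimal("0")  # an empty load has nothing in error
    return in_error / total


def should_halt_load(transactions, threshold=ERROR_THRESHOLD):
    """True when the error rate exceeds the threshold and the load
    should stop so that the problem can be investigated."""
    return dollar_error_rate(transactions) > threshold


# Hypothetical load run: $30 of $1,000 in sales is in error.
load_run = [(Decimal("970.00"), False), (Decimal("30.00"), True)]
if should_halt_load(load_run):
    print("Error threshold exceeded: stop the load and notify IT staff")
```

Note that the rate is weighted by dollar amount rather than record count, as in the example, so one large bad transaction can trip the threshold on its own.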

In addition, the data quality metrics of all data warehouse systems should be stored in a meta data repository that is kept throughout the life of the data warehouse system. Storing the historical data can help corporations monitor data quality and determine whether it is improving over time.

In data warehouse systems, it is common to compare field values from different time periods. For example, a consumer electronics manufacturer might want to generate a data warehouse report showing global corporate sales on a monthly basis. A business user could use this report to compare U.S. sales from October 1999 to November 1999 for the holiday buying season. As users compare these numbers, they may feel that the sales amount for November seems to be a little low. They could check the data quality statistics and see what percentage of the records in the November data warehouse load run erred out and were not loaded. This would let them know their margin for error when making decisions based on the report.
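Keeping these statistics for every load run makes the November check a simple lookup. The following sketch assumes the repository stores, for each load, the number of records read and the number rejected. The figures are invented purely to illustrate the calculation and do not describe any real load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadRun:
    """Data quality statistics kept for one data warehouse load run."""
    period: str            # load period, for example "1999-11"
    records_read: int
    records_rejected: int  # records that erred out and were not loaded

    @property
    def reject_pct(self):
        """Percentage of records read that were not loaded."""
        if self.records_read == 0:
            return 0.0
        return 100.0 * self.records_rejected / self.records_read


# Hypothetical load history, made up for illustration only.
HISTORY = [
    LoadRun("1999-09", 100_000, 500),
    LoadRun("1999-10", 120_000, 600),
    LoadRun("1999-11", 150_000, 6_000),
]


def reject_pct_by_period(history):
    """Map each load period to its rejected-record percentage."""
    return {run.period: round(run.reject_pct, 2) for run in history}


# A user who finds November sales low can compare it with earlier months.
for period, pct in reject_pct_by_period(HISTORY).items():
    print(f"{period}: {pct}% of records not loaded")
```

In this made-up history, the November load rejected a far larger share of its records than October's did, which would tell the user that the low sales figure may reflect missing data rather than weak sales.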

What happens when data quality is skipped?
Unfortunately, many companies decide, for various reasons, against spending the time or money needed for the technology and people to uncover, evaluate and resolve their data quality issues.

One Enterprise Warehousing Solutions (EWS) client, a very large insurance firm that operated internationally, had multiple data warehousing projects going on simultaneously. The consulting firm's initial proposal allocated time and resources to conduct a study to gauge the quality of the data in the source system of the warehouse during the feasibility phase of the data warehouse initiative.

Ultimately, the client's decision to forgo the data quality procedures led to the entire project being stopped when the development team determined that the poor quality of the data in the source system would lead to inaccurate business reports. By then, the client had spent approximately $225,000 in consulting fees and employee salaries on the cancelled project.

Experts say meta data solutions used today represent merely the tip of the iceberg as far as what IT organizations can accomplish using a meta data repository. However, projects that are or will soon be underway can create a solid meta data foundation for a corporation. Moreover, a company must act on information culled by meta data before any increased revenues or profits can result from a repository project. Before a company can start counting its profits, there are two paradigm shifts that must occur. And, of course, the corporate IT staff must be willing to change the way they build their systems for technical meta data to achieve a positive ROI.