In-Depth

New routes to XML data integration

As budgets are tightened and staff downsized, IT departments have to find new ways of leveraging XML's tagging to pull together data from disparate sources. Screen scraping, the traditional method, has produced unreliable results, and money and manpower constraints put the combination of XML databases and heavy-duty coding that some of the larger software corporations require out of reach.

New methods of leveraging XML are emerging. One involves using an event model, where an event is an intersection of time, topic and location, to describe situations. Another involves new ways of extending the legacy green screen by intercepting the data stream. Yet another approach is to map meta data from disparate databases into XML to create a hierarchical view of data, and then to use advanced distributed query processing technology and software adapters to query data sources in their native format. A fourth approach is aimed at rich data environments; it lets users deal with both XML data and XML documents, and will soon include audio and video capabilities.

The event model
PRAJA Inc., San Diego, sees business or IT situations in terms of an event model where an event is "an intersection of time, topic and location," said CEO Mark Hellinger. Events can have sub-events, as well as components that consist of objects and data, and are a higher level of abstraction than objects for representing data and interrelationships between various types of data or abstractions of that data, he said.

The event paradigm allows users to get more information out of the data. "When you are working with a sales report, for example, it's not enough to just see sales figures, you also need to look at marketing campaigns and inventory and those aren't necessarily in PeopleSoft or SAP—they could be in custom databases, brochures or TV ads," Hellinger said. An event would include all of that other information.
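Hellinger's event abstraction can be pictured as a small data structure. The sketch below is hypothetical (the class and field names are not PRAJA's API), but it shows how an event intersects time, topic and location while carrying components and nested sub-events:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Event:
    """An event is an intersection of time, topic and location; it can
    carry components (objects and data) and nest sub-events."""
    topic: str
    location: str
    time: date
    components: dict = field(default_factory=dict)
    sub_events: list = field(default_factory=list)

# A sales event gathers figures plus related marketing and inventory data
# that may live outside PeopleSoft or SAP
q3 = Event("Q3 sales", "US-West", date(2002, 9, 30),
           components={"figures": 1.2e6, "campaign": "fall TV ads",
                       "inventory": 1200})
q3.sub_events.append(Event("regional promo", "San Diego", date(2002, 8, 15)))
```

Because sub-events recurse, one sales event can roll up an arbitrary hierarchy of campaigns, regions and time slices.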

PRAJA's ExperienceWare platform generates XML schema to map data from external sources into its event model. Developers first identify the key management and measurement components that can be represented as part of an event. They then use ExperienceWare's XML schema to extract and transform data from disparate sources and load that data dynamically in real time as part of the ExperienceWare engine.

Users navigate through the data with the Java-based ExperienceWare Unified Viewer visualization application. They can click on a topic or multiple topics; click on a geography, or zoom in and out of a location; and identify a time and see the result in real time. If a user changes a variable, such as a geography or location in PRAJA's model, all of the related data is also changed. After an event is analyzed using ExperienceWare, a business rule, notification or trigger is generated as a result.

Ramasamy Uthurusamy ("call me Samy"), general director at General Motors' (GM) Information Systems and Services organization in Detroit, is using ExperienceWare to help manage reports on auto sales figures. Collating this data for a multinational corporation like GM, which has "different data in different formats in different parts of the world," is so complex that it is done by three different groups that "sometimes generate hardcopy reports that trickle along to people," Samy said. Traditional solutions for automating the process would not do. "I didn't want people to go to each database and look at it on the screen. I wanted them to be able to choose a particular time and automatically see that reflected in data in the other dimensions—spatial and organizational," Samy said.

GM worked with PRAJA to come up with a proof of concept, and funded PRAJA's customization of ExperienceWare to let GM look at a subset of sales data. GM restricted the view to a subset because "we were experimenting with the product and didn't want to go into real time immediately," Samy said. The project has not yet gone into production, but Samy is already seeing results.

"It's much easier to do sales data analysis using PRAJA's tools," he said. "The analysts only need a browser. They launch the event viewer or event finder, and from their screen they can look at multiple data sources in a unified way and do what-if analyses. I was looking for one thing that would do all of this and I've found it."

Extending legacy green screens
While many developers use screen scraping to extend legacy green screens to Web or data services, the results are not always reliable or robust. Screen scraping generates XML that "won't scale, is too easy to break and is too difficult to implement," said Russ Teubner, founder and CEO at HostBridge Technology Inc., Stillwater, Okla.

Instead of screen scraping, HostBridge XML-enables existing 3270 applications by using a CICS feature, the IBM 3270 bridge interface, to intercept flow control between an application and Basic Mapping Support (BMS), the CICS component that separates presentation from business logic and handles presenting data to the terminal, Teubner said. The interception occurs before BMS executes. Instead of returning a 3270 screen, HostBridge takes the field name-value pairs the application is trying to communicate and dynamically generates an XML document containing those elements.
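The interception step amounts to turning the name-value pairs the application would have handed to BMS into an XML document. A minimal sketch, assuming a simple flat transaction (the field names and helper are illustrative, not HostBridge's actual interface):

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def fields_to_xml(transaction, fields):
    """Emit a transaction's field name-value pairs as an XML document
    instead of formatting them into a 3270 screen."""
    root = Element("transaction", name=transaction)
    for name, value in fields.items():
        SubElement(root, "field", name=name).text = str(value)
    return tostring(root, encoding="unicode")

# Fields the CICS application would otherwise have handed to BMS
doc = fields_to_xml("ACCT", {"ACCTNO": "1002-7", "BALANCE": "412.09"})
print(doc)
```

The resulting document carries the same data the terminal user would have seen, but in a form a middle-tier application can parse reliably.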

One HostBridge customer is a large bank in Italy that runs customer interactions through a WebSphere application and wants to be able to view its CICS transactions as sources of data to WebSphere, Teubner said. Whenever data residing on the mainframe is required, users send a request to HostBridge that executes the necessary transaction and returns the data as an XML document that "is easily integrated with WebSphere," Teubner said. HostBridge scales well because "we don't have software running on the middle tier; our software runs as a thin layer on the mainframe," Teubner said. That also makes for a reliable solution.

When you have software running in the middle tier, "it's not easy and performance and reliability can become issues," noted analyst Dale Vecchio at Gartner Inc., Stamford, Conn.

Jacada Ltd., Atlanta, takes a different approach to extending the green screen. Its Jacada Interface Server software intercepts the data stream at any point after the stream has been created and the green screen formed, whether as a 3270, 5250, VT100 or VT220 screen, and turns it into a transaction. "This is not your grandfather's screen scraping; it's what we call a screen-based solution," said Rob Morris, Jacada's senior director of product marketing.

The Jacada Interface Server comes with a Windows-based IDE called the Interface Development Kit. For deployment, Jacada offers a Java-based middle-tier app server with several APIs, one of which is XML over HTTP. Users can pass an XML document to the server, have it execute a transaction and return the results of that transaction in an XML document. To turn a set of green screens into a transaction, Jacada Server does what a user sitting at a green screen would: It navigates to the appropriate screen and queries, returning a component or transaction with the required data.
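The XML-over-HTTP pattern described above, POSTing an XML request document and reading back an XML result, can be sketched as follows. The request vocabulary and endpoint are invented for illustration and do not reflect Jacada's actual API:

```python
import urllib.request

# Hypothetical request document: name a transaction and supply its inputs.
REQUEST_DOC = """<request>
  <transaction>GETORDER</transaction>
  <field name="ORDERNO">12345</field>
</request>"""

def build_request(url, xml_doc):
    """Package an XML document as an HTTP POST to the interface server."""
    return urllib.request.Request(
        url, data=xml_doc.encode("utf-8"),
        headers={"Content-Type": "text/xml"}, method="POST")

def call_interface_server(url, xml_doc):
    """Execute the transaction and return its result as an XML string."""
    with urllib.request.urlopen(build_request(url, xml_doc)) as resp:
        return resp.read().decode("utf-8")
```

Behind the endpoint, the server navigates the green screens on the caller's behalf and maps the harvested fields into the XML response.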

Jacada user Lillian Vernon Corp., Rye, N.Y., a direct-mail specialty catalog and online company, has more than 24 million customers. Ellis Admire, its director of MIS operations, used Jacada's XML API to automate dealings with a third-party returns facilitator.

Lillian Vernon customers who wanted to return goods used to have to call for return authorization, then mail the goods back at a post office. Automating the process allows customers to drop off return packages at local mail facilities such as Mail Boxes Etc. and get on-the-spot authorization for the return when the mail facilities staff keys in a request at a terminal. The request goes to the returns facilitator's computer, which has links to up to 5,000 mail facilities. That computer then sends a return authorization request to a Jacada Server sitting as middleware between the facilitator's computer system and Lillian Vernon's back-end mainframe. The server generates an XML request for a return authorization, then searches through Lillian Vernon's legacy-based screens for the appropriate response.

"This keeps the business logic on our back end consistent; only the information that has been massaged is fed back to the facilitator's computer system which, in turn, feeds it back to the individual mail facility location," said Admire. At that point, the mail facility's terminal spews out a printout stating what will be done with the return package and whether the customer will get a refund. The mail facility consolidates returns as a group and sends them back to the third-party returns facilitator in a batch to help streamline costs.

Meta data mapping and distributed query processing
Nimble Technology Inc., Seattle, offers the Nimble Integration Suite, a software platform that uses XML and distributed query processing technology to let Web services and applications access and query data stored in multiple databases and return a unified result. The data being accessed can be within an enterprise or outside its firewall.

First, the Nimble Integration Suite maps meta data from disparate databases into XML. That meta data sits in a hierarchical representation in Nimble's Metadata Server repository. Users can employ drag and drop to run database operations against that native meta data. "You can do a join between, say, an IBM data source and an Oracle data source so you end up with a view of your data irrespective of the underlying data sources," said Michael Pinckney, Nimble's vice president of marketing. "Using XML lets us look at the data that's in an IBM data source in the same way we look at data in Microsoft Exchange, SQL Server, SAP or whatever."

Once that is done, the question of how to access the actual data arises. Nimble uses advanced distributed query processing technology that lets users query across multiple data sources simultaneously through the use of adapters. The distributed query processing technology lets Nimble figure out the algorithms to "ask the right questions to the right databases in the right order, and then puts them together in the right order so the end customer gets a unified view," Pinckney said. A unified view for a sales query, for example, would be "a view that has customer, machine sales, parts sales, service calls and so on—all the parts that would go into answering your query," Pinckney said.
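In miniature, the unified view amounts to letting each data source answer its own part of the query and joining the results. The toy sketch below, with plain Python dicts standing in for Nimble's adapters and query planner, shows the idea:

```python
# Two "data sources" fronted by trivial adapters (stand-ins for Nimble's
# real adapters); each returns rows in a common shape.
ibm_db2 = [{"cust_id": 7, "customer": "Acme"}]
oracle = [{"cust_id": 7, "machine_sales": 125000},
          {"cust_id": 9, "machine_sales": 48000}]

def unified_view(customers, sales):
    """Join rows from two sources on cust_id -- a toy version of the
    planner asking each database its own question in the right order
    and stitching the answers into one view for the end customer."""
    by_id = {row["cust_id"]: row for row in customers}
    return [{**by_id[s["cust_id"]], **s}
            for s in sales if s["cust_id"] in by_id]

print(unified_view(ibm_db2, oracle))
# [{'cust_id': 7, 'customer': 'Acme', 'machine_sales': 125000}]
```

The real system pushes each sub-query down to the native database through an adapter rather than pulling whole tables, but the stitched-together result is the same kind of unified view Pinckney describes.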

Rich data environments
Quovadx Inc., Englewood, Colo., offers QDX Integration Tools, which let users deal with both XML data and documents simultaneously, said company CEO Lorine Sweeney. The suite processes XML data streams as messages from data transactions, and processes documents, or content, with a technology based on XML Topic Maps (XTM), which implements the ISO 13250 Topic Maps standard for XML-based content. "When you integrate applications, particularly on the Web today, it's a combination of text and data," Sweeney said. Quovadx plans to extend its tool suite to video, audio and legacy data that will be transformed into XML streams and re-purposed into new applications.

Quovadx's origins are in the health-care industry, where there "are many different systems"; QDX Integration Tools, which have been around for 11 years, support "between 3,000 and 4,000 legacy formats," Sweeney said. Quovadx's technology lets users maintain their own business rules.

Quovadx customer John Lopez is director of business transformation and innovation services at the Federal Services Division of national managed health-care company Health Net Inc., Woodland Hills, Calif. He is using Quovadx to build a medical management system as a pilot that will be rolled out enterprise-wide if it succeeds. The Federal Services Division has 500 desktops, and Lopez selected Quovadx from among 18 vendors because "Quovadx could meet our business requirements, had a native XML Web-based product that let clients maintain business rules, and had a very light browser-based footprint on the desktop that cuts the cost of managing and updating desktops," he said. "It also has a very strong workflow component, which is very important for medical management systems."

Whichever approach developers select for XML data integration, the most important thing to remember is that business needs must drive technology. Selecting technology or a method for its own sake is futile.

Integration issues
Gartner Inc. analyst Dale Vecchio said corporations should consider these issues when planning to use XML to access legacy data:

  • It is about the people, not the technology. Is there a culture of change and are people receptive to change?
  • Decide whether you want to use XML as a message description mechanism or a data description mechanism.
  • Figure out what data means. "XML is an enabling technology that allows you to describe content, but you still need to agree on meaning," Vecchio said. Does everyone in the organization agree on what a customer is, for example?
  • Figure out whether your approach is inside to outside, or outside to inside. Extending legacy systems to new customers is inside to outside; building a brand-new application that needs legacy data as a source of new information is outside to inside. Going from the inside out requires ways to get from COBOL copybook files to XML messages, which may mean dynamically generated, message-specific XML parsers. You could also use ODBC or JDBC in a Java or Microsoft environment, access non-relational databases with existing transactions on the mainframe, or use data-extension products such as Merant's SequeLink, Vecchio said. Going from the outside in, you can take an adapter or connector strategy to access mainframe data and format it in XML, he added.
  • Finally, figure out how to solve ancillary issues such as data integrity and how to roll back data when there is a failure, for example. "XML as a descriptive mechanism is a great thing, but it doesn't solve any of the transactional issues on either side of it, such as data integrity, logical units or being able to roll back data when failures occur," Vecchio said. "XML just describes data flow."
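Vecchio's point about getting from COBOL copybook files to XML messages can be illustrated with a toy converter. The copybook layout here is hypothetical; real copybooks add PIC clauses, COMP fields and OCCURS groups that complicate the mapping considerably:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Hypothetical copybook layout: (field name, offset, length)
LAYOUT = [("CUST-ID", 0, 5), ("CUST-NAME", 5, 20), ("BALANCE", 25, 9)]

def record_to_xml(record):
    """Slice a fixed-width mainframe record into its copybook fields
    and wrap them as an XML message."""
    root = Element("customer")
    for name, offset, length in LAYOUT:
        tag = name.lower().replace("-", "_")
        SubElement(root, tag).text = record[offset:offset + length].strip()
    return tostring(root, encoding="unicode")

msg = record_to_xml("00042ACME SUPPLY CO      000412.09")
print(msg)
```

A message-specific parser generated from the copybook would do the same slicing, but with the layout derived automatically rather than hand-coded.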

—Richard Adhikari