In-Depth

Getting real about real-time BI

Allied Office Products, which claims to be the largest independent dealer of office goods and services in the United States, is increasing sales and improving customer service in part by dipping into the company's store of customer knowledge -- its data warehouse. In fact, the Clifton, N.J.-based company is a pioneer in implementing what is becoming a major trend in the world of business intelligence: the use of real-time data.

To enable these new capabilities, Allied has adopted Ascential Enterprise Integration Suite and Ascential Real-Time Integration (RTI) Services from Westborough, Mass.-based Ascential Software Corp. to provide the underlying infrastructure for its OneSolution business management system. OneSolution links customer service, supply-chain and order-processing solutions in real time to maximize cost savings, efficiency and customer satisfaction.

Allied uses Ascential to identify and eliminate duplicate records, incomplete fields and other anomalies in its system. Employees get on-the-fly error reports that identify redundant entries and notify sales and service representatives of missing data. This master data management approach yields consistent, reliable information and deeper insights into the firm's customer base and sales trends.

Allied officials said the tools allow the operation to offer real-time customer service, order-processing and supply-chain management. For example, Allied is exploiting the Web services capabilities of RTI Services to let customers rapidly process returns online. The tools play a pivotal role in deploying and managing these services by linking back to the company's enterprise systems and integrating the data, said Ken DesRochers, senior vice president of information technology at Allied Office Products.

"Ascential Software helped us gain the elusive 360-degree view of our customer that many companies strive for but too few achieve," said DesRochers.

DesRochers explained that the challenge in achieving one cohesive view of the customer was that data was dispersed among three disparate systems: Oracle 9i running on a Solaris SPARC system and supporting Allied's e-commerce Web site; Salesforce.com, the company's Web-based customer relationship management (CRM) system; and, for financials, a custom-built legacy Enterprise Resource Planning (ERP) system running on an IBM AIX RS/6000 system with a UniData database.

Today, more organizations are striving to achieve what Allied has aimed for: real-time leverage not only over historical data but also over moment-to-moment operational data. But all of them face challenges in making the speed-up work. What is more, experts say that for many enterprises real time may not yield all the benefits vendors promise.

What it takes
Getting to real time is not always easy. But there are plenty of familiar techniques that can support this new style of business. One is simply to run ETL processes more frequently, enabling continuous batch loads -- every hour or even every minute. Companies may also decide to change the data capture process to focus on what has changed and move only those changes to the warehouse. A replication system, for example, would permit exactly that, said Wayne Eckerson of The Data Warehousing Institute (TDWI), and Change Data Capture (CDC) tools -- based on replication and middleware -- can be useful.
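
The mechanics behind such a change-focused load are simple enough to sketch. The fragment below is a minimal illustration in Python, not any vendor's product: it assumes a hypothetical source table with a last_modified column and pulls only the rows changed since the previous run's watermark -- the basic idea behind both continuous mini-batch loads and timestamp-based change capture. (Commercial CDC tools typically read the database log rather than polling a column.)

```python
# Minimal sketch of a change-focused ("delta") load driven by a timestamp watermark.
# The orders table, its last_modified column and the orders_fact target are all
# hypothetical names used only for illustration.
import sqlite3

def extract_changes(source: sqlite3.Connection, since: str) -> list[tuple]:
    """Pull only the rows modified after the previous run's watermark."""
    cur = source.execute(
        "SELECT id, customer, amount, last_modified FROM orders WHERE last_modified > ?",
        (since,),
    )
    return cur.fetchall()

def load_changes(warehouse: sqlite3.Connection, rows: list[tuple]) -> None:
    """Upsert the deltas so the warehouse reflects the latest operational state."""
    warehouse.executemany(
        "INSERT OR REPLACE INTO orders_fact VALUES (?, ?, ?, ?)", rows
    )
    warehouse.commit()

def run_mini_batch(source, warehouse, watermark: str) -> str:
    """One cycle of a continuous load: extract deltas, load them, advance the watermark."""
    rows = extract_changes(source, watermark)
    if rows:
        load_changes(warehouse, rows)
        watermark = max(r[3] for r in rows)  # newest timestamp seen becomes the new watermark
    return watermark
```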

Then there is publish and subscribe. Roy Schulte, research team leader at Stamford, Conn.-based analyst firm Gartner Inc., agrees that real time -- however it is defined -- is not entirely new. In some cases, he noted, techniques have been honed in the financial industry since the late 1980s, particularly publish and subscribe ("pubsub"). Used on Wall Street and in certain factory floor automation scenarios, the up-to-the-second data delivered through pubsub has been employed to achieve critical advantages.

But until recently, said Schulte, most messaging was based on point-to-point, which made pubsub harder to implement. "The nice thing about pubsub is that it is one-to-many, which is better for many processes," he said. But now, pubsub is a well-established option. Specifically, Schulte said, IBM enhanced its MQSeries to support pubsub and Microsoft has announced that future versions of its messaging software will also do pubsub. "Ever since Java Message Service [JMS] came out a few years ago, most new messaging products have offered the pubsub option," he said. However, the idea of using pubsub-type techniques for widespread business intelligence needs is new.
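
Stripped to its essentials, pubsub decouples a publisher from any number of subscribers on a named topic. The toy broker below -- plain Python, not JMS, MQSeries or any other product -- is only meant to make the one-to-many pattern concrete; a production messaging system adds persistence, acknowledgements and network transport.

```python
# Toy in-memory publish-and-subscribe broker illustrating the one-to-many pattern.
from collections import defaultdict
from typing import Any, Callable

class Broker:
    def __init__(self):
        # topic name -> list of subscriber callbacks
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        """Register interest in a topic; subscribers can be added at any time."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        """Deliver one message to every current subscriber of the topic."""
        for handler in self._subscribers[topic]:
            handler(message)

if __name__ == "__main__":
    broker = Broker()
    # Two independent consumers of the same event stream -- one-to-many delivery,
    # with neither consumer needing to know about the other or about the publisher.
    broker.subscribe("orders.updated", lambda m: print("dashboard sees:", m))
    broker.subscribe("orders.updated", lambda m: print("warehouse loader sees:", m))
    broker.publish("orders.updated", {"order_id": 42, "status": "returned"})
```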

"People have wanted to do real time for years with BI systems," said Schulte, but in many cases the technology was not ready. Now, things are different. Thus, extending BI to the here and now is catching on.

As with prior waves of technology, Schulte said companies are already starting to come up the learning curve -- he cited the success of a Business Activity Monitoring (BAM) dashboard at Wal-Mart as an example. But BAM is not without risk, he warned. All that information may overwhelm the mere mortals tasked with digesting it. "If a car goes 150 miles per hour you can get in more trouble [with it] than you can with a car that has a top speed of 20," he said.

Initially, a critical task for developers will be reducing the information glut. "You don't want to tell everyone everything -- you want to filter the notification so you just tell them about the important exception conditions," he said.

Schulte said some of the best thinking in the area has come from Professor David Luckham of Stanford University, who has proposed the need to aggregate "multiple simple events into complex events that provide a higher-level finding usable by business people." Schulte also recommends Luckham's book, "The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems" (Boston: Addison-Wesley, 2002).

Examples of such aggregation in practice include automatically tallying the traffic a call center is handling to determine whether an additional shift needs to be brought in early, Schulte said.
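
That example can be expressed as a small aggregation rule: count the simple "call arrived" events in a sliding window and emit a single higher-level event when a threshold is crossed. The window and threshold in the sketch below are invented for illustration; they are not drawn from the article or from any product.

```python
# Sketch of aggregating simple events into a complex event, in the spirit of
# Luckham's complex event processing. The 15-minute window and 200-call
# threshold are hypothetical values chosen only to show the pattern.
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)
THRESHOLD = 200   # calls per window that should trigger a staffing action

class CallVolumeMonitor:
    def __init__(self):
        self._arrivals = deque()   # timestamps of recent "call arrived" events

    def on_call_arrived(self, when: datetime):
        """Record one simple event; return a complex event string when the rule fires."""
        self._arrivals.append(when)
        # Drop simple events that have slid out of the aggregation window.
        while self._arrivals and when - self._arrivals[0] > WINDOW:
            self._arrivals.popleft()
        if len(self._arrivals) >= THRESHOLD:
            return f"CALL_IN_EXTRA_SHIFT: {len(self._arrivals)} calls in the last 15 minutes"
        return None
```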

What does all this mean for the supporting infrastructure? Schulte said pubsub is simply a communication pattern -- a relationship between multiple parties. For BAM, its flexibility is extremely useful "because you can dynamically choose at any time to start subscribing to different complex events and from that moment on you will be notified when those conditions have occurred," he said. Pubsub is also used "in the background" in the collection of data, he noted.

"I don't think existing applications will be ripped out, but many new applications will use publish and subscribe where they might have used something else before," Schulte said. "It certainly won't pay to rip out the current systems that are built for one-to-one messaging," he said. But for many new applications pubsub will be a frequently used option. Normally, application databases are just updated through ODBC or SQL type interfaces, but if the operational data store tries to collect information from multiple geographic locations and multiple applications, "and if all those updates are supposed to line up in one central store, the most flexible method is publish and subscribe," he added.

Another data delivery style, trickle feed, may also be important to real time, Schulte said. Although pubsub is, arguably, a form of trickle feed, trickle feed is usually envisioned more in terms of mini-batch jobs, run every few minutes, he said. And, implicitly, that method can be built simply enough on existing processes.
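
In its simplest form, a trickle feed can be nothing more than the existing batch load wrapped in a short-interval loop. In the sketch below, run_batch_load() is a stand-in for whatever ETL job the warehouse already runs nightly; only the scheduling around it changes.

```python
# Sketch of a trickle feed: the existing batch load, unchanged, simply runs on a
# short cycle instead of once a night.
import time

def run_batch_load() -> None:
    """Stand-in for the warehouse's existing ETL job (extract, transform, load)."""
    print("batch load ran")

def trickle_feed(interval_minutes: int = 15) -> None:
    """Run the existing load every few minutes rather than nightly."""
    while True:
        run_batch_load()
        time.sleep(interval_minutes * 60)
```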

Other analysts look at the real-time challenge differently.

Mike Gilpin, an analyst at Forrester Research, Cambridge, Mass., suggests that real-time and traditional data analysis are two different creatures. One is not a replacement for the other, so there is no need to think in terms of replacing existing architectures. Instead, the Forrester view holds that conventional usage demands conventional data warehousing, analytics and business intelligence technology, while real-time usage demands an event-driven, message-based approach. That usually means Web services, though Gilpin acknowledged that traditional, higher-cost proprietary messaging products may have the advantage of already being deployed in your environment.

Ted Friedman, principal analyst at Gartner, said he has also seen companies achieving some success with real time by building functionality around the operational data store (ODS) model. That provides a way to hold the information for a relatively short period and then make it available to operational dashboards and the like before putting it into a data warehouse for longer-term storage or analysis.

Indeed, Alex Chelminsky at Systems Engineering Inc. (SEI), Waltham, Mass., recently helped a customer do just that. A current SEI project at a financial institution addresses the need for timely information about customer demographics, especially contact information such as mailing addresses, phone numbers and e-mail addresses, which can change frequently. SEI used IBM MQ, an ODS and a Customer Centric Data Repository to create an architecture that enables customer representatives to receive updates coming from the ODS, and then verify and effect the changes while they are in contact with the customer. The architecture combines the use of near-real-time updates, batch processing of third-party demographics, and dynamic business rules to provide a more accurate view of customers and their relationships to the organization, Chelminsky said. The result is the ability to perform more precise behavioral analysis and provide more targeted services to customers.
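
The flow Chelminsky describes can be pictured as a short pipeline: an update message arrives from the messaging layer, a business rule validates it, and the record is upserted into the ODS so a representative sees the change while the customer is still on the line. The sketch below only illustrates that shape -- the field names, the validation rule and the table are hypothetical and do not reflect SEI's actual design.

```python
# Illustrative sketch of applying a near-real-time customer update to an ODS.
# Field names, the email rule and the customer_contact table are hypothetical.
import re
import sqlite3

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def apply_update(ods: sqlite3.Connection, message: dict) -> bool:
    """Validate one demographic update message and upsert it into the ODS."""
    email = message.get("email", "")
    if email and not EMAIL_RE.match(email):
        return False  # a simple dynamic business rule: reject malformed contact data
    ods.execute(
        "INSERT OR REPLACE INTO customer_contact (customer_id, email, phone) "
        "VALUES (?, ?, ?)",
        (message["customer_id"], email, message.get("phone", "")),
    )
    ods.commit()
    return True
```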

"With the inclusion of e-business and CRM initiatives in the fabric of business operations, the need for providing near real-time updates to BI environments is becoming more relevant," Chelminsky said. At the same time, the existence of universal messaging software like IBM MQSeries, Neon and Tibco has made instant updates closer to reality.

Tibco -- a company with roots that go back to some of the original real-time apps of the mid-1980s -- offers a product platform called ActiveEnterprise. Ed Zou, Tibco's general manager for business integration, said the company offers a range of messaging solutions. In addition, Tibco offers database adapters that turn standard databases into event-driven publish-and-subscribe engines.

"Fundamentally, most database products deal with data at rest," said Zou. "What is missing is the ability to capture data in motion," he added. A major challenge, he said, is that in a typical data warehouse environment, once the data is inside "you almost have to ask the right question to get it out." By contrast, when an event-driven approach is adopted, as soon as the data changes it is published, which can trigger other events within the IT infrastructure, explained Zou.

Sanju Bansal, chief operating officer at MicroStrategy Inc., a McLean, Va.-based BI vendor, said he sees a number of styles of real-time implementation emerging. One of the architectural trends he noted is that real-time data is generally kept in a simple relational database. By contrast, BI/DW information is usually built into a "cube" for full-scale analysis -- a process that gobbles up memory and can take hours. The implication, Bansal said, is that data gathered for real-time purposes may not yield information in the same way as traditional BI data. Bansal said he is also starting to see portals used more widely to integrate real-time data with historical information. As an example, he cited yahoo.com/finance, which provides real-time market data as well as historical information.
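
Bansal's split between a simple relational store for the latest data and a pre-built cube or summary for history typically comes together at the presentation layer, where a portal page queries both and shows them side by side. The two-query sketch below assumes hypothetical table names and stands in for whatever summary structure the warehouse actually provides.

```python
# Sketch of a portal-style view combining moment-to-moment data from a simple
# relational table with pre-aggregated history. Table names are hypothetical.
import sqlite3

def portal_snapshot(db: sqlite3.Connection, symbol: str) -> dict:
    """Combine the latest real-time quote with pre-aggregated monthly history."""
    latest = db.execute(
        "SELECT price, quoted_at FROM realtime_quotes WHERE symbol = ? "
        "ORDER BY quoted_at DESC LIMIT 1",
        (symbol,),
    ).fetchone()
    history = db.execute(
        "SELECT month, avg_price FROM monthly_summary WHERE symbol = ? ORDER BY month",
        (symbol,),
    ).fetchall()
    return {"latest": latest, "history": history}
```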

Bansal said many of his company's customers are speeding up their systems, using 15-minute trickle feeds rather than nightly batch jobs to keep systems up to date.

Real perspectives
While there appears to be consensus that real-time BI can be done, questions remain about its value and whether most enterprises should even consider it.

For his part, Sid Adelman, an analyst and consultant at Sid Adelman & Associates, takes a rather dim view of the real-time hoopla. "This is going to be a limited market with limited applications," he said flatly. "I think there needs to be some very clear understanding of what customers mean by real time. Very few actually need it and the number of real-time BI/DW installations is limited."

Furthermore, he noted, with real time, recovery becomes more complicated. "It is a real challenge; performance becomes difficult, and you need some red-hot practitioners to keep from choking," he said.

More anti-real-time rhetoric comes from Gartner's Friedman. He declares that the move to real time is mostly hype. "We did a large survey in 2002 of 500 global companies and found only 10% to 11% were implementing real time," he said. Looking toward 2006, Gartner still sees this as something that fewer than half of companies will be doing. "For most organizations, real time isn't necessary or even desirable," he said.

If companies do more with real-time capability, they still need to educate users to deal with more volatile data, said TDWI's Eckerson. "Not every user wants to use data that changes all the time," he said. The key is to deliver the right data at the right time to the right users -- whether it is pushed to them or pulled by them.

For most people, updating a data warehouse within 24 hours constitutes real time, he said. That could be a few seconds or once a day. "That kind of intraday processing is growing at up to 100% a year -- and we are only at 10% or less now," Eckerson noted. But an indicator of potential growth is that almost two-thirds of companies now process data on a daily basis.

Please see the following related stories:
"New wrinkles in data integration business" by Alan R. Earls
"DB design and app development: Why can't we be friends?" by Alan R. Earls
"Sorting out the meaning of BPM" by Wayne Eckerson