In-Depth

Self-healing systems

What if software developers did not have to worry about the worst-case scenario when developing applications?

Today, said Rod Murchison, chief technology officer at Redwood City, Calif.-based Ingrian Networks Inc., “you have to do programming for all the worst-case boundary conditions on an application-by-application basis. If you can take a step back, say if something overloads and the infrastructure heals itself, it could be easier for you as a programmer if you don’t have to worry about all these worst cases.”

Murchison acknowledges, however, that this concept “requires a leap of faith.”

It is a leap of faith that the leading platform players and systems management vendors are preparing their customers for, along with a model of computing that delivers IT resources on an as-needed basis. The payback of an IT infrastructure that is more self-managing and self-provisioning promises to be twofold: information technology will be able to align strategically with business priorities and processes, and IT operations will be more streamlined and, therefore, more cost-efficient and agile.

Automation is not a new concept -- tools and technologies have grown more “autonomic” with each generation. And the idea of delivering IT resources on an as-needed basis to respond to the peaks and valleys of demand -- prioritized by business need -- has been bandied about for years. Recently, though, the leading platform vendors have all rolled out plans for the data center of the future and are starting to deliver the technology that will make this vision possible.

IBM has rolled out its “Autonomic Blueprint” and “on-demand” initiative. Microsoft announced its Dynamic Systems Initiative (DSI). Sun Microsystems has a detailed plan for its N1 technologies and utility computing. And Hewlett-Packard (HP) has its Adaptive Enterprise strategy. The plans encompass hardware and software, outsourcing and in-sourcing. Large systems management vendors such as Candle and Computer Associates are laying out strategies for how they will support these plans.

“The perception is that we’re spending too much on IT to run the business,” said John Kogel, vice president, emerging growth opportunities at Candle Corp., El Segundo, Calif. “IBM and HP are driving this adaptive computing/utility computing notion. Our customers are all worried about being outsourced, yet a lot of the customers we have are overworked, and they could be doing things more important to the business. It’s hard to tell if autonomics/utility management will alleviate pain or lose jobs. The value IT can provide is understanding the business and making it more competitive, which is easier said than done.”


‘Organic IT’

In what Forrester Research has labeled “organic IT,” data centers will see virtualized resources, rapid change (resources applied when they are needed), and larger pools of resources (a recentralization of the islands of IT resources).

“Pieces of this have been happening all along, so from the technology side it’s evolutionary,” said Ingrian Networks’ Murchison. “But from the realization that this is more about business processes, that is revolutionary. We haven’t done this in the IT world. Now we’re looking at the deeper layers of autonomics -- knowing who the users are, so we can see that ‘Bob’ is our best banking customer, so no matter what, his transactions get through.”

“The concept of an on-demand business is where various business processes are very well orchestrated end to end,” explained Miles Barel, IBM program director for Autonomic Computing in Hawthorne, N.Y. “The role of autonomics is to provide management of the infrastructure that supports business processes as defined by business policies, not the individual IT constructs.” In an autonomic environment, according to IBM, system components are self-configuring, self-healing, self-optimizing and self-protecting.

IBM’s Autonomic Blueprint details an architectural infrastructure that embraces Web services and a variety of open standards, including the Open Grid Services Architecture (OGSA) and The Open Group’s Application Response Measurement (ARM). The architecture centers on intelligent control loops that collect information, make decisions and then make adjustments in the system.

“The hardware, operating systems and tools -- even at the sub-component level -- all need to be instrumented consistently, so an independent component can manage its own behavior as well as be managed by a higher-level construct,” explained Barel.

“We can have control loops that deal with the behavior of the server or the management of server farms. At the highest level, we can have a control loop for the business process. It’s a very holistic approach to development,” he noted. “A standardized control loop can negotiate services, aggregate and work across infrastructure. The other element to the control loop is not just to monitor, analyze and execute, but at the heart is knowledge and the ability to learn from experience, so the system gets better and smarter through time.”
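
In rough outline, the monitor-analyze-plan-execute cycle Barel describes, built around a shared body of knowledge, might look something like the following Python sketch. The component, metric and threshold names here are invented for illustration; they are not part of IBM’s blueprint or toolkits.

```python
# A minimal sketch of a monitor/analyze/plan/execute loop with shared knowledge.
# All names, metrics and thresholds are hypothetical illustrations, not IBM APIs.
import random
import time


class ManagedServer:
    """Stand-in for a managed resource exposing a sensor and an effector."""

    def __init__(self):
        self.capacity = 2  # number of worker instances

    def read_load(self):          # sensor: report current utilization (0.0-1.0)
        return random.uniform(0.2, 1.0)

    def set_capacity(self, n):    # effector: apply a configuration change
        self.capacity = n


class ControlLoop:
    """Monitor, analyze, plan and execute against a shared knowledge base."""

    def __init__(self, element, target=0.7):
        self.element = element
        self.target = target            # policy: keep utilization near this level
        self.knowledge = []             # history the loop could learn from over time

    def run_once(self):
        load = self.element.read_load()                        # monitor
        overloaded = load > self.target                        # analyze
        plan = self.element.capacity + (1 if overloaded else 0)  # plan
        if plan != self.element.capacity:
            self.element.set_capacity(plan)                    # execute
        self.knowledge.append((time.time(), load, plan))       # record the outcome


if __name__ == "__main__":
    loop = ControlLoop(ManagedServer())
    for _ in range(5):
        loop.run_once()
    print(loop.knowledge[-1])
```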

In addition to the blueprint, IBM has made some new tools available on its Alphaworks site to help users develop autonomic systems that will be compliant with the blueprint. These include the Log & Trace Tool, which puts log data from different system components into a common format. The Agent Building and Learning Environment (Able) is a rules engine for complex analysis. Business Workload Management utilizes the ARM standard to help identify the cause of bottlenecks and adjust resources as needed to meet performance objectives. And the Tivoli Autonomic Monitoring Engine has embedded self-healing technology. In addition, in May, IBM acquired Think Dynamics, maker of an orchestrated provisioning solution that will be incorporated into the blueprint.
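
The premise behind a tool such as the Log & Trace Tool is simple enough to sketch: events from heterogeneous components are mapped into one schema so a single analyzer can correlate them. The following Python fragment illustrates that idea only in outline; the field names and formats are invented for this example and do not reflect IBM’s actual log schema.

```python
# Illustrative only: normalizing two differently formatted log records into one
# common schema, in the spirit of a common log format. These field names are
# invented for this sketch; they are not IBM's log/trace schema.
import re
from datetime import datetime


def from_webserver(line):
    # e.g. '10.0.0.1 - - [12/May/2003:10:15:32 +0000] "GET / HTTP/1.0" 500'
    m = re.match(r'(\S+) \S+ \S+ \[([^\]]+)\] "([^"]+)" (\d{3})', line)
    host, ts, request, status = m.groups()
    return {
        "source": "webserver",
        "timestamp": datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z"),
        "severity": "ERROR" if status.startswith("5") else "INFO",
        "message": f"{request} -> {status} from {host}",
    }


def from_appserver(line):
    # e.g. '2003-05-12T10:15:33Z WARN pool exhausted'
    ts, level, msg = line.split(" ", 2)
    return {
        "source": "appserver",
        "timestamp": datetime.fromisoformat(ts.replace("Z", "+00:00")),
        "severity": level,
        "message": msg,
    }


if __name__ == "__main__":
    events = [
        from_webserver('10.0.0.1 - - [12/May/2003:10:15:32 +0000] "GET / HTTP/1.0" 500'),
        from_appserver("2003-05-12T10:15:33Z WARN pool exhausted"),
    ]
    # With both records in one schema, a single analyzer can sort and correlate them.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        print(event["timestamp"], event["source"], event["severity"], event["message"])
```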

Ingrian Networks’ Murchison said his organization has downloaded and worked with the Log & Trace Tool and the Business Workload Management tool. “We look at the open standards IBM has blessed. We’re a Linux shop, and it seamlessly works directly with our product even though we’re not a true Blue shop. We haven’t seen that same thing for the other vendors.

“The architectural blueprint guidelines give a good high-level overview of how the various components of the architecture [managed elements, device groupings and business policy groupings] are arranged, and how control structures/management interfaces are implemented,” continued Murchison. “I don’t find this architecture to be controversial at all and, from a development standpoint, I think IT solutions can be easily designed to fit into this architectural model. Many are designed this way today simply for product/solution robustness, but the added detail work will be on the control loop structures and particularly the ‘effectors’ that need to implement changes to the component behaviors.”
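
Barel’s point about consistent instrumentation, and Murchison’s about effectors, both come down to every managed element exposing the same small management surface: sensors that report state and effectors that apply change. A minimal sketch of that idea, with entirely hypothetical class and method names, might look like this:

```python
# A sketch of the "consistent instrumentation" idea: unrelated components expose
# the same sensor/effector interface so a higher-level construct can manage any
# of them uniformly. Interface and class names are illustrative, not a standard.
from abc import ABC, abstractmethod


class ManagedElement(ABC):
    @abstractmethod
    def sense(self) -> dict:
        """Sensor: report the element's current state as name/value pairs."""

    @abstractmethod
    def effect(self, change: dict) -> None:
        """Effector: apply a requested configuration change."""


class DatabasePool(ManagedElement):
    def __init__(self):
        self.connections = 10

    def sense(self):
        return {"connections": self.connections, "in_use": 9}

    def effect(self, change):
        self.connections = change.get("connections", self.connections)


class WebTier(ManagedElement):
    def __init__(self):
        self.instances = 2

    def sense(self):
        return {"instances": self.instances, "requests_per_sec": 450}

    def effect(self, change):
        self.instances = change.get("instances", self.instances)


def higher_level_manager(elements):
    """A coarse outer loop that treats every element through the same interface."""
    for element in elements:
        print(type(element).__name__, element.sense())


if __name__ == "__main__":
    higher_level_manager([DatabasePool(), WebTier()])
```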

IBM’s Barel said that the biggest difference between IBM and its competitors is the “breadth of how we’re approaching it. The value of autonomics is greatest when you can provide autonomic behavior across the entire infrastructure. With Microsoft, it’s about a new way to build apps from [the] start to instrument them for better self-management behavior. With IBM, it’s about an evolution to infrastructure from what companies have in place. We let people evolve the infrastructure and introduce autonomics one at a time.”


Providing a common contract

For its part, Microsoft has also laid out a blueprint. Its Dynamic Systems Initiative is said to unify hardware, software and service vendors around a software architecture that centers on the System Definition Model (SDM). The SDM is said to provide a common contract among development, deployment and operations across the IT life cycle: a live, XML-based blueprint that captures and unifies the operational requirements of applications with data center policies.
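
As a purely hypothetical illustration of that “common contract” idea, the sketch below shows an application declaring its operational requirements and a deployment step checking them against data center policy. None of the field names are drawn from the actual SDM schema.

```python
# Hypothetical sketch of a development/operations contract check. These field
# names are invented for illustration and do not come from the SDM schema.
app_requirements = {
    "name": "order-entry",
    "min_memory_mb": 2048,
    "open_ports": [443],
    "depends_on": ["sql-database"],
}

datacenter_policy = {
    "max_memory_mb_per_app": 4096,
    "allowed_ports": {80, 443},
    "available_services": {"sql-database", "message-queue"},
}


def check_contract(app, policy):
    """Return mismatches between what the app needs and what operations allows."""
    problems = []
    if app["min_memory_mb"] > policy["max_memory_mb_per_app"]:
        problems.append("memory request exceeds policy")
    for port in app["open_ports"]:
        if port not in policy["allowed_ports"]:
            problems.append(f"port {port} not allowed")
    for dep in app["depends_on"]:
        if dep not in policy["available_services"]:
            problems.append(f"missing dependency: {dep}")
    return problems


if __name__ == "__main__":
    issues = check_contract(app_requirements, datacenter_policy)
    print("deployable" if not issues else issues)
```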

Windows Server 2003 is the first step toward delivering on this initiative. Future releases of Visual Studio, Microsoft server applications and management solutions will also support SDM.

Windows Server 2003 capabilities include the following:

* Automated Deployment Services (ADS) -- a provisioning and administration tool;

* Windows System Resource Manager -- dynamic systems resource management;

* Volume Shadow Copy Services and Virtual Disk Service -- storage virtualization;

* Network Load Balancing -- dynamic load-balancing for incoming traffic;

* Windows Server Clustering -- high-availability and scalable services; and

* Virtual Server -- virtual machine technology for consolidation and migration.

One area where Microsoft and IBM differ in their approach is what Microsoft calls “the root of the problem.” Said a Microsoft spokesperson: “Fundamentally, these systems cannot become more intelligent until both the applications and the OS are developed with operations in mind. Management has been, and IBM continues to push it as, an afterthought to application and system design. It has to be baked in from the inception of an application.”

Microsoft has been working closely with Hewlett-Packard on DSI, and in May debuted a joint development effort: the Dynamic Data Center (DDC). The DDC features a combination of HP servers, software, storage and networking hardware that are connected based on a prescribed network architecture. Microsoft software dynamically assigns, provisions and centrally manages the DDC resources.

“It is fair to categorize the work HP has done with Microsoft around the DDC as investments that intersect with and enable them to realize their Adaptive Enterprise strategy,” commented the Microsoft spokesperson. “As HP’s Adaptive Enterprise strategy matures, we will be able to talk more about how our joint collaboration relates to this strategy.”

Launched in May, HP’s Adaptive Enterprise strategy also focuses on more closely linking business and IT. As part of the initiative, HP announced new Adaptive Enterprise services, including a set of business agility metrics, and new methodologies for designing and deploying application and network architectures to support constantly changing business needs.

Also announced were software for virtualizing server environments, new self-healing solutions for HP OpenView, and upgraded HP ProLiant blade servers. Hewlett-Packard’s Darwin Reference Architecture is a framework for creating a business process-focused IT environment that changes dynamically with business needs.

Microsoft has also been working with other vendors, including IBM. IBM has been doing work with its xSeries around Microsoft’s ADS tool and the SDM.


Utility computing

Sun Microsystems Inc., Santa Clara, Calif., has also laid out its plans for both a more dynamic data center and utility computing. Sun’s N1 architecture comprises foundation resources, virtualization, provisioning, and policy and automation. Foundation resources are the various IT components already in place. Virtualization allows for the pooling of those resources. Provisioning maps business services onto the pooled resources, and policy and automation enable a customer to create rules defining performance objectives for a given service. Based on set policies, N1 will manage the environment, adding and removing resources as needed to maintain service-level objectives.
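
A loose sketch of that layering, with made-up names rather than Sun’s actual N1 interfaces, might look like the following: foundation resources are pooled, drawn down by provisioning, and grown or shrunk by policy rules tied to a service-level objective.

```python
# Illustrative only: the layering described above, with invented names.
# This is not Sun's N1 API.

# Foundation resources already in place.
foundation = [{"id": f"server-{i}", "free": True} for i in range(8)]


def allocate(pool, count):
    """Virtualization + provisioning: draw servers from the shared pool."""
    granted = [r for r in pool if r["free"]][:count]
    for r in granted:
        r["free"] = False
    return granted


def policy(service_latency_ms, objective_ms, current_count):
    """Policy and automation: decide how many servers the service should have."""
    if service_latency_ms > objective_ms:
        return current_count + 1          # falling behind the objective: grow
    if service_latency_ms < objective_ms / 2 and current_count > 1:
        return current_count - 1          # comfortably ahead: shrink, free resources
    return current_count


if __name__ == "__main__":
    billing_servers = allocate(foundation, 2)
    desired = policy(service_latency_ms=450, objective_ms=300,
                     current_count=len(billing_servers))
    if desired > len(billing_servers):
        billing_servers += allocate(foundation, desired - len(billing_servers))
    print(len(billing_servers), "servers assigned to the billing service")
```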

According to Bill Mooz, director of utility computing at Sun Microsystems, N1 will be leveraged in the firm’s utility computing strategy. “Today, we’re seeing that the infrastructure is where most of the action is, around pay-per-use models for acquiring hardware, software and storage. Sun is fully committed to this model.”

Soon, he noted, when everybody has the same financing terms available, “customers will look for an infrastructure that lends itself well to operating in a pooled environment vs. a dedicated one. Having a pre-integrated set of equipment, from the low end on the edge to large vertically scaling servers, and running a single robust OS is very important.”

At the management platform layer, “we’ll be leveraging N1 heavily. It will provide virtualization, provisioning, policy-based management, metering, billing, access to remote services and alert you when it’s at capacity.”


System management players

Whether you call it “on demand,” “utility,” “autonomic,” “dynamic” or “organic,” the “underlying business drivers are real, and they’re driving a new way of thinking about IT,” said Larry Shoup, technology strategist at Computer Associates (CA) International Inc., Islandia, N.Y. Systems management players like CA and others are now positioning themselves to play in these new data centers.

“If you take a broad view, it’s about how you deliver IT as a service, delivering capacity as the business needs it and also hiding all that complexity from the user,” said Computer Associates’ David Hochhauser, vice president of Unicenter brand management. CA’s strategy, he said, is based on three principles: a self-managing infrastructure, delivering IT as a service and building a Service-Oriented Architecture (SOA).

As such, CA recently released six new enhancements and products for its Unicenter line. The Dynamic Reconfiguration Option, for example, addresses provisioning. “Say you take a Sun box with blade servers,” said Hochhauser. “We integrate with the Sun box and set the threshold. If you have a surge in demand, say a Super Bowl ad campaign, and capacity starts exceeding 70%, we’d detect that and automatically reconfigure to add an extra blade.”
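
The threshold scenario Hochhauser describes reduces to a simple rule: sample utilization, and when it crosses the configured limit, bring another blade online. The sketch below illustrates the shape of that rule with stand-in values; it is not Unicenter’s actual interface.

```python
# Rough sketch of threshold-triggered reconfiguration. The utilization samples
# and the add_blade() call are stand-ins, not Unicenter APIs.
THRESHOLD = 0.70

blades_online = 4
utilization_samples = [0.52, 0.61, 0.68, 0.74, 0.81]   # e.g. a surge after an ad


def add_blade(current):
    """Stand-in effector: bring one more blade into the pool."""
    return current + 1


for sample in utilization_samples:
    if sample > THRESHOLD:
        blades_online = add_blade(blades_online)
        print(f"utilization {sample:.0%} exceeded {THRESHOLD:.0%}; "
              f"reconfigured to {blades_online} blades")
```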

Also new is Unicenter Software Delivery 4.0, which delivers built-in self-healing and provisioning capabilities for applications and operating systems to accommodate changes in business demand. It also manages software interdependencies to ensure that the on-demand infrastructure functions properly.

CA has also partnered with Microsoft on its DSI efforts, and plans to work closely with all the platform players. “We’re Switzerland and will interface with everyone; they’re all building native capabilities into their boxes and operating environments. Our perspective is from the management level; it’s the thing that will make this valuable,” noted Hochhauser.

Candle, for its part, has worked with IBM research and on the autonomic committee. “I think you will hear less [about] autonomics and more about e-business on demand and utility computing,” said the firm’s Kogel. “You need autonomics for a utility management infrastructure. The blueprint is somewhat vague, but we will get a better understanding of how Candle can play. We were told that by August it would be better ironed out for third-party ISVs. We understand the concepts and that things like our pricing model have to change, and we understand that we need to be more dynamic.”

He added: “Candle’s expertise is around real-time monitoring and management. As things become real-time, we’re in a good position to provide the information that’s needed. We’re building APIs that IBM and HP can use to look at the information that’s important for workload analysis.”


Role of the developer

What will a more autonomic and dynamic data center mean for developers? According to a Microsoft spokesperson, “fundamentally, the work we are doing with the System Definition Model and development tool vendors will strengthen the relationship between the IT operator and application developer, giving them both a more powerful set of tools to much more efficiently develop, deploy and operate applications for a dynamic system. New SDM-compliant tools will enable IT operators to provide developers with operational requirements that will make it easier for developers to write applications that are very easy to deploy and automate in the data center.”

Added IBM’s Barel, “When you look at the toolkits and more technology that will roll out, it should mean productivity gains in the application development process. More important, it should mean that the applications as deployed by their customers will perform more rapidly and consistently, and the quality of service will be greater. As systems become self-managing and have better problem determination, it helps to relieve the burden on the development support organization for figuring out what’s going on in the customer environment. Today you may spend time chasing an applications issue and, really, the issue is in operations.”

“Developers need to look at how they can build software programs such that they are utility optimized,” said Sun’s Mooz. “If we get to the point where we’re all going -- where you have libraries of software stacks from the OS to the end applications that quickly get pulled down and provisioned -- developers will want to look at how they can build a product so that it integrates quickly and stays integrated. That’s why having a common OS to run this environment will be significant.”

Added CA’s Hochhauser, “In one sense it might make some of this easier; as hardware and resources become virtualized and buffered from the applications themselves, developers can code with less regard to what the resources are. If you’re building an application that needs three servers instead of two at any point, and the concept of virtualizing and provisioning separates the application from the machine, it should buffer the developer to get resources on demand as business needs it, whether from IT or a third party.”


The IT operations role

In addition to developers, IT operations staff face changes in their role with a more autonomic environment that delivers capacity as needed. Sun’s Mooz said his company is moving aggressively to deploy the utility computing model within Sun itself. “It changes the skill profile the people [in IT operations] will have. To the extent we can free people up here, they’re immediately gobbled up and put on other projects. It comes back to the importance of not looking at utility computing in isolation. If you can effectively consolidate, put the right processes around it, the right architecture in place, and use some powerful tools of utility computing, then you can free up resources to work on other projects.”

While some industry observers have said that the thrust of IBM’s strategy, for example, is really about outsourcing, IBM’s Barel disagrees.

“It’s about business transformation and how you acquire IT to allow organizations to respond to current need. It includes utility computing as one mechanism, but it’s not the only answer and it’s not a simple choice. Some may choose to outsource all IT; other organizations may want to look at an on-demand operating environment, implement the grid, etc., but within their own infrastructure managed by their own people. A third scenario is that you may outsource some functions and use a utility model for peak demand.

“Some people want to say the data center of the future has two employees: one person and one dog,” added Barel. “The dog’s purpose is to keep the person from touching anything. What we hear from customers is that they’re spending a large part of the budget keeping what they have in place running, and they don’t have a lot of money for new projects. We would like to allow people to focus on using IT to enable new business opportunities, not a loss of jobs.”


For more, please see the related story “The contrarian view” by Colleen Frye, as well as the Web-only stories on grid computing.