In-Depth

Designing a scalable dot.com architecture using J2EE

What does it take for a dot.com to survive its initial stages? Is there a way to develop a dot.com application architecture, keeping in mind the unpredictable nature of market share and customer growth? A component-based, scalable dot.com architecture can be designed with the technologies that are currently available, including Java 2 Enterprise Edition (J2EE). A strong business model and a strong technical architecture are imperative for the growth and success of any e-business. The fundamental principles of architecture design can be used by a start-up dot.com as well as an established one. This article examines the growth stages of a dot.com business, explains the elements necessary to grow the e-business, and recommends design principles that allow code reuse at later stages.

A dot.com architecture is a framework that delivers services to end users over the Internet in a highly secure and reliable manner on a 24x7 availability basis. All the business processes conducted through the Internet, extranet or intranet are part of the dot.com. For a bricks-and-clicks company, the dot.com process may incorporate inbound logistics (such as material purchases from suppliers), outbound logistics (such as distribution of products to warehouses, retail stores and customers), product promotion, Customer Relationship Management (CRM) and employee interactions.

During its initial stages, the dot.com requires minimal infrastructure to support it. As it grows and becomes more popular, the infrastructure needs to be scaled. The application architecture is usually patched many times to accommodate this growth. Often, the patched application either fails to perform adequately or it degrades. After several patches, the application architecture comes to a point where no more modifications can be implemented; this is the point at which the cost of making modifications outweighs the benefits. A total revamp of the application architecture becomes necessary, as does discarding the applications developed up to that point. Only when the architecture is designed from its inception with future growth in mind will the applications developed during the initial stages of the dot.com be reusable and scalable at later stages.

Architecture principles
The architecture should be based on open standards. Native technologies may be suitable in the beginning, but they make the architecture difficult to extend and integrate with new technologies later in the process. Following open standards allows developers to take the best-of-breed approach. Examples of open standards are HTML and XML for content in client browsers, Java for business logic in enterprise servers and Linux for operating systems. As the industry moves more toward open standard technologies, more vendors support them.

The architecture should also be component-based. A software component is a module designed for a specific function that can be used with other software components to build an application. When the architecture is component-based, it can be extended easily. A component may also be replaced with a more appropriate component in the course of a company's growth and as technology changes. For example, a billing component will have all the functions needed for a billing process, and it can be used in a shopping cart.
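
The billing example can be sketched in Java roughly as follows. This is a minimal illustration, not code from any particular product: the names `BillingComponent`, `FlatTaxBilling` and `ShoppingCart`, and the flat tax rate, are assumptions made for the example. The point is that the cart depends only on the component's interface, so a more appropriate billing implementation can replace the first one later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component contract: callers depend on this interface only.
interface BillingComponent {
    long totalInCents(List<Long> itemPricesInCents);
}

// One illustrative implementation; the flat tax rate is an assumption.
class FlatTaxBilling implements BillingComponent {
    private final int taxPercent;

    FlatTaxBilling(int taxPercent) {
        this.taxPercent = taxPercent;
    }

    @Override
    public long totalInCents(List<Long> itemPricesInCents) {
        long subtotal = 0;
        for (long price : itemPricesInCents) {
            subtotal += price;
        }
        // Integer arithmetic; fractions of a cent are truncated.
        return subtotal + (subtotal * taxPercent) / 100;
    }
}

// The cart uses whichever billing component it is given.
class ShoppingCart {
    private final List<Long> prices = new ArrayList<>();
    private final BillingComponent billing;

    ShoppingCart(BillingComponent billing) {
        this.billing = billing;
    }

    void add(long priceInCents) {
        prices.add(priceInCents);
    }

    long checkoutTotalInCents() {
        return billing.totalInCents(prices);
    }
}
```

Swapping in a different `BillingComponent` implementation requires no change to `ShoppingCart`.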

The architecture should be designed in a multi-tiered manner. Four-tier architectures consisting of client, server-side presentation, server-side business and enterprise resources are very common. A multi-tier architecture provides the simplicity of developing the system on a single development machine and the flexibility to move the components to several distributed machines in production.

The architecture should be able to integrate with existing systems such as legacy systems, servers and Enterprise Resource Planning (ERP) systems. Established enterprises moving to e-business will have had production systems in place for several years. The process of bringing these existing systems into the new architecture involves a lot of money, time and effort. The user acceptance and quality assurance that the existing systems have already earned are an asset. Rewriting these systems will be expensive and difficult, if not impossible. When the dot.com architecture integrates the existing systems, the enterprise instantly leverages the investment it has made over a number of years. For many companies, security, transactions and business processes are already built into the existing systems. The addition of a communication interface and a Web-based GUI is often sufficient to integrate them into the dot.com architecture.

Every business strategy is formulated on the basis of reducing costs and differentiating the business from its competitors. To achieve those two goals, the dot.com should attract a large number of customers to its Web site. However, the increase in demand caused by growth in the customer base can degrade the architecture's performance. To provide good service, the architecture should grow at the same rate as its customer base—the infrastructure and all the systems designed and deployed need to be scalable.

Principles of dot.com architectures
  1. Architect on open standards
  2. Provide performance, availability, maintainability and simplicity
  3. Scale as the business grows
  4. Adapt to technology changes
  5. Build on components
  6. Provide transaction mechanisms
  7. Integrate existing production systems
  8. Support connectivity to services using back-end interfaces
  9. Ensure flexibility to integrate new and existing technology systems

Source: Kamesh Namuduri and Palani Ram

Dot.com growth pattern
A typical organization can plan the growth of its dot.com architecture in three phases, as depicted in Table 1. When a business starts its Web presence, its goal in phase one is product promotion through catalogs and brochures. Slowly, the business moves on to business transactions, collecting customer information and promoting customer relationships. Time to market and pressure from both competing dot.com companies and its own investors may be the key business issues with which the company has to deal during phase one.

Table 1: Dot.com growth stages
Phase I
  Issues:
  1. time to market
  2. competitive pressure
  Demands on the e-business:
  1. successful business strategy
  2. open and scalable architecture
Phase II
  Issues:
  1. growth in customer base
  2. growth in market share
  Demands on the e-business:
  1. customer focus
  2. additional infrastructure
Phase III
  Issues:
  1. steady growth
  2. building of partnerships and alliances
  3. streamlining of business processes
  4. transforming business data into information and knowledge
  Demands on the e-business:
  1. solid technical support
  2. well-defined multi-tier architecture and easy access to online business
  3. integration of all business applications
  4. building of customer relationships
Source: Kamesh Namuduri and Palani Ram

Phase two of a dot.com involves dealing with the growth of the e-business from every perspective. Assuming that the business is successful in attracting customers, it should gear up for a steep increase in its customer base. The dot.com needs to retain its growing number of customers, which requires a highly available, scalable, fault-resistant infrastructure and a rock-solid architecture that meets the demands of the growth.

Phase three of the dot.com involves steady growth and complete automation of the business processes. During this phase, the focus will be on integrating all the business applications and tying them together to present a unified Web interface to customers, suppliers and business partners. For example, when a customer places an order, the system should be able to show the item's availability by contacting the warehouse, check the customer's credit by contacting his or her financial institution, place an order for the shipment, and send confirmation of order details to the customer, all in a few clicks. While doing so, the e-business should satisfy customers to ensure that they return. CRM modules need to record the experience for future marketing and advertising.

During all three phases, the technical architecture plays a significant role in the success of the e-business. As the customer base increases, the architecture should scale to meet the demands. To survive in a competitive market, the business needs to operate at tremendous speed and effectiveness, which is possible only when the architecture is flexible enough to accommodate the dynamics of the business, adaptable enough to interface with business partners and scalable enough to accommodate the growth.

During the initial stages, the infrastructure required for dot.com business is minimal. However, it is important to remember to embrace open standards and scalable architecture from the very beginning, especially if the dot.com is starting from scratch. It is also important that the infrastructure (networks, data storage and servers) delivers high levels of reliability, performance and availability.

The dot.com software architecture in its simplest form consists of three tiers: presentation, business and data (see Fig. 1). The infrastructure required to support this architecture consists of servers to host the Web components and the data tier. Network devices (routers and switches) connect servers within the enterprise.

Figure 1
The dot.com software architecture consists of three tiers: the presentation tier, the business tier and the data tier. Servers that host the Web components make up the infrastructure to support this architecture.

As the e-business grows, both the business tier and the data tier may need to be scaled to accommodate the growth. The middle tier may be expanded to include several Web servers to serve a growing number of customers, as well as dedicated servers to support middleware business logic components. The data tier may be expanded by adding more database servers to accommodate the growing business and customer data. More infrastructure is required to support this architecture (see Fig. 2).

Figure 2
As the dot.com company grows, more infrastructure is required to support the growing business and data tiers. The business tier may be expanded with several Web servers.

Design strategy
One approach to designing a dot.com architecture is the Model-View-Controller (MVC) architecture. This approach has many synonyms such as MVC framework and MVC pattern. We call it the MVC architecture. The MVC architecture provides many advantages and was developed over several years of implementing large projects. The benefits of using the MVC architecture are:

  • Clear separation of modules: This provides the ability to extend and replace modules with ease. When the business logic changes, the corresponding business component can be modified without affecting other parts of the systems. In the MVC architecture, changes can be confined to particular components or areas.
  • Flexibility of the views: For the same function, different views can be generated. For example, the same data can be implemented as an HTML page, applet, spreadsheet, text document or PDF document.
  • Ease of change: When technology changes, the model can still be used with the updated view components. For example, conversion from HTTP request to WAP request does not affect the underlying business model.
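
The separation described in these bullets can be illustrated with a small Java sketch. All class names here (`BookModel`, `BookView`, `HtmlBookView`, `TextBookView`, `BookController`) are hypothetical, and the two output formats are chosen only for brevity. The model carries the data, each view renders it in one format, and the controller selects a view without touching the model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Model: holds business data and knows nothing about presentation.
class BookModel {
    final String title;
    final String author;

    BookModel(String title, String author) {
        this.title = title;
        this.author = author;
    }
}

// View: turns a model into one output format.
interface BookView {
    String render(BookModel book);
}

// Illustrative only: a real HTML view would also escape special characters.
class HtmlBookView implements BookView {
    @Override
    public String render(BookModel book) {
        return "<p><b>" + book.title + "</b> by " + book.author + "</p>";
    }
}

class TextBookView implements BookView {
    @Override
    public String render(BookModel book) {
        return book.title + " by " + book.author;
    }
}

// Controller: picks a view for the requested format, falling back to text.
class BookController {
    private final Map<String, BookView> views = new LinkedHashMap<>();

    BookController() {
        views.put("html", new HtmlBookView());
        views.put("text", new TextBookView());
    }

    String handle(String format, BookModel book) {
        BookView view = views.getOrDefault(format, views.get("text"));
        return view.render(book);
    }
}
```

Adding a new output format means registering one more `BookView`; `BookModel` stays unchanged.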

Practical design
In the simplest form, the dot.com architecture consists of a Web server serving static pages. Dynamic pages can be served through the use of Common Gateway Interface (CGI) programs. The CGI programs can be written in several languages including, but not limited to, C, C++ and Perl. The disadvantage of this architecture is that CGI programs spawn a process for each request. They are stateless and do not know what the user previously requested. To preserve the state, HTML pages must be created to dynamically embed the state information as hidden fields, but this makes the architecture non-scalable. Even though this architecture is easier to implement than a scalable one, and many dot.coms have used it and have evolved from it successfully in the past, we do not recommend it because it is becoming an obsolete technology.

In the servlet-based architecture, the servlet acts as a gateway between the outside world and the enterprise; it is an HTTP protocol-based request-response mechanism. It acts as the controller in the MVC architecture. The servlet-based architecture works as follows:

  1. The client makes an HTTP request to the servlet.
  2. The servlet processes the request.
  3. The servlet sends an HTTP response to the client's request.

    There are many different design approaches to implement the second and third steps. The alternatives depend on the growth stage of the dot.com, available resources, code reuse, business decisions regarding investments, enhancements and scalability features planned for later stages of the dot.com architecture.

    Processing the HTTP request involves reading the request for the necessary information (such as search criteria for a bookstore dot.com) and must be performed in the business layer of the architecture. The output of the processing is the content or the data for the response, such as the book's availability, author, publisher, price or discount. For better scalability, the servlet is kept as light as possible. It holds references to persistent objects such as resource pools or a pool of database connections. All the processing is implemented in the business layer. The database access, if any, is also done from that layer. HTTP requests can be processed in two different ways, non-distributed and distributed.
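
The idea of a light controller that only delegates can be sketched as follows. To keep the example self-contained, a plain `Map` stands in for the HTTP request parameters; a real servlet would read them from the servlet API instead. The names `BookSearchService` and `SearchController`, and the sample data, are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Business layer: all processing and any data access live here.
class BookSearchService {
    private final Map<String, String> availabilityByTitle = new HashMap<>();

    BookSearchService() {
        // Stand-in for a database lookup; the data is invented for the example.
        availabilityByTitle.put("java basics", "in stock");
    }

    String availability(String title) {
        String key = (title == null) ? "" : title.trim().toLowerCase(Locale.ROOT);
        return availabilityByTitle.getOrDefault(key, "not found");
    }
}

// Controller: stands in for the servlet. It holds a long-lived reference to
// the business object and contains no business logic of its own.
class SearchController {
    private final BookSearchService service;

    SearchController(BookSearchService service) {
        this.service = service;
    }

    String handle(Map<String, String> requestParams) {
        // Read the search criteria from the request and delegate.
        String content = service.availability(requestParams.get("title"));
        // Return the content; formatting is left to the view layer.
        return content;
    }
}
```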

    In the non-distributed architecture, the business layer classes are deployed on the same machine as the Web server that hosts the servlet engine. In this architecture, the business processes are executed in a single Java Virtual Machine (JVM). Each request runs in its own thread within the JVM (see Fig. 3). This non-distributed approach is simple to implement and is suitable for dot.coms in the initial stage. It assumes that there are no existing systems or that the architecture is designed from scratch. When designing systems using this architecture, the classes should be designed so that they can migrate to distributed and EJB architectures later.

    Figure 3
    In a servlet-based non-distributed architecture, business processes are executed in a single Java Virtual Machine and each request runs in its own thread within it.
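
A rough sketch of the non-distributed arrangement, under stated assumptions: a thread pool stands in for the servlet engine's request threads, and the names `OrderService`, `LocalOrderService` and `SingleJvmDemo` are hypothetical. Callers see only the `OrderService` interface, which is one way to keep the classes ready to move behind RMI or an EJB later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Callers depend on this interface, not on where the implementation runs.
interface OrderService {
    int placeOrder(String item);
}

// In-process implementation. It must be thread-safe because every request
// thread in the JVM shares the same instance.
class LocalOrderService implements OrderService {
    private final AtomicInteger nextOrderId = new AtomicInteger(1);

    @Override
    public int placeOrder(String item) {
        return nextOrderId.getAndIncrement();
    }
}

class SingleJvmDemo {
    // Simulates concurrent requests on pool threads and returns the number
    // of distinct order ids that were issued.
    static int run(int requests) {
        final OrderService service = new LocalOrderService();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final String item = "item-" + i;
                Callable<Integer> task = () -> service.placeOrder(item);
                futures.add(pool.submit(task));
            }
            Set<Integer> ids = new HashSet<>();
            for (Future<Integer> future : futures) {
                ids.add(future.get());
            }
            return ids.size();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```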

    The distributed approach involves distributing the business layer classes in different machines, each running a different JVM. Servlets access the business classes through Remote Method Invocation (RMI) (see Fig. 4). The distributed approach works equally well for a dot.com and an established enterprise, which has to use its existing processes and systems. The existing systems can be legacy systems, servers written in C++, enterprise resource packages and so on. Java Native Interface (JNI) can be used to wrap the native code. Table 2 compares the distributed and non-distributed approaches.

    Table 2: Approaches for designing a dot.com architecture
    Non-distributed approach | Distributed approach
    Architecture in initial phases | Architecture in growth stages
    Minimal infrastructure | More infrastructure has been added
    Minimal development resources available | More development resources available
    Existing production systems not integrated with dot.com architecture | Existing production systems integrated with dot.com architecture
    Few visitors to Web site | Many visitors to Web site
    Secured transaction processing not required | Secured transaction processing required
    Source: Kamesh Namuduri and Palani Ram

    Figure 4
    In a servlet-based distributed architecture, the business processes run on different machines that each run a different Java Virtual Machine.
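
For the distributed approach, a business class exposed through RMI might look roughly like this. The interface and class names (`InventoryService`, `InventoryServiceImpl`, `InventoryExporter`) and the stock data are invented for the example; only the `java.rmi` types are real. A servlet in another JVM would normally obtain the stub through a registry lookup, which is omitted here.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// Remote contract: every remotely callable method declares RemoteException.
interface InventoryService extends Remote {
    int unitsInStock(String sku) throws RemoteException;
}

// Business class that would run in the remote JVM.
class InventoryServiceImpl implements InventoryService {
    @Override
    public int unitsInStock(String sku) {
        // Invented data; a real implementation would query the warehouse system.
        return "BOOK-1".equals(sku) ? 3 : 0;
    }
}

class InventoryExporter {
    // Exports the object on an anonymous port and returns the stub through
    // which callers in other JVMs invoke it.
    static InventoryService export(InventoryServiceImpl impl) throws RemoteException {
        return (InventoryService) UnicastRemoteObject.exportObject(impl, 0);
    }
}
```

Because the servlet would be written against `InventoryService`, the same calling code works whether the implementation is local or remote, apart from handling `RemoteException`.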

    The response or output from the business layer consists of the data or content, which is passed on to the client. The content may need to be converted to different formats depending on the needs of the client and the business. The output may be formatted as HTML, PDF or XML, or as a serialized object if the client is an applet. Although this formatting can be done within the servlet, it is not recommended because it violates the guidelines of the MVC architecture.

    The view layer can be implemented using Java classes such as HTML, PDF or XML builder classes that format the output. This architecture is more suitable for applications than for Web interfaces.

    The view or presentation layer can also be implemented using Java Server Pages (JSPs). The business layer can send its response to a servlet in the form of an HTTP Response. The servlet can forward the HTTP Request and the HTTP Response to a JSP for further processing. The JSP can access the business layer output using a JavaBean. The JSP has all the content to be formatted. It can use some helper classes to format the response, which allows the JSP to change instantly without affecting the business logic. This architecture is more suitable for business-to-consumer type dot.coms such as bookstores, in which the promotional content is mixed with the customer-tailored content. A typical book search in an online retail bookshop will provide promotional offers and discounts using the results of the search that a user made. Table 3 describes the criteria that lead to the two implementations of the view layer.

    Table 3: Design options to implement view layer
    View layer using Java classes | View layer using Java Server Pages
    Application-centric (examples: B2B, business transaction between partners) | Web page-centric (examples: B2C, search page in an online bookstore)
    User is aware of content and layout (example: business transaction follows agreed-upon standards) | User not aware of content or layout (example: online bookstore search results include promotions of the day)
    Change in content and layout go through some change process | Change in content and layout is somewhat instantaneous
    More application development resources available (example: Java developers) | More Web page development resources available (examples: HTML, JavaScript, Web designers)
    Ownership of application by one team (example: application design team) | Ownership of page area by multiple teams (examples: marketing, corporate, finance and application design teams responsible for respective areas)
    Source: Kamesh Namuduri and Palani Ram
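
As an illustration of the JSP option, the business-layer output could be handed to the page in a JavaBean along these lines. The bean and helper names (`SearchResultBean`, `PriceFormatter`) are hypothetical, and the classes are package-private only to keep the sketch in one file; a bean used from a real JSP would normally be a public class.

```java
import java.util.Locale;

// JavaBean carrying business-layer output to the page: private fields,
// a public no-argument constructor and get/set accessors.
class SearchResultBean {
    private String title;
    private long priceInCents;

    public SearchResultBean() {
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public long getPriceInCents() {
        return priceInCents;
    }

    public void setPriceInCents(long priceInCents) {
        this.priceInCents = priceInCents;
    }
}

// Hypothetical helper class the page could call to format values.
// Assumes a non-negative amount.
class PriceFormatter {
    static String format(long cents) {
        return String.format(Locale.ROOT, "$%d.%02d", cents / 100, cents % 100);
    }
}
```

Keeping formatting in a helper lets page designers change the layout without touching the business classes that fill the bean.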

    Enterprise JavaBeans (EJB) architecture is component-based. It is a popular architecture today, provided the business can afford it. Using this type of architecture, a dot.com organization can buy components from various vendors and set up the dot.com architecture very quickly. EJB containers and servers handle transactions. EJB architecture encourages developers to focus on pure business logic by relieving them of technical responsibilities that are irrelevant to the business problem. For example, the EJB container handles user authentication. Authorization to invoke EJB methods is controlled by the deployer afterwards. Other advantages include portability, scalability, automatic state persistence and declarative transaction demarcation (see Fig. 5). J2EE is an EJB-based architecture that includes several other components to provide a component-based, scalable, distributed computing environment.

    Figure 5
    Servlet and EJB component-based architecture allows developers to focus on pure business logic by automating some technical tasks that are irrelevant to the business problem.

    Clusters and geographically dispersed data centers
    To accommodate growth, the architecture needs to be scaled appropriately. Scaling may require expanding the business and data tiers to include more Web and database servers to accommodate growing business needs. Clustering may be used as a solution for this purpose.

    Several smaller, slower and less expensive machines can be clustered together to gain enough processing power to handle the workload in the middle tier. Clusters also improve the availability of servers because other machines can do the processing if a member of the cluster goes offline. Another benefit of clustered servers is that computing capacity can be added incrementally without disrupting clients.
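
The availability and incremental-capacity benefits can be pictured with a toy dispatcher. This is a simplified sketch, not how any particular clustering product works: `ClusterNode` and `RoundRobinCluster` are invented names, and node health is a flag set by hand rather than detected.

```java
import java.util.ArrayList;
import java.util.List;

class ClusterNode {
    final String name;
    volatile boolean online = true;

    ClusterNode(String name) {
        this.name = name;
    }
}

class RoundRobinCluster {
    private final List<ClusterNode> nodes = new ArrayList<>();
    private int next = 0;

    // Capacity can be added incrementally while the cluster is serving.
    synchronized void addNode(ClusterNode node) {
        nodes.add(node);
    }

    // Returns the next online member in rotation, skipping offline ones.
    synchronized ClusterNode pick() {
        for (int tried = 0; tried < nodes.size(); tried++) {
            ClusterNode candidate = nodes.get(next);
            next = (next + 1) % nodes.size();
            if (candidate.online) {
                return candidate;
            }
        }
        throw new IllegalStateException("no cluster member is online");
    }
}
```

When one member goes offline, requests simply rotate among the remaining members, which is the availability property described above.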

    More database servers may be added in the data tier to meet the demands of customer growth. The database servers may also be geographically distributed for convenience and improved performance. For example, the databases in the southern and northern regions of an organization might represent customer data in each of those regions. A business also may have multiple databases to store and retrieve different types of data such as human resources and inventory. Data replication is another reason for maintaining multiple copies of databases. Data replication increases the availability of the enterprise systems.

    Enterprise integration
    Few companies have the opportunity to start building their dot.com architectures from scratch. If a large portion of an organization's business intelligence is encapsulated in existing legacy applications, utilizing the capability of these systems and integrating this capability with the new architecture is efficient and cost-effective. During the design, it is important to identify the role of each legacy system in the integrated architecture in terms of business logic and data storage. An appropriate interface between the legacy system and the new architecture then needs to be built based on the specific requirements. Available technologies for legacy integration include TCP sockets, CORBA, JDBC, message brokers and so on. In a servlet-centric approach, servlets can be used to communicate with the enterprise applications and database servers. In an EJB-centric architecture, EJBs provide a component-based approach for integrating legacy enterprise applications.
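
One common shape for such an interface is an adapter that hides the legacy protocol behind a small Java interface. Everything in the sketch below is invented for illustration, including the `LegacyStockSystem` stand-in, its command string and its `SKU|quantity` record layout; a real adapter would talk to the existing system over sockets, CORBA, JDBC or a message broker.

```java
// Stand-in for an existing system that answers with fixed-format records.
class LegacyStockSystem {
    String query(String command) {
        // Invented record layout: "<SKU>|<zero-padded quantity>"
        if ("STOCK BOOK-1".equals(command)) {
            return "BOOK-1|0042";
        }
        return "ERR";
    }
}

// What the new architecture wants to see.
interface StockLookup {
    int quantityFor(String sku);
}

// Adapter: translates between the new interface and the legacy protocol.
class LegacyStockAdapter implements StockLookup {
    private final LegacyStockSystem legacy;

    LegacyStockAdapter(LegacyStockSystem legacy) {
        this.legacy = legacy;
    }

    @Override
    public int quantityFor(String sku) {
        String record = legacy.query("STOCK " + sku);
        int separator = record.indexOf('|');
        if (separator < 0) {
            return 0; // unknown SKU or legacy error, treated as no stock
        }
        return Integer.parseInt(record.substring(separator + 1));
    }
}
```

Servlets or EJBs call `StockLookup` and never see the legacy record format, so the legacy system can be replaced later without changing its callers.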

    As the e-business gains market share, the company has to integrate not only its own applications but also those of its business partners. Hence, the dot.com architecture has to include B2B integration tools. Application integration technologies such as CORBA and RMI are useful for B2B integration as well. A well-designed e-business architecture takes advantage of industry-accepted standards and tools to protect an organization's investments in existing information infrastructure. The XML standard makes the exchange of business data among businesses simpler. All major software vendors support this standard. Recently, Sun Microsystems added three programming interfaces supporting XML to Java technology: the Java API for XML Messaging, the Java API for XML Processing and the Java API for XML Data Binding.
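
As a small illustration of XML-based data exchange, the following sketch uses the Java API for XML Processing to read a value from an order document. The `<order>` element and its `quantity` attribute are an invented format, not a standard, and `OrderXmlReader` is a hypothetical class name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class OrderXmlReader {
    // Returns the quantity attribute of the root element of an order
    // document such as <order sku="BOOK-1" quantity="2"/>.
    static int quantity(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            Element order = document.getDocumentElement();
            return Integer.parseInt(order.getAttribute("quantity"));
        } catch (Exception e) {
            // Covers malformed XML as well as a missing or non-numeric quantity.
            throw new IllegalArgumentException("not a valid order document", e);
        }
    }
}
```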

    Dot.coms exist in a state of constant change as their business grows. By designing the dot.com architecture using industry-accepted, open standards, the architecture can be scaled in proportion to its demand. Component-based development facilitates enhancements to one or more modules without affecting the rest of the architecture. The adoption of J2EE technology by several vendors indicates that J2EE has the features desirable for a scalable dot.com architecture.

    J2EE components
    Java servlets and Java Server Pages (JSPs)
    Servlets and JSPs are request/response-based mechanisms on the server side acting as the gateway to the enterprise system.

    Java Database Connectivity (JDBC)
    JDBC provides access to tabular data sources such as relational databases, spreadsheets and flat files.

    Java Remote Method Invocation (RMI) and RMI-Internet Inter-ORB Protocol (IIOP)
    RMI provides communication to other JVM processes in a distributed computing framework. RMI-IIOP lets Java processes communicate with Common Object Request Broker Architecture (CORBA)-based distributed computing resources.

    Enterprise JavaBeans (EJBs)
    EJB architecture provides a component-based, server-side middleware development environment. The EJB specification provides the contract between the components and the containers (EJB servers). The servers provide middleware services such as transactions, security and database connectivity, so EJB component developers can concentrate on the business logic.

    Java Naming and Directory Interface (JNDI)
    JNDI identifies the location of components or other resources across the network.

    Java Transaction API (JTA) and Java Transaction Service (JTS)
    JTA and JTS provide interfaces between a transaction manager and the parties involved in a distributed transaction system: the resource manager, the application server and the transactional applications.

    Java Message Service (JMS)
    JMS provides an asynchronous communication framework for distributed resources.

    Java IDL
    Java IDL provides CORBA capability for standards-based interoperability and connectivity.

    JavaMail
    JavaMail provides e-mail capabilities.

    Connectors
    Connectors provide the integration capability for production enterprise systems.

    eXtensible Markup Language (XML)
    XML provides a standard framework for marking information so that it can easily be exchanged and reinterpreted across networks.

    Source: Kamesh Namuduri and Palani Ram