"Server, Heal Thyself"
Autonomic features in WebSphere v6: Peer Recovery of Transactions
By: Dennis Ashby
Mar. 25, 2005 12:00 AM
At the forefront of IBM's autonomic computing strategy is WebSphere Application Server v6. This version of the application server is designed to automatically detect problems ranging from small network glitches to large-scale power failures and, in a matter of seconds, save and process Web-based business transactions that could take hours or days to recover under older systems. This is referred to as "self-healing" software. As soon as the application server detects a problem, the transaction and its related data are automatically moved to another server, either within the same data center or, in the case of a power outage or disaster, in another geographic location. This new functionality is known as "peer recovery of transactions." This article discusses how it works and how to configure it in WebSphere Application Server v6.

High Availability Manager

WebSphere's high-availability architecture uses a high-availability manager (HAM) to monitor key services provided by the application server. These services encompass messaging, transaction managers, workload management controllers and other application servers in the cluster.

Today, many enterprise application architectures include network-attached storage (NAS) devices. Peer recovery requires that the HAM use such a device to store the transaction logs from each application server in the cluster. The HAM is responsible for automatic recovery of all in-flight as well as in-doubt work for any application server that fails in the high-availability cluster. This automatic functionality allows the WebSphere cluster to re-stabilize itself if one or more of the cluster members should fail.

In WebSphere v6, all of the application servers in a cell are defined as members of a core group. A core group is a statically defined set of application servers that can be divided into various high-availability groups.
Each core group has only one logical high-availability manager, which continually polls all of the group members to verify that they are active. It is also responsible for making the services within the core group available and scalable. WebSphere uses policy matching to localize and partition policy-driven work into high-availability groups. In other words, when a core group member (in this case, an application server) fails, the high-availability manager can dynamically reassign the failing member's work to another component from the same high-availability group (in this case, a healthy application server). Using NAS devices as common logging facilities, the component that has been assigned the work can recover and process the in-doubt and in-flight work of the failed component. This is essentially the heart of peer recovery of transactions.

Transaction Service Logs

WebSphere v6 allows you to configure the location of the transaction log directory using either the WebSphere administrative console or commands. Special logic has been added to the administrative console to facilitate the migration of the transaction log configuration from earlier versions of WebSphere. In older versions, the transaction log directory configuration was stored in the server.xml server-level configuration file. In the current version, it is stored in the serverindex.xml node-level configuration file.

Prior to WebSphere v6, when an application server restarted after a crash, the server's recovery processes included retrieving the transaction service logs, processing the recorded information, recovering the transactional work and completing the in-doubt transactions (including the release of database locks). The server first had to restart completely and process the logs before the transactional work could be completed.
Thus, if the server was slow to recover or required manual intervention, the transactional work could not complete, and access to the database was disrupted. Depending on the industry, IBM points to estimates that the failure of an e-business application can cost a company as much as $110,000 per minute in lost revenue and productivity. To minimize the cost and disruption of a failure, WebSphere v6 employs the strategy known as peer recovery of transactions.

Peer Recovery of Transactions

The peer-recovery process is the logical equivalent of restarting the failed server, without actually doing so. It is important to note that the recovery process completes outstanding work, but does not start new work or provide "forward processing" functionality. The Workload Manager (WLM) of the cluster can then dispatch new work onto the remaining servers. Both transactions and the compensation service fail over together to the same peer server. The only difference from the user's perspective is a potential drop in overall system throughput.

Self-Healing Example

Suppose that, during normal processing, the S1 application server unexpectedly crashes. This will result in locks being held in the database. Application servers S2 and S3 are able to continue processing their existing transactions, but future transactional processing may be impeded by the locks still held on behalf of the crashed S1 application server. In WebSphere v6, once S1 had failed, a peer recovery process for S1 would begin running inside S3 (or S2, depending on the configuration). The transaction service portion of the recovery process would retrieve the information persisted by S1 and use that information to complete any in-doubt transactions. In addition, to recover any in-flight transactions, the endpoint references for S1 would be redirected to S3. Thus, the system would remain in a stable state with just two servers, between which the WLM engine can balance workload.
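The recovery flow in this example can be pictured with a small simulation. This is only a sketch under stated assumptions, not WebSphere code: SharedLogStore, peer_recover, the state names "in-doubt" and "resolved", and the rule of picking the first healthy peer are all hypothetical, and a plain dictionary stands in for the transaction logs kept on the NAS device.

```python
from dataclasses import dataclass, field


@dataclass
class SharedLogStore:
    """Hypothetical stand-in for the NAS device: one transaction log per server."""
    logs: dict = field(default_factory=dict)

    def record(self, server, txn_id, state):
        # Persist the latest known state of a transaction in that server's log.
        self.logs.setdefault(server, {})[txn_id] = state

    def in_doubt(self, server):
        # Transactions the server prepared but never finished.
        log = self.logs.get(server, {})
        return sorted(t for t, s in log.items() if s == "in-doubt")


def peer_recover(store, failed, healthy_peers):
    """Complete the failed server's in-doubt work from inside a healthy peer."""
    if not healthy_peers:
        raise RuntimeError("no healthy peer available for recovery")
    peer = healthy_peers[0]  # assumption: the real choice is policy-driven
    recovered = store.in_doubt(failed)
    for txn_id in recovered:
        # Resolving the transaction is what allows database locks to be released.
        store.record(failed, txn_id, "resolved")
    return peer, recovered


store = SharedLogStore()
store.record("S1", "tx-101", "in-doubt")
store.record("S1", "tx-102", "resolved")
store.record("S1", "tx-103", "in-doubt")

# S1 crashes; S3 is listed first, so it hosts the recovery process for S1.
peer, recovered = peer_recover(store, "S1", ["S3", "S2"])
print(peer, recovered)       # S3 ['tx-101', 'tx-103']
print(store.in_doubt("S1"))  # []
```

Because the log is read from shared storage rather than from S1 itself, nothing in the sketch requires S1 to be running, and S1's log holds no in-doubt entries once the peer has finished.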
At some future point when S1 is brought back online, it will have no recovery processing of its own to perform (see Figure 3).

Configuring the transaction properties required for peer recovery is part of the overall task of configuring a cluster to use high-availability support. Note that a cluster can house both v5 and v6 application servers; however, peer recovery support is available only in clusters where all the members are v6 servers. To configure the transaction properties required for peer recovery, complete the following steps: