Showing posts with label Clustering. Show all posts

Friday, March 12, 2010

WebSphere Cluster Member crashed

Have you ever been asked this question in an interview?
How do you find out which cluster member has crashed or is down?
The usual answer is to open the administrative console and check the status of the individual server or of the cluster member.
The other option is to use a monitoring tool such as ITCAM, Wily Introscope, Unicenter, or Nagios.
Have you ever checked the SystemOut.log file of one of the other servers when a cluster member was stopped?
WebSphere has Distribution and Consistency Services (DCS), which is part of the HA architecture. Using the DCS messages, we can find out which member of the cluster is down.
Here is an example:


I have a cell named Test-Cell, which has a cluster spanning 6 nodes, each with 2 servers.
I stopped one of the cluster members. If you then look at the SystemOut.log file of another member, you see messages similar to the ones below:
[3/3/10 18:00:37:758 CET] 00000026 RoleMember    W   DCSV8104W: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: Removing member [Test-Cell\node02\server02] because the member was requested to be removed  by member Test-Cell\node02\server01. Internal details VL suspects others: CC-Situation Normal
[3/3/10 18:00:38:176 CET] 00000023 VSyncAlgo1    I   DCSV2004I: DCS Stack DefaultCoreGroup at Member Test-Cell\node01\server01: View synchronization completed successfully. The View Identifier is (22898:0.Test-Cell\node02\server01). The internal details are None.
[3/3/10 18:00:38:207 CET] 00000023 VSyncAlgo1    I   DCSV2004I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: View synchronization completed successfully. The View Identifier is (331:0.Test-Cell\node02\server01). The internal details are None.
[3/3/10 18:00:38:537 CET] 00000024 CoordinatorIm I   HMGR0218I: A new core group view has been installed. The core group is DefaultCoreGroup.
[3/3/10 18:00:39:228 CET] 00000026 DataStackMemb I   DCSV8050I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: New view installed, identifier (332:0.Test-Cell\node02\server01), view size is 11 (AV=11, CD=12, CN=12, DF=12)
[3/3/10 18:00:39:343 CET] 00000021 DRSBuddyManag A   CWWDR0006I:  Replication instance terminated : Test-Cell\node02\server02

So, from the above messages, it is clear that server02 on node02 went down and was removed from the core group view.
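If you do not want to read the log by eye, the same check can be scripted. Below is a minimal Python sketch that scans log lines for the DCSV8104W message and pulls out the removed member. The regular expression is my own and is inferred only from the sample lines above, so treat it as an assumption and adjust it if your messages look different. The function name find_removed_members is also just illustrative.

```python
import re

# Pattern inferred from the DCSV8104W sample line in this post (an assumption,
# not an official message format specification).
REMOVED = re.compile(
    r"DCSV8104W: DCS Stack (?P<stack>\S+) at Member (?P<reporter>\S+): "
    r"Removing member \[(?P<removed>[^\]]+)\]"
)

def find_removed_members(lines):
    """Return (timestamp, removed_member, reporting_member) per DCSV8104W line."""
    results = []
    for line in lines:
        match = REMOVED.search(line)
        if match:
            # The timestamp is the bracketed text at the start of the line.
            timestamp = line[1:line.index("]")] if line.startswith("[") else ""
            results.append((timestamp, match.group("removed"), match.group("reporter")))
    return results

# Two lines copied from the log excerpt above.
sample = [
    r"[3/3/10 18:00:37:758 CET] 00000026 RoleMember    W   DCSV8104W: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: Removing member [Test-Cell\node02\server02] because the member was requested to be removed  by member Test-Cell\node02\server01. Internal details VL suspects others: CC-Situation Normal",
    r"[3/3/10 18:00:38:537 CET] 00000024 CoordinatorIm I   HMGR0218I: A new core group view has been installed. The core group is DefaultCoreGroup.",
]

for timestamp, removed, reporter in find_removed_members(sample):
    print("%s: %s removed (reported by %s)" % (timestamp, removed, reporter))
```

To run it against a real log, pass the lines of the file instead of the sample list, for example find_removed_members(open(path)), where path points to the SystemOut.log of any surviving member.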
After some troubleshooting and changes, I started the server that was down. Now, if you look at SystemOut.log, you can see the following:
[3/3/10 18:17:13:245 CET] 00000026 RoleMember    I   DCSV8051I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: Core group membership set changed. Added: [Test-Cell\node02\server02].
[3/3/10 18:17:13:315 CET] 00000023 MbuRmmAdapter I   DCSV1032I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: Connected a defined member Test-Cell\node02\server02.
[3/3/10 18:17:30:337 CET] 00000023 VSyncAlgo1    I   DCSV2004I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: View synchronization completed successfully. The View Identifier is (333:0.Test-Cell\node02\server01). The internal details are None.
[3/3/10 18:17:30:353 CET] 00000026 DataStackMemb I   DCSV8050I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: New view installed, identifier (334:0.Test-Cell\node02\server01), view size is 12 (AV=12, CD=12, CN=12, DF=12)
[3/3/10 18:17:30:354 CET] 00000027 DRSBuddyManag A   CWWDR0007I:  Replication instance group membership changed: Test-Cell\node02\server02
[3/3/10 18:17:30:356 CET] 00000027 DRSBuddyManag A   CWWDR0002I: Replication instance is active : Test-Cell\node02\server02
[3/3/10 18:17:30:358 CET] 00000010 ViewReceiver  I   DCSV1033I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: Confirmed all new view members in view identifier (334:0.Test-Cell\node02\server01). View channel type is View|Ptp.
You can see a message (DCSV8051I) showing that the member was added back to the core group.
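The DCSV8050I lines are also useful, because the view size drops to 11 when the member is removed and returns to 12 when it rejoins. Here is a small Python sketch that extracts those numbers. I am assuming that DF is the number of defined members, which fits the two samples in this post but is not something the log itself states. The pattern and the function name view_changes are my own.

```python
import re

# Pattern inferred from the DCSV8050I sample lines in this post (an assumption).
VIEW = re.compile(
    r"DCSV8050I: DCS Stack (?P<stack>\S+) at Member \S+: New view installed, "
    r"identifier \((?P<view_id>[^)]+)\), view size is (?P<size>\d+) "
    r"\(AV=(?P<av>\d+), CD=(?P<cd>\d+), CN=(?P<cn>\d+), DF=(?P<df>\d+)\)"
)

def view_changes(lines):
    """Return (stack, view_size, defined, missing) for each DCSV8050I line.

    'missing' is DF minus the view size, assuming DF counts defined members.
    """
    changes = []
    for line in lines:
        match = VIEW.search(line)
        if match:
            size = int(match.group("size"))
            defined = int(match.group("df"))
            changes.append((match.group("stack"), size, defined, defined - size))
    return changes

# The two DCSV8050I lines from the excerpts above (member stopped, then restarted).
sample = [
    r"[3/3/10 18:00:39:228 CET] 00000026 DataStackMemb I   DCSV8050I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: New view installed, identifier (332:0.Test-Cell\node02\server01), view size is 11 (AV=11, CD=12, CN=12, DF=12)",
    r"[3/3/10 18:17:30:353 CET] 00000026 DataStackMemb I   DCSV8050I: DCS Stack DefaultCoreGroup.TestRepln at Member Test-Cell\node01\server01: New view installed, identifier (334:0.Test-Cell\node02\server01), view size is 12 (AV=12, CD=12, CN=12, DF=12)",
]

for stack, size, defined, missing in view_changes(sample):
    print("%s: view size %d of %d, %d missing" % (stack, size, defined, missing))
```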

About DCS:
There are two main versions of DCS: Core DCS and Data DCS. There is one Core DCS per process and it provides membership services among peer processes. These processes together form a Core Group. A process may be a member in one or more named Core Groups. Applications running on these processes can be members of application groups. Application groups are subsets of a particular named core group. A Data DCS component can be associated with each member of an application group.
DCS provides a mechanism for communicating information (distribution) among members with a given quality of service. Failure detection mechanisms that support and allow a guaranteed quality of service are an inherent part of DCS and its services. DCS supports the state replication requirements of WebSphere components (such as HTTP sessions and stateful session beans) as well as the distribution and synchronization of WebSphere artifacts for performance, scalability, and availability.
I’ll soon write about WebSphere ‘Core Groups’ to help explain DCS and the high availability architecture of WebSphere.

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

Friday, February 26, 2010

wsadmin script for creating cluster and adding cluster members

You can use this Jython script to create a cluster and add members to it:

AdminTask.createCluster('[-clusterConfig [-clusterName samplecluster] -replicationDomain [-createDomain true]]')

AdminTask.createClusterMember('[-clusterName samplecluster -memberConfig [-memberNode dmgrNode01 -memberName SampleMember1 -memberWeight 2 -replicatorEntry true] -firstMember [-templateName "Sample Template" -nodeGroup DefaultNodeGroup -coreGroup DefaultCoreGroup]]')

AdminTask.createClusterMember('[-clusterName samplecluster -memberConfig [-memberNode sunpatilNode02 -memberName SampleMember2 -memberWeight 2 -replicatorEntry true]]')


This script creates samplecluster by calling the AdminTask.createCluster command and then creates SampleMember1 as the first cluster member with an AdminTask.createClusterMember call. The second AdminTask.createClusterMember call adds SampleMember2 as an additional cluster member. Call AdminConfig.save() afterwards to persist the changes.
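If you have more than a couple of members, typing these option strings by hand gets error prone. Here is a rough sketch of a helper that builds the same strings as the script above. The function member_options is hypothetical (it is not part of wsadmin), and I wrote it as plain Python so it can be tried outside wsadmin. Inside wsadmin, each resulting string would be passed to AdminTask.createClusterMember.

```python
def member_options(cluster, node, member, weight=2, first_member_template=None):
    """Build the option string for AdminTask.createClusterMember.

    Hypothetical helper: it reproduces the option layout used in the script
    above. Pass first_member_template only for the first member of a cluster.
    """
    options = (
        "[-clusterName %s -memberConfig [-memberNode %s -memberName %s "
        "-memberWeight %d -replicatorEntry true]" % (cluster, node, member, weight)
    )
    if first_member_template is not None:
        # Node group and core group names are the ones used in the script above.
        options += (
            ' -firstMember [-templateName "%s" -nodeGroup DefaultNodeGroup '
            "-coreGroup DefaultCoreGroup]" % first_member_template
        )
    return options + "]"

# Same nodes and member names as in the script above.
first = member_options("samplecluster", "dmgrNode01", "SampleMember1",
                       first_member_template="Sample Template")
second = member_options("samplecluster", "sunpatilNode02", "SampleMember2")

print(first)
print(second)
# In wsadmin you would then run AdminTask.createClusterMember(first), and so on,
# followed by AdminConfig.save().
```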

Managing cluster member


After creating a cluster, you can add members to it by following these steps:

  • In the WAS Admin Console, go to Servers - Clusters - cluster_name

  • On the cluster configuration page, expand the Cluster Members section
  • Click on the Details button to get the cluster member listing. You can either add or remove cluster members from this page

  • I wanted to create an additional server on sunpatilNode02, so I entered the necessary information on the next page
  • On the summary page, click Finish and it starts adding the cluster member

  • Once the cluster member is added, you can verify the updated cluster topology


Creating cluster


A cluster consists of one or more application servers grouped together for workload management. All the members of a cluster run the same set of applications and have a similar configuration.

Cluster creation works like this: you either pick an existing server to act as the template server, or you create the first cluster member from an existing template. The cluster creation process then creates the other servers based on that first member. The other servers can be on the same machine (vertical scaling) or on different machines (horizontal scaling).
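To make the vertical versus horizontal distinction concrete, here is a tiny Python sketch that classifies a cluster layout from a mapping of member name to machine. The function scaling_type, the "mixed" label, and the machine names are all my own illustrations and not WebSphere terminology or API.

```python
def scaling_type(member_hosts):
    """Classify a cluster layout from a {member_name: machine_name} mapping."""
    machines = set(member_hosts.values())
    if len(member_hosts) < 2:
        return "single member"
    if len(machines) == 1:
        # Every member runs on the same machine.
        return "vertical"
    if len(machines) == len(member_hosts):
        # Every member runs on its own machine.
        return "horizontal"
    # Some machines host several members, others host one.
    return "mixed"

# Hypothetical machine names for illustration.
layout = {"SampleMember1": "machine1", "SampleMember2": "machine2"}
print(scaling_type(layout))
```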

I wanted to create a cluster in which one server instance is on the same machine as the DMGR and the other server is on a separate machine. I had already created a custom profile on the separate machine and federated it to the DMGR, so the node agent on machine 2 was ready.

  • Now go to the WAS Admin Console, open Servers - Clusters, and click New

  • The first page asks for the cluster name, so I entered Sample Cluster. I also checked Configure HTTP session memory-to-memory replication because I want distributed session support

  • The next page asks for information about the first cluster member. This member acts as the template for all other cluster members. In my case, instead of choosing an existing server, I chose to create a new server based on Sample Template

  • The following page asks for additional cluster members. I wanted to create one server on sunpatilNode02, so I selected that node and set SampleMember2 as the server name

  • On the summary page, just click Finish so that the cluster creation process can start

 

It takes a few minutes, but once the cluster is created you can look at its details by going to Servers - Clusters - cluster_name.


 
You can take a look at the topology of the cluster by going to the Local Topology tab.