Tuesday, June 17, 2008

Clustering patterns

As is usually the case with software problems, we have identified patterns that solve the most commonly known ones. Likewise, there are patterns for known problems of availability and performance.

There are 2 well-known patterns for clustering, which I will discuss in this post.

1) Load Balanced Clustering

Any application server is a piece of software running on a piece of hardware. This means the server has a performance threshold beyond which performance falls below the client's expectations. Even the world's most advanced hardware has a limit on the number of requests it can serve for a given piece of software. Once the number of requests grows beyond that limit, the server has to be scaled up (not possible in this case, as we are already on the most advanced hardware) or scaled out (the feasible option). To scale out an application running on such hardware, we can add more servers and cluster them behind a load balancer.

The load balancer acts as a virtual resource for the clients and routes their requests to the appropriate physical servers behind it. So the network would look like the following:

[Diagram: clients send requests to the load balancer, which routes them to the physical servers behind it]

The load balancer can distribute requests across the servers using various algorithms, such as:
a) Round Robin :- Requests are handed to the servers in turn, so every server gets an equal share
b) Weighted Round Robin :- The administrator assigns each server a weight based on its hardware configuration, and the load balancer uses these weights to decide how many of the incoming requests each server receives
c) Least Connection :- Whichever server is serving the fewest connections gets the next request
d) Load Based :- Whichever server has the least load gets the next request
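
To make the first three algorithms concrete, here is a minimal Python sketch. The Server class, its weight and active fields, and the node names are all made up for illustration; a real load balancer would track these values from live traffic rather than from hand-set numbers.

```python
import itertools


class Server:
    """A hypothetical back-end node; 'weight' and 'active' are illustrative fields."""

    def __init__(self, name, weight=1):
        self.name = name
        self.weight = weight   # administrator-assigned capacity hint
        self.active = 0        # connections currently being served


def round_robin(servers):
    """Yield servers in a fixed rotation so each gets an equal share."""
    return itertools.cycle(servers)


def weighted_round_robin(servers):
    """Yield each server 'weight' times per rotation."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return itertools.cycle(expanded)


def least_connections(servers):
    """Pick the server with the fewest active connections right now."""
    return min(servers, key=lambda s: s.active)


servers = [Server("web1", weight=3), Server("web2", weight=1)]

rr = round_robin(servers)
print([next(rr).name for _ in range(4)])    # ['web1', 'web2', 'web1', 'web2']

wrr = weighted_round_robin(servers)
print([next(wrr).name for _ in range(4)])   # ['web1', 'web1', 'web1', 'web2']

servers[0].active = 5
servers[1].active = 2
print(least_connections(servers).name)      # web2
```

The weighted version here simply repeats a server in the rotation, which sends bursts to the heavier node; production balancers usually interleave the picks more smoothly, but the share of requests per server comes out the same.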

Normally a load balancer is intelligent enough to detect the failure of a particular server in the cluster and stop sending requests to the failed node.
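
One simple way such failure detection could work is to count consecutive failed health probes per node. The sketch below assumes a threshold of 3 missed probes; the HealthTracker class and that threshold are my own illustration, not how any particular load balancer does it.

```python
class HealthTracker:
    """Tracks consecutive probe failures per node (the threshold is an assumed setting)."""

    def __init__(self, nodes, max_failures=3):
        self.max_failures = max_failures
        self.failures = {node: 0 for node in nodes}

    def record(self, node, ok):
        """Record one probe result; a success resets the failure count."""
        self.failures[node] = 0 if ok else self.failures[node] + 1

    def healthy_nodes(self):
        """Nodes that are still eligible to receive requests."""
        return [n for n, f in self.failures.items() if f < self.max_failures]


tracker = HealthTracker(["web1", "web2"])
for _ in range(3):
    tracker.record("web2", ok=False)   # web2 misses three probes in a row
tracker.record("web1", ok=True)
print(tracker.healthy_nodes())         # ['web1']
```

A later successful probe resets the count, so a recovered node rejoins the pool automatically.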
One interesting consequence, which reaches down to the level of application design when we plan to deploy on a load balanced cluster, is state management. Applications normally store session information in container-provided objects (the Session object in ASP.NET). By default these session objects reside in the memory of the application server that processed the request. If the next request goes to a different node in the cluster, that node knows nothing about the session state held in the other server's memory, and important data might be lost.
There are 3 possible solutions to this problem:
a) Use an external state server or state service to store the session object
b) Use an algorithm on the load balancer such that requests belonging to the same session are always routed to the same node in the cluster (Server Affinity)
c) Have each server broadcast its session state to the whole cluster (Asynchronous Session State Management). This would be a cheap solution, but it creates a lot of network chatter inside the cluster.
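
As an illustration of option (b), server affinity can be sketched by hashing the session ID to a node. The pick_node function and the node names are hypothetical, and real load balancers often use cookies or source IP instead of hashing a session ID.

```python
import hashlib


def pick_node(session_id, nodes):
    """Map a session ID to one node deterministically (server affinity)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["web1", "web2", "web3"]
first = pick_node("session-abc", nodes)
second = pick_node("session-abc", nodes)
print(first == second)   # True: the same session always lands on the same node
```

Note that with this simple modulo scheme most sessions move to a different node whenever the node list changes, so their in-memory state is still lost when a server is added or fails. That is one reason the external state server in option (a) is attractive.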

An important remaining question is what exactly this load balancer is. It can be a piece of software (installed on one of the servers in the cluster) or hardware (special routers with load balancing intelligence) that acts as the gateway to the external world.
This kind of clustering is a good choice for meeting the non-functional requirements of high availability and scalability.

2) Failover Clustering

Some applications have strict availability requirements, e.g. the application can't go down even for an upgrade or patch. In such a case we can configure a failover cluster. The idea in this pattern is that a standby server waits to take over from the primary server in case it goes down for any reason.
There are 2 ways to detect that the primary server has gone down:
a) Pull Heartbeat :- The standby server checks the availability of the primary server at a specified interval. If it finds that the server is not responding, it assumes the primary is down, takes over the job and starts working on requests.
b) Push Heartbeat :- In this case the primary server keeps telling the standby server that it's up. If the standby server doesn't receive a heartbeat for a specified period of time, it takes over the job and starts processing requests.
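
The push heartbeat can be sketched as a small timeout check running on the standby. The HeartbeatMonitor class, the 5 second timeout and the fake clock are assumptions made for the example; in a real cluster beat() would be triggered by a network message from the primary.

```python
import time


class HeartbeatMonitor:
    """Runs on the standby; decides when the primary should be considered down."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.last_beat = clock()

    def beat(self):
        """Called whenever a heartbeat arrives from the primary."""
        self.last_beat = self.clock()

    def primary_is_down(self):
        """True once no heartbeat has arrived within the timeout."""
        return self.clock() - self.last_beat > self.timeout


# A fake clock keeps the example deterministic.
now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=5, clock=lambda: now[0])

now[0] = 3.0
monitor.beat()                      # heartbeat received at t=3
now[0] = 7.0
print(monitor.primary_is_down())    # False: only 4 seconds of silence
now[0] = 9.0
print(monitor.primary_is_down())    # True: 6 seconds of silence
```

The pull variant would look much the same, except that the standby would call the primary at each interval and treat a failed call as a missed beat.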

In either case, before the standby server starts processing requests it needs to synchronize with the primary server, so that it is in exactly the same state and can honour any open sessions/transactions. Here are the strategies which can be used for this:
a) Transaction Log :- Every time the state of the primary server changes, it logs the change in a transaction log. The log is synchronized with the standby server periodically, and the standby brings itself into the same state as the primary. As soon as the standby finds out that it has to take over, it applies the latest transaction log so as to reach the state the primary was in before it went down. Now it is ready to take over and serve requests.
b) Hot Standby :- In this strategy any change in the state of the primary server is immediately sent to the standby server to copy. The advantage is that as soon as the primary goes down the standby can take over without any delay.
c) Shared Storage :- The state of both servers is maintained on an external storage device, so it is as good as Hot Standby. It can be more or less performant depending on how the state is synchronized in the Hot Standby case and how responsive the external device is when storing/retrieving the state.
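
Here is a rough sketch of the transaction log strategy, using an in-memory dictionary as the "state" and a Python list as the "log". The Node class and its method names are invented for the example; a real server would persist the log and ship it over the network.

```python
class Node:
    """Holds key/value state plus a log of every change applied to it."""

    def __init__(self):
        self.state = {}
        self.log = []       # (key, value) entries, oldest first
        self.applied = 0    # how many primary log entries this node has replayed

    def write(self, key, value):
        """Primary path: apply the change and append it to the log."""
        self.state[key] = value
        self.log.append((key, value))

    def catch_up(self, primary_log):
        """Standby path: replay only the entries not yet applied."""
        for key, value in primary_log[self.applied:]:
            self.state[key] = value
        self.applied = len(primary_log)


primary, standby = Node(), Node()

primary.write("cart", ["book"])
standby.catch_up(primary.log)        # periodic synchronization

primary.write("cart", ["book", "pen"])
primary.write("user", "abhishek")
standby.catch_up(primary.log)        # final sync just before taking over

print(standby.state == primary.state)   # True
```

Hot Standby would amount to calling catch_up after every single write instead of periodically, trading extra traffic for a shorter takeover delay.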

Another important aspect of this pattern is determining the active server. If multiple servers in a cluster assume that they are the active servers then unexpected behaviors like deadlock and data corruption may occur.
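
One common way to make sure only one server considers itself active is a time-limited lease handed out by a shared arbiter. The sketch below is only an illustration of that idea: the LeaseStore class, the 10 second lease and the node names are assumptions, and the arbiter itself would have to be something all nodes can reach reliably.

```python
class LeaseStore:
    """Stand-in for a shared arbiter that all nodes in the cluster can reach."""

    def __init__(self, clock):
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node, ttl):
        """Grant or renew the lease if it is free, expired, or already held by node."""
        now = self.clock()
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + ttl
            return True
        return False


now = [0.0]
store = LeaseStore(clock=lambda: now[0])

print(store.try_acquire("primary", ttl=10))   # True: primary becomes active
print(store.try_acquire("standby", ttl=10))   # False: lease is still held

now[0] = 11.0                                  # primary failed to renew in time
print(store.try_acquire("standby", ttl=10))   # True: standby takes over
print(store.try_acquire("primary", ttl=10))   # False: old primary must stay passive
```

A node only serves requests while it holds the lease, so even if the old primary comes back it cannot act as active until the standby's lease runs out.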

It is very important to design the cluster in such a way that we do not lose performance. Additionally, it can be a bit costly, since the standby server is normally not used unless the primary one fails.

Now that I have summarized the 2 clustering patterns in this post, the next logical step is to set up an IIS cluster for each pattern. It should result in a new post in this series :). Here's what I plan to do:
1) Have 2 IIS servers on a network
2) Install a single ASP.NET Hello World application on each of them
3) Configure a failover cluster between the 2 IIS instances and test that it works
4) Configure a software load balancing cluster and test that it works (may need a LoadRunner kind of tool to simulate many requests).

~Abhishek


