Belas Blog: Clustering between different sites / geographic failover

Tuesday, November 30, 2010

Clustering between different sites / geographic failover

I just completed a new feature in JGroups which allows for transparent bridging of separate clusters, e.g. at different sites.

Let's say we have a (local) cluster in New York (NYC) and another cluster in San Francisco (SFO). They're completely autonomous, and can even have completely different configurations.

RELAY [1] essentially has the coordinators of the local clusters relay local traffic to the remote cluster, and vice versa. The relaying (or bridging) is done via a separate cluster, usually based on TCP, as IP multicasting is typically not allowed between sites.
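To make the relaying idea concrete, here is a toy model in plain Java. It does not use the JGroups API at all: `Site`, `send` and `receiveFromBridge` are hypothetical names invented for this sketch, and the direct `remote` reference merely stands in for the separate bridge cluster. The only point it illustrates is that local traffic is delivered locally and also relayed to the other site, while relayed traffic is not relayed back.

```java
import java.util.ArrayList;
import java.util.List;

public class RelaySketch {
    /** A toy local cluster; the first member is treated as coordinator (and relay). */
    static class Site {
        final String name;
        final List<String> members;
        final List<String> delivered = new ArrayList<>(); // messages seen at this site
        Site remote;                                       // stand-in for the bridge cluster

        Site(String name, List<String> members) {
            this.name = name;
            this.members = members;
        }

        String coordinator() {
            return members.get(0);
        }

        /** A local member sends a message: deliver it locally, then relay it. */
        void send(String sender, String payload) {
            String msg = sender + ": " + payload;
            delivered.add(msg);
            if (remote != null)
                remote.receiveFromBridge(msg); // in this model, the coordinator's job
        }

        /** Relayed traffic is delivered locally but never relayed back (no loops). */
        void receiveFromBridge(String msg) {
            delivered.add(msg);
        }
    }

    public static void main(String[] args) {
        Site nyc = new Site("NYC", List.of("A", "B", "C"));
        Site sfo = new Site("SFO", List.of("D", "E", "F"));
        nyc.remote = sfo;
        sfo.remote = nyc;

        nyc.send("B", "hello from New York");
        sfo.send("E", "hello from San Francisco");

        System.out.println("NYC delivered: " + nyc.delivered);
        System.out.println("SFO delivered: " + sfo.delivered);
    }
}
```

Both sites end up with the same two messages, even though each message was sent in only one of the local clusters.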

SFO could be a backup of NYC, or both could be active, or we could think of a follow-the-sun model where each cluster is active during working hours at its site.

If we have nodes {A,B,C} in NYC and {D,E,F} in SFO, then there would be a global view, e.g. {D,E,F,A,B,C}, which is the same across all the nodes of both clusters.
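As a rough illustration of what such a global view looks like, the sketch below simply concatenates the local views. This is plain Java, not JGroups code: `globalView` is a hypothetical helper, and the order in which the sites appear (SFO first) is an assumption chosen only to match the example above.

```java
import java.util.ArrayList;
import java.util.List;

public class GlobalViewSketch {
    /** Concatenates the local views into one global view, in the order given. */
    static List<String> globalView(List<List<String>> localViews) {
        List<String> global = new ArrayList<>();
        for (List<String> view : localViews)
            global.addAll(view);
        return global;
    }

    public static void main(String[] args) {
        List<String> nyc = List.of("A", "B", "C");
        List<String> sfo = List.of("D", "E", "F");
        // SFO is listed first only to reproduce the example view from the text
        System.out.println(globalView(List.of(sfo, nyc))); // prints [D, E, F, A, B, C]
    }
}
```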

One use of RELAY could be to provide geographic failover in case of site failures. Because all of the data in NYC is also available in SFO, clients can simply fail over from NYC to SFO if the entire NYC site goes down, and continue to work.

Another use case is to have SFO act as a read-only copy of NYC, and run data analysis functions on SFO, without disturbing NYC, and with access to almost real-time data.

As you can guess, this feature is going to be used by Infinispan, and since Infinispan serves as the data replication / distribution layer in JBoss, we hope to be able to provide replication / distribution between sites in JBoss as well...

Exciting times ... stay tuned for more interesting news from the Infinispan team !

Read more on RELAY at [1] and provide feedback !
Cheers,

[1] http://www.jgroups.org/manual/html/user-advanced.html#RelayAdvanced

14 comments:

Bela Ban, 2:13 PM
Forgot to say, JGroups 2.12.0.Alpha1 can be downloaded from [1]

[1] http://sourceforge.net/projects/javagroups/files/JGroups/2.12.0.Alpha1/jgroups-2.12.0.Alpha1.jar/download
Unknown, 2:48 PM
Great work !
Anonymous, 5:35 AM
Hi Bela -

Suppose we have a cluster of 4 nodes, and 2 of them are getting an OS upgrade (RHEL5 in this case). All nodes are in the same subnet. When the 2 nodes start up after the OS upgrade, they show "Error installing to Start: name=HAPartition state=Create java.lang.IllegalStateException: Node xxxx could not flush the cluster for state retrieval", and so the deployments which depend on this HAPartition are in the Error state. Can you point to where the problem is?

Thanks
Vishy
Anonymous, 8:30 AM
hi ben,
i was looking to create two clusters at 2 different sites.
but i don't know how to use RELAY.
can you please provide some examples about it.
Bela Ban, 8:38 AM
There's documentation: http://www.jgroups.org/manual/html/user-advanced.html#RelayAdvanced
Anonymous, 10:16 AM
can i have some examples for cluster communication
Anonymous, 11:33 AM
i went through the example code given in the demos.
i need to know which properties must be set to run that demo (https://github.com/belaban/JGroups/blob/JGroups_2_12_0_Beta1/src/org/jgroups/demos/RelayDemo.java)
Bela Ban, 11:49 AM
Let's discuss this on the mailing list, not my blog !
Anonymous, 12:15 PM
i sent a mail using community.jboss.org....
please respond to that mail as early as possible
Anonymous, 1:15 PM
hi bela,

i'm a newbie with the RELAY API in JGroups. i've been trying to run your RelayDemo for the past 4-5 days.

i'm stuck on this and unable to run the demo successfully.

please provide a solution at the earliest.

thanks
Anonymous, 1:22 PM
hi bela
can you suggest a solution for this
"WARNING: discarded message from different cluster "RELAY_1" (our cluster is "RELAY_2")."
Yann Sionneau, 5:56 AM
awesome :)
But with the relay I suppose you do not keep the total ordering ...
Bela Ban, 7:06 AM
@Anonymous: I suggest you subscribe to jg-users (https://sourceforge.net/mail/?group_id=6081) and post your questions there...

@Yann: it depends; if the destination cluster has a config that doesn't define total order, then you won't be able to keep total ordering. Maybe this wasn't clear, but the 2 clusters that are bridged do not *need* to have the same config !
If both clusters do have total order, then the following happens:
- The coordinator (relay) in cluster-1 receives messages M1 and M2
- It forwards M1 and M2 to cluster-2 via the bridge cluster (e.g. TCP-based)
- If the bridge cluster defines total ordering, it'll forward M1 and M2 in that order. If not, it could also forward M2 and M1...
- The coordinator (relay) of cluster-2 receives M1 and M2
- The coordinator will deliver M1 and M2 (in total order) to cluster-2
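
The steps above can be sketched in plain Java. This is only a toy model, not JGroups code: `bridge` and `deliver` are hypothetical helpers, and the unordered bridge is modelled by reversing the messages, which is just one of the reorderings that could happen. The point is that cluster-2 always agrees internally on one sequence, but that sequence only matches the order seen in cluster-1 if the bridge preserves it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BridgeOrderSketch {
    /** Forwards messages over the bridge; an unordered bridge may reorder them. */
    static List<String> bridge(List<String> msgs, boolean ordered) {
        List<String> out = new ArrayList<>(msgs);
        if (!ordered)
            Collections.reverse(out); // one possible reordering, for illustration only
        return out;
    }

    /** Total order in the destination cluster: every member logs the same sequence. */
    static List<List<String>> deliver(List<String> arrivalOrder, int members) {
        List<List<String>> logs = new ArrayList<>();
        for (int i = 0; i < members; i++)
            logs.add(new ArrayList<>(arrivalOrder));
        return logs;
    }

    public static void main(String[] args) {
        List<String> sent = List.of("M1", "M2"); // order seen in cluster-1

        System.out.println("ordered bridge:   " + deliver(bridge(sent, true), 3));
        System.out.println("unordered bridge: " + deliver(bridge(sent, false), 3));
    }
}
```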
Bela Ban, 2:39 PM
My presentation "Geographic Failover" at JBossWorld 2011 is now online: http://www.vimeo.com/24825312