I'll be speaking at the Jazoon (in Zurich, June 24th, next week) and JBossWorld (in Chicago, Sept 2) conferences. At Jazoon, I'll talk about a memcached implementation in Java; at JBossWorld, I'll talk about large JBoss clusters.
Hope to see some of you there!
Friday, June 19, 2009
Wednesday, June 17, 2009
Shunning has been shunned
I finally completed the MERGE4 functionality, which now handles asymmetrical merges and greatly improves the usefulness of JGroups in mobile networks. I blogged about this earlier this year.
The new merge functionality also allowed me to trash shunning, which is great, because I've always had problems explaining the difference between shunning and merging. Merging would usually be needed when we had real network partitions, whereas shunning would be needed when only a single member was expelled from the group (e.g. because it failed to respond to heartbeats, but hadn't really crashed).
However, with FD_ALL, there could be a scenario where everybody shunned everybody else (a shun-fest :-)), and so all the cluster nodes would leave and re-join the cluster, possibly even multiple times. Clearly not a desirable scenario, even though it didn't lead to incorrect results!
The new model is now much simpler: members join, leave and merge. A merge happens after a network partition heals, for example. In the old model, when a member was unresponsive, it was shunned and subsequently rejoined. In the new model, there's simply going to be a merge between the group which found that member unresponsive and the (now newly responsive) member.
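To make the new model a bit more concrete, here is a toy sketch in plain Java. It does not use the JGroups API; the class `MergeSketch`, the method `mergeViews` and the member names A to D are all made up for illustration. It only shows the idea that a formerly unresponsive member is treated as a subgroup of one, which is then merged with the rest of the cluster, rather than being shunned and forced to rejoin.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: NOT the JGroups API. All names here are hypothetical.
public class MergeSketch {

    // Combines the membership lists of several subgroups into one merged
    // membership, keeping first-seen order and dropping duplicates.
    public static List<String> mergeViews(List<List<String>> subgroups) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> view : subgroups) {
            merged.addAll(view);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // A, B and C excluded D because it stopped answering heartbeats.
        List<String> rest = List.of("A", "B", "C");
        // D never crashed, so it ends up in a group of its own.
        List<String> loner = List.of("D");

        // Once D is responsive again, the two subgroups are simply merged.
        System.out.println(mergeViews(List.of(rest, loner))); // prints [A, B, C, D]
    }
}
```

A single expelled member and a real network partition go through the same code path here; the only difference is the size of the subgroups being merged.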
Since I also improved merging speed and correctness (with respect to concurrent merges), I suggest downloading 2.8.beta2 (which I'll upload to SourceForge shortly) and giving it a try.
One thing that I'll have to talk about (in my next post) is what to do with merging. For example, if we have shared state and it diverged during a network partition, how can the application make sure that the merge doesn't cause inconsistent states?
More on this later, enjoy,