Friday, February 10, 2012

JGroups 3.1.0.Alpha2 released

I'm happy to announce the release of JGroups 3.1.0.Alpha2 !

Don't be put off by the Alpha2 suffix; as a matter of fact, this release is very stable, and I might just go ahead and promote it to "Final" within a short time !

At the time of writing this, I still have a few issues open in 3.1, but because I think the current feature set is great, I might push them into a 3.2.

So what features and enhancements did 3.1 add ? In a nutshell:

  • A new protocol NAKACK2: this is a successor to NAKACK (which will get phased out over the next couple of releases). The 2 biggest changes are:
    • A new memory efficient data structure (Table) is used to store messages to be retransmitted. It can grow and shrink dynamically, and replaces NakReceiverWindow.
    • There is no Retransmitter associated with each table, and we don't create an entry *per* missing sequence number (seqno) or seqno range. Instead, we have *one* single retransmission task, which periodically (xmit_interval ms) scans through *all* tables, identifies gaps and triggers retransmission for missing messages. This is a significant code simplification and brings memory consumption down when we have missing messages.
  • Changes to UNICAST2 and UNICAST: in both cases, we switch from NakReceiverWindow / AckSenderWindow / AckReceiverWindow to Table, and instead of a retransmitter per member, we now have *one* retransmitter task for *all* members.
  • The changes in NAKACK2, UNICAST2 and UNICAST have several benefits:
    • Code simplification: having only one data structure (Table) instead of several (NakReceiverWindow, AckSenderWindow, AckReceiverWindow), plus removing all Retransmitter implementations, leads to simpler code.
    • Code reduction: several classes can be removed, making the code base smaller and easier to understand.
    • Better maintainability: Table is now an important core data structure, and improvements to it will affect many parts of JGroups.
    • Smaller memory footprint: having less per-member data (e.g. retransmission tasks) should lead to better scalability, especially in large clusters (e.g. 1000 nodes).
    • Smooth transition: we'll leave NAKACK (and therefore NakReceiverWindow and Retransmitter) in JGroups for some releases. NAKACK / NakReceiverWindow have served JGroups well for over a decade, and are battle-tested. If there is an issue with NAKACK2 / Table in production, we can always fall back to NAKACK. I intend to phase out NAKACK only after some releases and a good amount of time spent in production around the world, to be sure NAKACK2 works well.
  • MERGE3: merging is frequent in large clusters. MERGE3 handles merging in large clusters better by
    • preventing (or reducing the chances of) concurrent merges
    • reducing traffic caused by merging
    • disseminating {UUID/physical address/logical name} information, so every node has this information, reducing the number of times we need to ask for it explicitly.
    • MERGE3 was written with UDP as transport in mind (which is the transport recommended for large clusters anyway), but it also works with TCP. 
  • Synchronous messages: a synchronous message M blocks the sender until the receiver (or receivers) have acked its delivery. This allows for 'partial flushing', in the sense that all messages sent by a member P prior to M will get delivered at all receivers before M is delivered.
    This is related to FLUSH, but it is less costly and can be done per message. For example, if a unit of work is done, a sender could send an RSVP-tagged message M and would be sure that, after the send() returns, all receivers have delivered M.
    To send an RSVP-marked message, Message.setFlag(Message.Flag.RSVP) has to be used.
    In addition, a new protocol (RSVP) needs to be added to the stack. See the documentation (link below) for details.
  • A new Rackspace-based discovery protocol
  • Concurrent joining to a non-existing cluster is faster
  • Elimination (or reduction) of "no physical address for X; dropping message" warnings
  • Elimination of global JGroups ThreadGroup leaks
  • Elimination of socket leaks with TCPPING
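
To make the "one retransmission task scans all tables" idea from the NAKACK2 item more concrete, here is a minimal sketch in plain Java. It is a toy model under stated assumptions, not the actual JGroups code: GapScanSketch, ReceivedSeqnos, missing() and scanAll() are hypothetical names made up for this illustration, and a TreeSet stands in for the real Table. The point it tries to show is that gaps are computed on demand during a scan, so nothing is stored per missing seqno.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model only: hypothetical classes, not part of JGroups.
class GapScanSketch {

    /** Per-sender record of received seqnos (a simplified stand-in for Table). */
    static final class ReceivedSeqnos {
        private final TreeSet<Long> received = new TreeSet<>();
        private final long highestDelivered; // everything <= this was already delivered

        ReceivedSeqnos(long highestDelivered) {
            this.highestDelivered = highestDelivered;
        }

        void add(long seqno) {
            if (seqno > highestDelivered)
                received.add(seqno);
        }

        /** Seqnos between highestDelivered and the highest received seqno that are
         *  still missing. Computed on each call; no entry is kept per gap. */
        List<Long> missing() {
            List<Long> gaps = new ArrayList<>();
            if (received.isEmpty())
                return gaps;
            long highest = received.last();
            for (long s = highestDelivered + 1; s < highest; s++)
                if (!received.contains(s))
                    gaps.add(s);
            return gaps;
        }
    }

    /** One pass of the single retransmission task: scan all tables and collect,
     *  per sender, the seqnos for which retransmission would be requested. */
    static Map<String, List<Long>> scanAll(Map<String, ReceivedSeqnos> tables) {
        Map<String, List<Long>> requests = new LinkedHashMap<>();
        for (Map.Entry<String, ReceivedSeqnos> e : tables.entrySet()) {
            List<Long> gaps = e.getValue().missing();
            if (!gaps.isEmpty())
                requests.put(e.getKey(), gaps);
        }
        return requests;
    }

    public static void main(String[] args) {
        Map<String, ReceivedSeqnos> tables = new LinkedHashMap<>();
        ReceivedSeqnos a = new ReceivedSeqnos(0);
        for (long s : new long[]{1, 2, 5, 6, 9}) a.add(s);
        ReceivedSeqnos b = new ReceivedSeqnos(0);
        for (long s : new long[]{1, 2, 3}) b.add(s);
        tables.put("A", a);
        tables.put("B", b);
        System.out.println(scanAll(tables)); // prints {A=[3, 4, 7, 8]}
    }
}
```

In a running system, a pass like scanAll() would presumably be scheduled every xmit_interval ms, for example with ScheduledExecutorService.scheduleWithFixedDelay(); the sketch runs a single pass so that it stays deterministic.
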
The full list of changes is at [1], the manual can be found at [2] and 3.1 can be downloaded from [3].
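
As an illustration of the synchronous (RSVP) messages described in the list above, the following is a small sketch of the "block the sender until all receivers have acked" behavior. It deliberately does not use JGroups at all: RsvpSketch and sendAndAwaitAcks() are hypothetical names, receivers are modeled as plain in-memory lists, and a CountDownLatch plays the role of the acks. With JGroups itself, the only steps named in this post are setting Message.Flag.RSVP on the message and adding the RSVP protocol to the stack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: hypothetical class, not part of JGroups.
class RsvpSketch {

    /**
     * "Sends" msg to every receiver (each inbox is filled on a pool thread) and
     * blocks the caller until all receivers have acked or the timeout elapses.
     * Returns true only if every receiver acked in time.
     */
    static boolean sendAndAwaitAcks(String msg, List<List<String>> receivers, long timeoutMs) {
        CountDownLatch acks = new CountDownLatch(receivers.size());
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, receivers.size()));
        try {
            for (List<String> inbox : receivers) {
                pool.execute(() -> {
                    inbox.add(msg);   // deliver the message to this receiver
                    acks.countDown(); // ack only after delivery
                });
            }
            // The sender blocks here, like send() with an RSVP-flagged message
            return acks.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < 3; i++)
            inboxes.add(new ArrayList<>());
        boolean allAcked = sendAndAwaitAcks("M", inboxes, 1000);
        // If allAcked is true, every inbox already contains "M" at this point
        System.out.println("all acked: " + allAcked + ", inboxes: " + inboxes);
    }
}
```

Because each ack is counted down only after the message was added to the inbox, a true return value means that M has been delivered everywhere by the time the call returns, which is the guarantee the RSVP item describes for send().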

Feedback on the mailing lists is appreciated. Enjoy !


[1] https://github.com/belaban/JGroups/blob/master/doc/ReleaseNotes-3.1.0.txt
[2] http://www.jgroups.org/manual-3.x/html/index.html
[3] https://sourceforge.net/projects/javagroups/files/JGroups/3.1.0.Alpha2/