Wednesday, January 30, 2013

Buy one, get many for free: message batching in JGroups

Just a quick heads-up on what's going on in JGroups 3.3 with message batching.

Currently, when the transport receives a message bundle (say 20 messages), it passes the bundle to the regular thread pool (OOB messages are never bundled). The thread which handles the bundle grabs each message, adds it to the retransmission table, and then removes as many messages as possible and passes them up one-by-one.

So, although we have a message bundle, we don't process it as a bundle, but rather add each message individually and then pass them up one-by-one.

Message batching [1] changes this. It reads a message bundle directly into 2 MessageBatch instances: one for OOB messages and one for regular messages. (This already shows that OOB messages are now bundled, too, but more on this later). The OOB MessageBatch is passed to the OOB thread pool for handling, the regular batch to the regular pool.
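
To make the split a bit more concrete, here's a minimal sketch of reading one bundle into two batches. The Msg and Batches classes and the split() method are hypothetical stand-ins made up for this example, not the actual JGroups code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the JGroups classes.
class BundleSplitSketch {

    static final class Msg {
        final String payload;
        final boolean oob; // true if the message is flagged as out-of-band

        Msg(String payload, boolean oob) {
            this.payload = payload;
            this.oob = oob;
        }
    }

    // Holds the two batches read from one bundle.
    static final class Batches {
        final List<Msg> oob = new ArrayList<>();
        final List<Msg> regular = new ArrayList<>();
    }

    // Walks the bundle once and sorts each message into the OOB or the regular batch.
    static Batches split(List<Msg> bundle) {
        Batches batches = new Batches();
        for (Msg m : bundle) {
            (m.oob ? batches.oob : batches.regular).add(m);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Msg> bundle = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            bundle.add(new Msg("m" + i, i % 4 == 0)); // every 4th message is OOB
        }
        Batches b = split(bundle);
        // In a real transport, each batch would now go to its own thread pool.
        System.out.println("oob=" + b.oob.size() + ", regular=" + b.regular.size());
    }
}
```

The order of messages within each batch is the order in which they appeared in the bundle.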

A message batch is nothing more than a list of messages.

A message batch is handled by only 1 thread: the thread passes the entire batch up the stack. Each protocol can remove messages it consumes (e.g. FD_ALL), change messages in-place (e.g. COMPRESS or ENCRYPT), remove and add messages (e.g. FRAG2), remove messages and pass up a new batch (NAKACK2, UNICAST2) or even do nothing (SIZE, STATS).
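
As a rough illustration of a batch travelling up the stack, here's a small sketch in which one protocol consumes its own messages and another changes messages in place. The BatchProtocol interface and both protocol classes are invented for this example (messages are plain strings), so don't read this as the JGroups API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the interface and the protocol names are made up for illustration.
class BatchStackSketch {

    // A protocol sees the whole batch and may remove or replace messages in it.
    interface BatchProtocol {
        void up(List<String> batch);
    }

    // Consumes its own messages (here: anything starting with "heartbeat").
    static final class ConsumeHeartbeats implements BatchProtocol {
        public void up(List<String> batch) {
            batch.removeIf(m -> m.startsWith("heartbeat"));
        }
    }

    // Changes messages in place (upper-casing stands in for decompressing or decrypting).
    static final class TransformInPlace implements BatchProtocol {
        public void up(List<String> batch) {
            batch.replaceAll(String::toUpperCase);
        }
    }

    // A single thread passes the batch through every protocol, bottom to top.
    static List<String> passUp(List<String> batch, List<BatchProtocol> stack) {
        for (BatchProtocol p : stack) {
            if (batch.isEmpty()) {
                break; // everything was consumed, nothing left to deliver
            }
            p.up(batch);
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>(Arrays.asList("heartbeat-1", "data-a", "data-b"));
        List<BatchProtocol> stack = Arrays.asList(new ConsumeHeartbeats(), new TransformInPlace());
        System.out.println(passUp(batch, stack)); // prints [DATA-A, DATA-B]
    }
}
```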

The advantage is that a protocol can now handle many messages at once, amortizing (e.g.) lock acquisition costs. For example, NAKACK2 adds all 20 messages to the retransmission table at once, thereby acquiring the table lock only once rather than 20 times. This means that we incur the cost of 1 lock acquisition, instead of 20. It goes without saying that this will also reduce lock contention, at least in this particular case, even if the lock is held slightly longer than before.
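
Here's a minimal sketch of the amortization, assuming a simple lock-protected table. The Table class below is a hypothetical stand-in (it is not NAKACK2's retransmission table) and counts how often its lock gets acquired:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical table for illustration; not NAKACK2's real retransmission table.
class LockAmortizationSketch {

    static final class Table {
        private final ReentrantLock lock = new ReentrantLock();
        private final List<String> msgs = new ArrayList<>();
        private int acquisitions; // number of adds that took the lock, guarded by lock

        // Adds a single message: one lock acquisition per message.
        void add(String msg) {
            lock.lock();
            try {
                acquisitions++;
                msgs.add(msg);
            } finally {
                lock.unlock();
            }
        }

        // Adds a whole batch: one lock acquisition, held slightly longer.
        void addAll(List<String> batch) {
            lock.lock();
            try {
                acquisitions++;
                msgs.addAll(batch);
            } finally {
                lock.unlock();
            }
        }

        // Reads the counter under the lock, without counting the read itself.
        int acquisitions() {
            lock.lock();
            try {
                return acquisitions;
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            batch.add("m" + i);
        }

        Table oneByOne = new Table();
        for (String m : batch) {
            oneByOne.add(m);
        }

        Table batched = new Table();
        batched.addAll(batch);

        System.out.println("one-by-one: " + oneByOne.acquisitions()
                + ", batched: " + batched.acquisitions()); // prints one-by-one: 20, batched: 1
    }
}
```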

I'll present some performance numbers soon, but so far preliminary performance tests look promising !

So while message bundling queues messages and sends them across the wire as a list, it stops at the receiver's transport. Message batching takes this idea further and passes that bundle all the way up to the channel. (Note that this will require another receive() callback in the Receiver, but this will be transparent by default.)

Message batching will allow other cool things to happen, e.g.
  • OOB messages will be bundled too now. If no bundling is desired, tag a message as DONT_BUNDLE.
  • We can simplify the message bundler (on the sender side), see [2]. As a result, I might even be able to remove all 4 existing message bundlers. As you know, I like removing stuff, and making code easier to read !
  • RPC responses can be bundled [3]
  • UNICAST2 can now ack the 'last message' [4]
Cheers,


[1] https://issues.jboss.org/browse/JGRP-1564

[2] https://issues.jboss.org/browse/JGRP-1540

[3] https://issues.jboss.org/browse/JGRP-1566

[4] https://issues.jboss.org/browse/JGRP-1548

Wednesday, January 02, 2013

SUPERVISOR: detecting faults and fixing them automatically

I've added a new protocol SUPERVISOR [1] to master, which can periodically check for certain conditions and correct them if necessary. This will be in the next release (3.3) of JGroups.

You can think of SUPERVISOR as an automated rule-based system admin.

SUPERVISOR was born out of a discussion on the mailing list [2] where a bug in FD caused the failure detection task in FD to be stopped, so failed members would not get suspected and excluded anymore. This is bad if the failed member was the coordinator itself, as new members would not be able to join anymore !

Of course, fixing the bug [3] was the first priority, but I felt that it would be good to also have a second line of defense that detected problems in a running stack. Even if a rule doesn't fix the problem, it can still be used to detect it and alert the system admin, so that the issue can be fixed manually.
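
To give an idea of what such a rule could look like, here's a minimal sketch of a check-and-correct loop. The Rule class, its fields and runOnce() are hypothetical names made up for this example; the real SUPERVISOR API is described in the documentation and may look quite different:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a rule-based check; this is not the SUPERVISOR API.
class RuleSketch {

    static final class Rule {
        final String name;
        final BooleanSupplier condition; // returns true when something is wrong
        final Runnable action;           // corrective action, or just an alert

        Rule(String name, BooleanSupplier condition, Runnable action) {
            this.name = name;
            this.condition = condition;
            this.action = action;
        }
    }

    // Evaluates every rule once and returns the names of the rules that fired.
    static List<String> runOnce(List<Rule> rules) {
        List<String> fired = new ArrayList<>();
        for (Rule r : rules) {
            if (r.condition.getAsBoolean()) {
                r.action.run();
                fired.add(r.name);
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        boolean[] taskRunning = {false}; // pretend the failure detection task has stopped
        List<Rule> rules = Arrays.asList(
                new Rule("restart-fd-task", () -> !taskRunning[0], () -> taskRunning[0] = true));

        System.out.println(runOnce(rules)); // prints [restart-fd-task]
        System.out.println(runOnce(rules)); // prints [] since the task is running again
    }
}
```

A periodic version could call runOnce() from a java.util.concurrent.ScheduledExecutorService via scheduleAtFixedRate(); the sketch leaves the scheduling out to stay deterministic.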

The documentation for SUPERVISOR is here: [4].


[1] https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/rules/SUPERVISOR.java

[2] https://sourceforge.net/mailarchive/message.php?msg_id=30218296

[3] https://issues.jboss.org/browse/JGRP-1559

[4] http://www.jgroups.org/manual-3.x/html/user-advanced.html#Supervisor



Cross site replication: demo on YouTube

FYI,

I've recently published a video on cross-site replication [1] on YouTube: [2].

The video shows how to set up and configure cross-site replication in Infinispan, although the focus of the video is on running the performance test [3].

Cheers, and a belated happy new year to everyone !


[1] https://docs.jboss.org/author/display/ISPN/Cross+site+replication

[2] https://www.youtube.com/watch?v=owOs430vLZo

[3] https://github.com/belaban/IspnPerfTest