Thursday, September 08, 2016

Removing thread pools in JGroups 4.0

JGroups 3.x has 4 thread pools:
  • Regular thread pool: used for regular messages (default has a queue)
  • OOB thread pool: used for OOB messages (no queue)
  • Internal thread pool: used for JGroups internal messages only. The main raison d'etre for this pool was that internal messages such as heartbeats or credits should never get queued up behind other messages, and get processed immediately.
  • Timer thread pool: all tasks in a timer need to be executed by a thread pool as they can potentially block
Over time I found that, with most configurations, I had the queue in the regular thread pool disabled, as I wanted the pool to dynamically create new threads (up to max-threads) when required, and to terminate them again after some idle time.

Hence the idea to merge the regular and OOB thread pools into one.

When I thought about this further, I realized that internal messages could also be handled by a queue-less thread pool: by catching the RejectedExecutionException thrown when the pool is full and simply spawning a new thread to process the internal message, it would never get dropped.

The same goes for timer tasks: a timer task (e.g. a message retransmission task) cannot get dropped, or retransmission would cease. However, by using the same mechanism as for internal messages, namely spawning a new thread when the thread pool is full, this can be solved.
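To illustrate the mechanism (a minimal sketch with a hypothetical helper, not the actual JGroups code): a task that must not be dropped falls back to a freshly spawned thread when the queue-less pool rejects it:

import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

// Sketch: a critical task (internal message, timer task) is never dropped;
// if the queue-less pool is full, it runs on a temporary thread instead.
final class CriticalSubmitter {
    static void submitCritical(Executor pool, Runnable task) {
        try {
            pool.execute(task);                      // normal path: a pool thread runs the task
        }
        catch(RejectedExecutionException rejected) { // pool full, no queue -> task was rejected
            new Thread(task).start();                // fallback: spawn a new thread
        }
    }
}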

Therefore, in 4.0 there's only a single thread pool handling regular, OOB and internal messages, plus timer tasks.

The new thread pool has no queue, or else it would not throw a RejectedExecutionException when a task is added, but simply queue it, which is not what we want for internal messages or timer tasks.

It also has a fixed rejection policy of "abort", which can only be changed by replacing the thread pool with a custom pool.
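For illustration only (not the pool JGroups actually creates, and the values are made up): a queue-less pool with an "abort" rejection policy can be expressed with plain java.util.concurrent like this:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: a dynamic, queue-less pool. The SynchronousQueue hands tasks directly
// to a thread, and AbortPolicy throws RejectedExecutionException once
// max-threads is reached, instead of queueing the task.
class PoolSketch {
    static ThreadPoolExecutor createPool() {
        return new ThreadPoolExecutor(
            0, 100,                                 // min-threads, max-threads (illustrative values)
            30, TimeUnit.SECONDS,                   // idle-time after which an unused thread terminates
            new SynchronousQueue<>(),               // no queue
            new ThreadPoolExecutor.AbortPolicy());  // "abort" rejection policy
    }
}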

This dramatically reduces configuration complexity: from 4 pools down to 1, and the new thread pool exposes only min-threads, max-threads, idle-time and enabled as configuration options.

Here's an example of a 3.x configuration:
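Something along these lines, where every pool is configured separately on the transport (a sketch only: attribute names and values are illustrative and abridged, not a complete stack):

<UDP
     thread_pool.enabled="true"
     thread_pool.min_threads="2"
     thread_pool.max_threads="100"
     thread_pool.keep_alive_time="30000"
     thread_pool.queue_enabled="false"

     oob_thread_pool.enabled="true"
     oob_thread_pool.min_threads="2"
     oob_thread_pool.max_threads="100"
     oob_thread_pool.keep_alive_time="30000"
     oob_thread_pool.queue_enabled="false"

     internal_thread_pool.enabled="true"
     internal_thread_pool.min_threads="1"
     internal_thread_pool.max_threads="4"
     internal_thread_pool.keep_alive_time="30000"
     internal_thread_pool.queue_enabled="true"

     timer.min_threads="2"
     timer.max_threads="4"
     timer.keep_alive_time="5000" />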





and here's the 4.0 configuration:
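Roughly like this (again only a sketch with illustrative values and approximate attribute names; the single pool is configured directly on the transport):

<UDP
     thread_pool.enabled="true"
     thread_pool.min_threads="0"
     thread_pool.max_threads="100"
     thread_pool.keep_alive_time="30000" />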


Nice, isn't it?


Tuesday, August 23, 2016

JGroups 4.0.0.Alpha1 released

Howdy folks,

I just released a first alpha of JGroups 4.0.0 to SourceForge and maven. There are 12+ issues left, and I expect a final release in October.

The major version number was incremented because there are a lot of API changes and removal of deprecated classes/methods.

I'd be happy to get feedback on the API; there's still time to make a few API changes before the final release.

Major API changes

New features 


Deliver message batches

Ability to handle message batches in addition to individual messages: receive(MessageBatch) was added to ReceiverAdapter:
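A minimal sketch of what using the new callback looks like (the receiver class below is made up for illustration):

import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;
import org.jgroups.util.MessageBatch;

// Sketch: a receiver that handles whole batches in addition to single messages.
public class BatchReceiver extends ReceiverAdapter {

    public void receive(Message msg) {
        System.out.println("single message from " + msg.getSrc());
    }

    public void receive(MessageBatch batch) {  // new in 4.0
        for(Message msg: batch)                // a batch can be iterated over
            System.out.println("batched message from " + msg.getSrc());
    }
}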

Encryption and authentication

ENCRYPT was removed and replaced by SYM_ENCRYPT (keystore-based) and ASYM_ENCRYPT (asymmetric key exchange):
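A keystore-based setup might look roughly like this (a sketch only: the keystore, password and alias values are placeholders, and the attribute names should be checked against the protocol's documentation):

<SYM_ENCRYPT sym_algorithm="AES"
             keystore_name="defaultStore.keystore"
             store_password="changeit"
             alias="myKey"/>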

ENCRYPT had a couple of security flaws, and since the code hasn't been touched for almost a decade, I took this chance and completely refactored it into the 2 protocols above.

The code is now much more readable and maintainable.



Removal of Java serialization

JGroups marshalling falls back to Java serialization if a class is neither a primitive type nor implements Streamable. This fallback was completely removed for JGroups-internal classes:
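For application classes, the alternative to serialization is to implement Streamable and marshal the fields explicitly. A minimal sketch (a made-up class; the exact throws clauses of the interface may differ between versions):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.jgroups.util.Streamable;

// Sketch: explicit marshalling instead of Java serialization.
public class Point implements Streamable {
    protected int x, y;

    public Point() {} // required for deserialization

    public void writeTo(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readFrom(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}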

MessageBundler improvements

The performance of existing bundlers has been improved, and new bundlers (RingBufferBundler, NoBundler) have been added. Since the shared transport feature was removed, SingletonAddress, which was used as a key into a bundler's hashmap, could also be removed.

For instance, TransferQueueBundler now removes chunks of messages from the queue, rather than removing them one by one:
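The idea, in a simplified sketch (not the actual bundler code): drain whatever is queued in one call instead of taking messages one at a time:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import org.jgroups.Message;

// Sketch: remove a chunk of queued messages with a single drainTo() call,
// then marshal and send them as one bundle.
final class BundlerSketch {
    static void sendChunk(BlockingQueue<Message> queue, int max_chunk) throws InterruptedException {
        List<Message> chunk = new ArrayList<>(max_chunk);
        chunk.add(queue.take());              // block until at least one message is available
        queue.drainTo(chunk, max_chunk - 1);  // grab the rest of the queued messages in one call
        // ... marshal the chunk into a single bundle and send it
    }
}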

I also added the ability to switch bundlers at runtime (e.g. via probe: op=TCP.bundler["rb"]).

Reduction of event creation

For every message sent up or down, a new Event (wrapping the message) had to be created. Adding 2 separate up(Message) and down(Message) callbacks removed that unneeded creation:
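As a sketch (a made-up protocol, not JGroups code), a 4.0 protocol can now receive and pass on messages directly, without wrapping them in an Event:

import org.jgroups.Message;
import org.jgroups.stack.Protocol;

// Sketch: message callbacks without the Event wrapper required in 3.x.
public class ExampleProtocol extends Protocol {

    public Object down(Message msg) {   // no new Event(Event.MSG, msg) needed anymore
        // inspect or modify msg here
        return down_prot.down(msg);     // pass the message to the protocol below
    }

    public Object up(Message msg) {
        return up_prot.up(msg);         // pass the message to the protocol above
    }
}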

Faster header marshalling and creation

Headers used to do a hashmap lookup to get their magic-id when getting marshalled. Now they carry the magic ID directly in the header. This saves a hashmap lookup per header (usually we have 3-4 headers per message):

Header creation used to look up the header class by its magic ID and then create an instance. This was replaced by each header class implementing a create() method, cutting header creation time from 300ns down to 25ns.
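Very roughly, a header in 4.0 now looks like the sketch below (a made-up header with a made-up magic ID; the exact abstract method signatures may differ slightly):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.function.Supplier;
import org.jgroups.Header;

// Sketch: the magic ID is returned directly (no hashmap lookup when marshalling),
// and create() returns a factory, so no lookup by magic ID is needed on creation.
public class ExampleHeader extends Header {
    protected int counter;

    public short getMagicId()                  {return 1500;} // made-up ID
    public Supplier<? extends Header> create() {return ExampleHeader::new;}

    public int serializedSize()                {return Integer.BYTES;}
    public void writeTo(DataOutput out) throws IOException {out.writeInt(counter);}
    public void readFrom(DataInput in)  throws IOException {counter=in.readInt();}
}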

Bug fixes

MFC and UFC headers getting mixed up

This is a regression that led to premature exhaustion of credits, slowing things down:



The manual has not been changed yet, so browse the source code or javadocs for details.

Enjoy! and feedback via the mailing list, please!


Bela Ban

Wednesday, February 24, 2016

JGroups 3.6.8 released

FYI, 3.6.8.Final has been released.

Not a big release; it contains mostly optimizations and a nice probe improvement. The main issues are listed below.


New features

Probe improvements

- Proper discarding of messages from a different cluster with '-cluster' option.
- Less information per cluster member; only the requested information is returned
- Detailed information about RPCs (number of sync, async RPCs, plus timings)


DONT_BUNDLE and OOB: messages are not removed from batch when execution fails

- Messages are not removed from batch when execution fails
- Rejections are not counted to num_rejected_msgs

COMPRESS: removed 1 byte[] buff copy

An unneeded copy of the compressed payload was created when sending and compressing a message. The fix should reduce
memory allocation pressure quite a bit.

RpcDispatcher: don't copy the first anycast

When sending an anycast to 3 destinations, JGroups sends a copy of the original message to all 3. However, the first
doesn't need to be copied (less memory allocation pressure). For an anycast to a single destination, no copy is
needed, either.

Compaction of in-memory size

- Reduced size of Rsp (used in every RPC) from 32 -> 24 bytes
- Request/UnicastRequest/GroupRequest: reduced size

RequestCorrelator.done() is slow

Used by RpcDispatcher. Fixed by eliminating the linear search done previously.

Bug fixes

FILE_PING: consider special characters in file names

Names like "A/web-cluster" would fail on Windows as the slash char ('/') was treated as demarcation char in some clouds.


The manual is at

The complete list of features and bug fixes can be found at

Bela Ban, Kreuzlingen, Switzerland
Feb 2016

Wednesday, January 20, 2016

Dump RPC stats with JGroups

When using remote procedure calls (RPCs) across a cluster using RpcDispatcher, it would be interesting to know how many RPCs of which type (unicast, multicast, anycast) are invoked by whom to whom.

I added this feature in 3.6.8-SNAPSHOT [1]. The documentation is here: [2].

As a summary, since this feature is costly, it has to be enabled with rpcs-enable-details (and disabled with rpcs-disable-details).

Once enabled, invocation times of synchronous (blocking) RPCs are recorded (async RPCs are ignored).

RPC stats can be dumped with rpcs-details:
[belasmac] /Users/bela/JGroups$ probe.sh rpcs rpcs-details

-- sending probe on /
#1 (481 bytes):
local_addr=C [ip=, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=0
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67480
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189064
D: async: 0, sync: 130434, min/max/avg (ms): 0.13/924.88/2.613
A: async: 0, sync: 130243, min/max/avg (ms): 0.11/926.35/2.541
B: async: 0, sync: 63346, min/max/avg (ms): 0.14/73.94/2.221

#2 (547 bytes):
local_addr=A [ip=, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=5
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67528
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189200
<all>: async: 0, sync: 5, min/max/avg (ms): 2.11/9255.10/4917.072
C: async: 0, sync: 130387, min/max/avg (ms): 0.13/929.71/2.467
D: async: 0, sync: 63340, min/max/avg (ms): 0.13/63.74/2.469
B: async: 0, sync: 130529, min/max/avg (ms): 0.13/929.71/2.328

#3 (481 bytes):
local_addr=B [ip=, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=0
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67255
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189494
C: async: 0, sync: 130616, min/max/avg (ms): 0.13/863.93/2.494
A: async: 0, sync: 63210, min/max/avg (ms): 0.14/54.35/2.066
D: async: 0, sync: 130177, min/max/avg (ms): 0.13/863.93/2.569

#4 (482 bytes):
local_addr=D [ip=, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync  multicast RPCs=0
uperf: async unicast   RPCs=0
uperf: async multicast RPCs=0
uperf: sync  anycast   RPCs=67293
uperf: async anycast   RPCs=0
uperf: sync  unicast   RPCs=189353
C: async: 0, sync: 63172, min/max/avg (ms): 0.13/860.72/2.399
A: async: 0, sync: 130342, min/max/avg (ms): 0.13/862.22/2.338
B: async: 0, sync: 130424, min/max/avg (ms): 0.13/866.39/2.350

This shows the stats for each member in a given cluster, e.g. number of unicast, multicast and anycast RPCs, per target destination, plus min/max and average invocation times for sync RPCs per target as well.

Probe just became even more powerful! :-)

[2] Documentation:

Monday, January 18, 2016

JGroups workshop in Munich April 4-8 2016

I'm happy to announce another JGroups workshop in Munich April 4-8 2016 !

The registration is now open at [2].

The agenda is at [3] and includes an overview of the basic API, building blocks, advanced topics and an in-depth look at the most frequently used protocols, plus some admin (debugging, tracing, diagnosis) stuff.

We'll be doing some hands-on demos and looking at code; I always try to make the workshops as hands-on as possible.

I'll be teaching the workshop myself, and I'm looking forward to meeting some of you and having beers in Munich downtown! For attendee feedback on courses last year check out [1].

Note that the exact location in Munich has not yet been picked; I'll update the registration page and send an email to already-registered attendees once it has been (by the end of January at the latest).

The course has a min limit of 5 and a max limit of 15 attendees.

I'm planning to do another course in Boston or New York in the fall of 2016, but those plans have not been finalized yet.

Cheers, and I hope to see many of you in Munich!
Bela Ban


Tuesday, January 12, 2016

JGroups 3.6.7.Final released

I'm happy to announce that 3.6.7.Final has been released!

This release contains a few bug fixes, but is mainly about optimizations reducing memory consumption and allocation rates.
Another optimization was in TCP_NIO2, which is now as fast as TCP. It is slated to become the successor to TCP, as it uses fewer threads and, being built on NIO, should be much more scalable.

3.6.7.Final can be downloaded from SourceForge [1] or used via maven (groupId=org.jgroups / artifactId=jgroups, version=3.6.7.Final).

Below is a list of the major issues resolved.


New features

Interoperability between TCP and TCP_NIO2

This allows nodes that have TCP as transport to talk to nodes that have TCP_NIO2 as transport, and vice versa.


Transport: reuse of receive buffers

On message reception, the transport would create a new buffer in TCP and TCP_NIO2 (not in UDP), read the message into that buffer and then pass it to one of the thread pools, copying single messages (not batches).
This was changed to reusing the same buffers in UDP, TCP and TCP_NIO2, by reading the network data into one of those buffers, de-serializing the message (or message batch) and then passing it to one of the thread pools.
The effect is a much lower memory allocation rate.

Message bundling: reuse of send buffers

When sending messages, a new buffer would be created for marshalling for every message (or message bundle). This was changed to reuse the same buffer for all messages or message bundles.
The effect is a smaller memory allocation rate on the send path.

TCP_NIO2: copy on-demand when sending messages

If a message sent by TCP_NIO2 cannot be put entirely into the network buffer of the OS, then the remainder of that message is copied. This is needed to implement reusing of send buffers, see JGRP-1989 above.

TCP_NIO2: single selector slows down writes and reads

This transport used to have a single selector, processing both writes and reads in the same thread. Writes are not expensive, but reads can be, as de-serialization adds up.
We now have a reader thread for every NioConnection which processes reads (using work stealing), separate from the selector thread. When idle for some time, the reader thread terminates; a new thread is created when data subsequently becomes available to be read.
UPerf (4 nodes) showed a perf increase from 15'000 msgs/sec/node to 24'000. TCP_NIO2's speed is now roughly the same as TCP.

Headers: collapse 2 arrays into 1

A Message had a Headers instance which had an array for header IDs and another one for the actual headers. These 2 arrays were collapsed into a single array and Headers is not a separate class anymore, but the array is managed directly inside Message.
This reduces the memory needed for a message by ca 22 bytes!

RpcDispatcher: removal of unneeded field in a request

The request-id was carried in both the Request (UnicastRequest or MulticastRequest) and the header, which was redundant and a waste. It was removed from Request, and rsp_expected was also removed from the header, for total savings of ca. 9 bytes per RPC.

Switched back from DatagramSocket to MulticastSocket for sending of IP multicasts

Using a DatagramSocket to send multicasts caused issues on MacOS-based systems: when the routing table was not set up correctly, multicasting would not work (nodes wouldn't find each other).
Also, on Windows, IPv6 wouldn't work:

Make the default number of headers in a message configurable

The default used to be 3 (now changed to 4), and if a message had more headers, the headers array needed to be resized (unneeded memory allocation).

Message bundling

When the threshold of the send queue was exceeded, the bundler thread would send messages one-by-one, leading to bad performance.

TransferQueueBundler: switch to array from linked list for queue

Less memory allocation overhead.

Bug fixes

SASL now handles merges correctly


FRAG2: message corruption when thread pools are disabled


Discovery leaks responses



The manual is at

The complete list of features and bug fixes can be found at