Wednesday, July 03, 2019

Compiling JGroups to native code with Quarkus/GraalVM

I'm happy to announce the availability of a JGroups extension for Quarkus!


Quarkus is a framework that (among other things) compiles Java code down to native code (using GraalVM [4]), removing code that's not needed at run time.

Quarkus analyzes the code in a build phase, and removes code that's not used at run time, in order to have a small executable that starts up quickly.

This means that reflection cannot be used at run time, as all classes that are not used are removed at build time. However, reflection can be used at build time.

The other limitations that affect JGroups are threads and the creation of sockets. Both cannot be done at build time, but have to be done at run time. (More limitations of JGroups under Quarkus are detailed in [5]).

So what's the point of a providing a JGroups extension for Quarkus?

While a JGroups application can be compiled directly to native code (using GraalVM's native-image), it is cumbersome, and the application has to be restructured (see [6] for an example) to accommodate the limitations of native compilation.

In contrast, the JGroups extension provides a JChannel that can be injected into the application. The channel has been created according to a configuration file and connected (= joined the cluster) by the extension. The extension takes care of doing the reflection, the socket creation and the starting of threads at the right time (build- or run-time), and the user doesn't need to worry about this.


So let's take a look at a sample application (available at [2]).

The POM includes the extension: groupId=org.jgroups.quarkus.extension and artifactId=quarkus-jgroups. This provides a JChannel that can be injected. The main class is ChatResource:

public class ChatResource extends ReceiverAdapter implements Publisher<String> {
    protected final Set<Subscriber<? super String>> subscribers=new HashSet<>();

    @Inject JChannel channel;

    protected void init(@Observes StartupEvent evt) throws Exception {
        System.out.printf("-- view: %s\n", channel.getView());

    protected void destroy(@Observes ShutdownEvent evt) {

    public String sendMessage(@PathParam("msg") String msg) throws Exception {
        channel.send(null, Objects.requireNonNull(msg).getBytes());
        return String.format("message \"%s\" was sent on channel \n", msg);

    public Publisher<String> greeting() {
        return this;

    public void receive(Message msg) {

    public void receive(MessageBatch batch) {
        for(Message msg: batch)

    public void viewAccepted(View view) {
        System.out.printf("-- new view: %s\n", view);

    public void subscribe(Subscriber<? super String> s) {
        if(s != null)

    protected void onNext(Message msg) {
        String s=new String(msg.getRawBuffer(), msg.getOffset(), msg.getLength());
        System.out.printf("-- from %s: %s\n", msg.src(), s);
        subscribers.forEach(sub -> sub.onNext(s));
It has a JChannel channel which is injected by Arc (the dependency mechanism used in Quarkus). This channel is fully created and connected when it is injected.

The receive(Message) and receive(MessageBatch) methods receive messages sent by itself or other members in the cluster. It in turn publishes them via the Publisher interface. All subscribers will therefore receive all messages sent in the cluster.

The sendMessage() method is invoked when a URL of the form http://localhost:8080/chat/send/mymessage is received. It takes the string parameter ("mymessage") and uses the injected channel to send it to all members of the cluster.

The URL http://localhost:8080/chat/subscribe (or http://localhost:8080/streaming.html in a web browser) can be used to subscribe to messages being received by the channel.


Let's now run a cluster of 2 instances: open 2 shells and type the following commands (output has been edited for brevity):

[belasmac] /Users/bela/quarkus-jgroups-chat$ mvn compile quarkus:dev
[INFO] --- quarkus-maven-plugin:0.18.0:dev (default-cli) @ quarkus-jgroups-chat ---
2019-07-03 14:12:05,025 DEBUG [org.jgr.qua.ext.JChannelTemplate] (main) creating channel based on config config=chat-tcp.xml, bind_addr=, initial_hosts=, cluster=quarkus-jgroups-chat

GMS: address=belasmac-19612, cluster=quarkus-jgroups-chat, physical address=
-- view: [belasmac-19612|0] (1) [belasmac-19612]

[belasmac] /Users/bela/quarkus-jgroups-chat$ mvn compile quarkus:dev -Dquarkus.http.port=8081
[INFO] --- quarkus-maven-plugin:0.18.0:dev (default-cli) @ quarkus-jgroups-chat ---
2019-07-03 14:15:02,463 DEBUG [org.jgr.qua.ext.JChannelTemplate] (main) creating channel based on config config=chat-tcp.xml, bind_addr=, initial_hosts=, cluster=quarkus-jgroups-chat

GMS: address=belasmac-25898, cluster=quarkus-jgroups-chat, physical address=
-- view: [belasmac-19612|1] (2) [belasmac-19612, belasmac-25898]

The system property quarkus.http.port=8081 is needed, or else there would be a port collision, as the default port of 8080 has already been taken by the first application (both apps are started on the same host).

The output shows that there's a cluster of 2 members.

We can now post a message by invoking curl http://localhost:8080/chat/send/hello%20world and curl http://localhost:8081/chat/send/message2.

Both shells show that they received both messages:
-- view: [belasmac-19612|1] (2) [belasmac-19612, belasmac-25898]
-- from belasmac-19612: hello world
-- from belasmac-25898: message2

Of course, we could also use a web browser to send the HTTP GET requests.

When subscribing to the stream of messages in a web browser (http://localhost:8081/streaming.html), this would look as follows:

Note that both channels bind to the loopback ( address. This can be changed by changing bind_addr and initial_hosts in



Uncomment the 2 properties and set them accordingly.

Alternatively, we can pass these as system properties, e.g.:

[belasmac] /Users/bela/quarkus-jgroups-chat$ mvn compile quarkus:dev -Dbind_addr= -Dinitial_hosts=[7800],[7801]
[INFO] --- quarkus-maven-plugin:0.18.0:dev (default-cli) @ quarkus-jgroups-chat ---
2019-07-03 14:38:28,258 DEBUG [org.jgr.qua.ext.JChannelTemplate] (main) creating channel based on config config=chat-tcp.xml, bind_addr=, initial_hosts=, cluster=quarkus-jgroups-chat

GMS: address=belasmac-10738, cluster=quarkus-jgroups-chat, physical address=
-- view: [belasmac-10738|0] (1) [belasmac-10738]

Native compilation

To compile the application to native code, the mvn package -Pnative command has to be executed:

[belasmac] /Users/bela/quarkus-jgroups-chat$ mvn package -Pnative
[INFO] Building jar: /Users/bela/quarkus-jgroups-chat/target/quarkus-jgroups-chat-1.0.0-SNAPSHOT.jar
[INFO] --- quarkus-maven-plugin:0.18.0:build (default) @ quarkus-jgroups-chat ---
[INFO] [io.quarkus.deployment.QuarkusAugmentor] Beginning quarkus augmentation
[INFO] [org.jboss.threads] JBoss Threads version 3.0.0.Beta4
[INFO] [io.quarkus.deployment.QuarkusAugmentor] Quarkus augmentation completed in 1343ms
[INFO] [io.quarkus.creator.phase.runnerjar.RunnerJarPhase] Building jar: /Users/bela/quarkus-jgroups-chat/target/quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner.jar
[INFO] --- quarkus-maven-plugin:0.18.0:native-image (default) @ quarkus-jgroups-chat ---
[INFO] [io.quarkus.creator.phase.nativeimage.NativeImagePhase] Running Quarkus native-image plugin on OpenJDK 64-Bit Server VM
[INFO] [io.quarkus.creator.phase.nativeimage.NativeImagePhase] /Users/bela/graalvm/Contents/Home/bin/native-image -J-Djava.util.logging.manager=org.jboss.logmanager.LogManager --initialize-at-build-time=$BySpaceAndTime -jar quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner.jar -J-Djava.util.concurrent.ForkJoinPool.common.parallelism=1 -H:FallbackThreshold=0 -H:+ReportUnsupportedElementsAtRuntime -H:+ReportExceptionStackTraces -H:+PrintAnalysisCallTree -H:-AddAllCharsets -H:EnableURLProtocols=http -H:-SpawnIsolates -H:+JNI --no-server -H:-UseServiceLoaderFeature -H:+StackTrace
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]    classlist:   6,857.25 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]        (cap):   4,290.72 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]        setup:   6,430.30 ms
14:43:05,540 INFO  [org.jbo.threads] JBoss Threads version 3.0.0.Beta4
14:43:06,468 INFO  [org.xnio] XNIO version 3.7.2.Final
14:43:06,528 INFO  [org.xni.nio] XNIO NIO Implementation Version 3.7.2.Final
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]   (typeflow):  17,331.26 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]    (objects):  24,511.12 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]   (features):   1,194.16 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]     analysis:  44,204.65 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]     (clinit):     579.00 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]     universe:   1,715.40 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]      (parse):   3,315.80 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]     (inline):   4,563.11 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]    (compile):  24,906.58 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]      compile:  34,907.28 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]        image:   4,557.78 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]        write:   2,531.16 ms
[quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner:93574]      [total]: 109,858.54 ms
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:58 min
[INFO] Finished at: 2019-07-03T14:44:40+02:00

This uses GraalVM's native-image to generate a native executable. After a while, the resulting executable is in the ./target directory:

It's size is only 27MB and we can see that it is a MacOS native executable:
[belasmac] /Users/bela/quarkus-jgroups-chat/target$ ls -lh quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner
-rwxr-xr-x  1 bela  staff    27M Jul  3 14:44 quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner
[belasmac] /Users/bela/quarkus-jgroups-chat/target$ file quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner
quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner: Mach-O 64-bit executable x86_64

To run it:

[belasmac] /Users/bela/quarkus-jgroups-chat/target$ ./quarkus-jgroups-chat-1.0.0-SNAPSHOT-runner

GMS: address=belasmac-55106, cluster=quarkus-jgroups-chat, physical address=
-- view: [belasmac-55106|0] (1) [belasmac-55106]

When you run this yourself, you will notice that quick startup time of the second and subsequent members. Why not the first member? The first member has to wait for GMS.join_timeout millis (defined in chat-tcp.xml) to see if it discovers any other members, and so it always runs into this timeout.

To change bind_addr and initial_hosts, has to be changed before compiling to native code.


The quarkus-jgroups extension depends on JGroups-4.1.2-SNAPSHOT, which it may not find unless a snapshot repository has been added to the POM (or settings.xml). Alternatively, git clone ; cd JGroups ; mvn install will generate and install this artifact in your local maven repo.

Currently, only TCP is supported (chat-tcp.xml). UDP will be supported once MulticastSockets are properly supported by GraalVM (see [5] for details).

For some obscure reason,

had to be enabled in the POM, or else native compilation would fail. I hope to remove this once I understand why...


This was a relatively quick port of JGroups to native code. For feedback and questions please use the JGroups mailing list.

The following things are on my todo list for further development:
  • Provide more JGroups classes via extensions, e.g. RpcDispatcher (to make remote method calls)
  • Provide docker images with native executables
  • Implement support for UDP
  • Trim down the size of the executable even more



Wednesday, June 05, 2019

Network sniffing

Oftentimes, JGroups/Datagrid users capture traffic from the network for analysis. Using tools such as wireshark or tshark, they can look at UDP packets or TCP streams.

There used to be a wireshark plugin, written by Richard Achmatowicz, but since it was written in C, every time the wire format changed, the C code had to be changed, too. It is therefore not maintained any longer.

However, there's a class in JGroups that can be used to read messages from the wire: ParseMessages. Since it uses the the same code that's reading messages off the wire, it can always parse messages in the version it's shipped with. It is therefore resistant to wire format changes.

In 4.1.0, I changed ParseMessages to be more useful:
  • Reading of TCP streams is now supported
  • It can read packets from stdin (ideal for piping from tshark)
  • Handling of binary data (e.g. from a PCAP capture) is supported
  • Views are parsed and displayed (e.g. in VIEW or JOIN response messages)
  • Logical names can be displayed: instead of {node-2543, node-2543} instead of {3673e687-fafb-63e0-2ff1-67c0a8a6f8eb,312aa7da-f3d5-5999-1f5c-227f6e43728e}
 To demonstrate how to use this, I made 4 short videos:
  1. Capture UDP IPv4 traffic with tshark
  2. Capture TCP IPv6 traffic with tshark
  3. Capture with tcpdump and wireshark
The documentation is here.

Happy network sniffing!

Tuesday, May 21, 2019

4.1.0 released

I'm happy to announce that I just released JGroups 4.1.0!

Why the bump to a new minor version?

I had to make some API changes and - as I'm trying to avoid that in micro releases - I decided to bump the version to 4.1.0. The changes involve removing a few (rather exotic) JChannel constructors, but chances are you've never used any of them anyway. The other change is a signature change in Streamable, where I now throw IOExceptions and ClassNotFoundExceptions instead of simple Exceptions.

Here's a list of the major changes:

GraalVM / Quarkus support

  • JGroups can now be compiled into a native executable using GraalVM's native-image. This is a very cool feature; I've used ProgrammaticUPerf2 to start a member in ~1 millisecond!
  • This means JGroups can now be used by other applications to create native binaries. Not yet very polished, and I'll write a Quarkus extension next, but usable by folks who know GraalVM...
  • I'll blog about the port to GraalVM and the Quarkus extension (once it's ready) next, so stay tuned!

 Parsing of network packets

Diagnostics handler without reflection

  • This is related to the GraalVM port: the default DiagnosticsHandler (used by uses reflection, which is not allowed in GraalVM
  • This additional DiagnosticsHandler can be used instead of the default one when creating a native binary of an application. The advantage is that still works, even in a native binary.

RpcDispatcher without reflection

 Probe: support when running under TCP

  • Also required when running on GraalVM: since JGroups currently only supports TCP (MulticastSockets don't currently work on GraalVM), needs to be given the address of *one* member, to fetch information about all members

 Change in how IPv4/IPv6 addresses are picked

  • The new algorithm centers around bind_addr defined in the transport (UDP, TCP, TCP_NIO2); the value of this address determines how other addresses (such as loopback, site_local, global, localhost or default values) are resolved.
  • Example: if bind_addr=::1, then all other addresses that are not explicitly defined will be IPv6. If bind_addr=, the all other addresses will be IPv4 addresses.
  • The advantage of this is that you can run a JGroups stack using IPv4 and another one using IPv6 in the same process!
A complete list of JIRA issues is at
Please post questions/issues to the mailing list at!forum/jgroups-dev.


Thursday, March 14, 2019


FYI: I just released 4.0.19.Final.

This changes the way ASYM_ENCRYPT disseminates the private shared group key to members, going from a pull- to a push-based approach [1] [2].

In combination with JGRP-2293 [3], this should help a lot in environments like Kubernetes or Openshift where pods with JGroups nodes are started/stopped dynamically, and where encryption is required.

The design is at [3].

Check it out and let me know if you run into issues.


Wednesday, February 13, 2019

and 4.0.17

Sorry for the short release cycle between 4.0.16 and 4.0.17, but there was an important feature in 4.0.17, which I wanted to release as soon as possible: JGRP-2293.

This has better support for concurrent graceful leaves of multiple members (also coordinators), which is important in cloud environments, where pods are started and stopped dynamically by Kubernetes.

Not having this enhancement would lead to members leaving gracefully, but that would sometimes not be recognized and failure detection would have to kick in to exclude those members and installing the correct views. This would slow down things, and the installing of new views would be goverened by the timeouts in the failure detection protocols (FD_ALL, FD_SOCK). On top of this, in some edge cases, MERGE3 would have to kick in and fix views, further slowing things down.

There's a unit test [1] which tests the various aspects of concurrent leaving, e.g. all coordinators leaving sequentially, multiple members leaving concurrently etc

I recommend installing this version as soon as possible, especially if you run in the cloud. Questions / problems etc -> [2]




Monday, January 21, 2019

JGroups 4.0.16

I just released 4.0.16. The most important features/bug fixes are (for a comprehensive list see [1]):



Wednesday, January 03, 2018

Injecting a split brain into a JGroups cluster

Thanks to a contribution by Andrea Tarocchi, we can now inject arbitrary views into a JGroups cluster, generating split brain scenarios.

This is done by a new protocol INJECT_VIEW, which can be placed anywhere in the stack, e.g. like this:

  <MERGE3 max_interval="30000"

  <VERIFY_SUSPECT timeout="500"  />


The protocol exposes a method injectView(), which can be called by Let's say we have a cluster {A,B,C}: we can now inject view {A} into A, and {B,C} into B and C, as follows: op=INJECT_VIEW.injectView["A=A;B=B/C;C=B/C"]

(Note that '/' is used to separate node names; this is needed, because ',' is used to seperate arguments by default in probe) .

The result is that node A is now its own singleton cluster (not seeing B and C), and B and C only see each other (but not A).

Note that the member names are the logical names given to individual nodes.

Once the split has been injected, MERGE3 should remedy it by merging A, B and C back into a single cluster, after between 10-30 seconds (in the above configuration).

Thanks to Andrea for creating this nifty tool to experiment with cluster splits!

Monday, August 28, 2017

JGroups workshops in Rome and Berlin in November

I'm happy to announce that I've revamped the JGroups workshop and I'll be teaching 2 workshops in November: Rome Nov 7-10 and Berlin Nov 21-24.

The new workshop is 4 days (TUE-FRI). I've updated the workshop to use the latest version of JGroups (4.0.x), removed a few sections and added sections on troubleshooting/diagnosis and handling of network partitions (split brain).

Should be a fun and hands-on workshop!

To register and for the table of contents, navigate to [1].

To get the early-bird discount, use code EARLYBIRD.



Monday, June 26, 2017

Non-blocking JGroups

I'm happy to announce that I just released JGroups 4.0.4.Final!

It's prominent feature is non-blocking flow control [1,2].

Flow control makes sure that a fast sender cannot overwhelm a slow receiver in the long run (short spikes are tolerated) by adjusting the send rate to the receive rate. This is done by giving each sender credits (number of bytes) that it is allowed to send until it has to block.

A receiver sends new credits to the sender once it has processed a message.

When no credits are available anymore, the sender blocks until it gets new credits from the receiver(s). This is done in UFC (unicast flow control) and MFC (multicast flow control).

Flow control and TCP have been the only protocols that could block (UDP and TCP_NIO2 are non-blocking).

Non-blocking flow control now adds protocols UFC_NB and MFC_NB. These can replace their blocking counterparts.

Instead of blocking sender threads, the non-blocking flow control protocols queue messages when not enough credits are available to send them, allowing the sender threads to return immediately.

When fresh credits are received, the queued messages will be sent.

The queues are bound to prevent heap exhaustion; setting attribute max_queue_size (bytes) will queue messages up to max_queue_size bytes and then block subsequent attempts to queue until more space is available. Of course, setting max_queue_size to a large value will effectively make queues unbounded.

Using MFC_NB / UFC_NB with a transport of UDP or TCP_NIO2, which also never block, provides a completely non-blocking stack, where sends never block a JChannel.send(Message). If RpcDispatcher is used, there are non-blocking methods to invoke (even synchronous) RPCs, ie. the ones which return a CompletableFuture.

Other features of 4.0.4 include a new message bundler and a few bug fixes (e.g. the internal thread pool was not shut down). For the list of all JIRAs see [3].





Monday, May 08, 2017

Running an Infinispan cluster with Kubernetes on Google Container Engine (GKE)

In this post, I'm going to show the steps needed to get a 10 node Infinispan cluster up and running on Google Container Engine (GKE).

The test we'll be running is IspnPerfTest and the corresponding docker image is belaban/ispn_perf_test on dockerhub.

All that's needed to run this youselves is a Google Compute Engine account, so head on over there now and create one if you want to reproduce this demo! :-)

Alternatively, the cluster could be run locally in minikube, but for this post I chose GKE instead.

Ready? Then let's get cracking...

First, let's create a 10-node cluster in GKE. The screen shot below shows the form that needs to be filled out to create a 10 node cluster in GKE. This results in 10 nodes getting created in Google Compute Engine (GCE):

As shown, we'll use 10 v16-cpu instances with 14GB of memory each.

Press "Create" and the cluster is being created:

If you logged into your Google Compute Engine console, it would show the 10 nodes that are getting created.

When the cluster has been created, click on "Connect" and execute the "gcloud" command that's shown as a result:

gcloud container clusters get-credentials ispn-cluster \ --zone us-central1-a --project ispnperftest

We can now see the 10 GCE nodes:
[belasmac] /Users/bela/kubetest$ kubectl get nodes
NAME                                          STATUS    AGE       VERSION
gke-ispn-cluster-default-pool-59ed0e14-1zdb   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-33pk   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-3t95   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-5sn9   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-9lmz   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-j646   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-k797   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-q80q   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-r96s   Ready     4m        v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-zhdj   Ready     4m        v1.5.7

Next we'll run an interactive instance of IspnPerfTest and 3 non-interactive (no need for a TTY) instances, forming a cluster of 4. First, we start the interactive instance. Note that it might take a while until Kubernetes has downloaded image belaban/ispn_perf_test from dockerhub. When done, the following is shown:

[belasmac] /Users/bela/kubetest$ kubectl run infinispan --rm=true -it --image=belaban/ispn_perf_test
If you don't see a command prompt, try pressing enter.

GMS: address=infinispan-749052960-pl066-27417, cluster=default, physical address=

GMS: address=infinispan-749052960-pl066-18029, cluster=cfg, physical address=
created 100,000 keys: [1-100,000]
[1] Start test [2] View [3] Cache size [4] Threads (100)
[5] Keys (100,000) [6] Time (secs) (60) [7] Value size (1.00KB) [8] Validate
[p] Populate cache [c] Clear cache [v] Versions
[r] Read percentage (1.00)
[d] Details (true)  [i] Invokers (false) [l] dump local cache
[q] Quit [X] Quit all

Now we'll start 3 more instances, the definition of which is taken from a Yaml file (ispn.yaml):

## Creates a number of pods on kubernetes running IspnPerfTest
## Run with: kubectl create -f ispn.yaml
apiVersion: extensions/v1beta1
kind: Deployment
  name: ispn
  namespace: default
  replicas: 3
        run: ispn-perf-test
      hostNetwork: false
      - args:
        - -nohup
        name: ispn-perf-test
        image: belaban/ispn_perf_test
        # imagePullPolicy: IfNotPresent

The YAML file creates 3 instances (replicas: 3):

belasmac] /Users/bela/kubetest$ kubectl create -f ispn.yaml
deployment "ispn" created

Running "kubectl get pods -o wide" shows:

[belasmac] /Users/bela/kubetest$ kubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP          NODE
infinispan-749052960-pl066   1/1       Running   0          4m   gke-ispn-cluster-default-pool-59ed0e14-5sn9
ispn-1255975377-hm785        1/1       Running   0          46s   gke-ispn-cluster-default-pool-59ed0e14-r96s
ispn-1255975377-jx70d        1/1       Running   0          46s   gke-ispn-cluster-default-pool-59ed0e14-1zdb
ispn-1255975377-xf9r8        1/1       Running   0          46s   gke-ispn-cluster-default-pool-59ed0e14-3t95

This shows 1 infinispan instance (interactive terminal)  and 3 ispn instances (non-interactive).

We can now exec into one of the instances are run to verify that a cluster of 4 has formed:

[belasmac] /Users/bela/kubetest$ kubectl exec -it ispn-1255975377-hm785 bash
bash-4.3$ -addr localhost                                                                                                                                         
-- sending probe request to /
-- sending probe request to /
-- sending probe request to /
-- sending probe request to /

#1 (287 bytes):
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
version=4.0.3-SNAPSHOT (Schiener Berg)

#2 (287 bytes):
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
version=4.0.3-SNAPSHOT (Schiener Berg)

#3 (284 bytes):
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
version=4.0.3-SNAPSHOT (Schiener Berg)

#4 (292 bytes):
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
version=4.0.3-SNAPSHOT (Schiener Berg)

4 responses (4 matches, 0 non matches)

The 'view' item shows the same view (list of cluster members) across all 4 nodes, so the cluster has formed successfully. Also, if we look at our interactive instance, pressing [2] shows the cluster has 4 members:

-- local: infinispan-749052960-pl066-18029
-- view: [infinispan-749052960-pl066-18029|3] (4) [infinispan-749052960-pl066-18029, ispn-1255975377-xf9r8-34843, ispn-1255975377-jx70d-20952, ispn-1255975377-hm785-48520]

This means we have view number 4 (0-3) and 4 cluster members (the number in parens).

Next, let's scale the cluster to 10 members. To do this, we'll tell Kubernetes to scale the ispn deployment to 9 instances (from 3):

[belasmac] /Users/bela/IspnPerfTest$ kubectl scale deployment ispn --replicas=9
deployment "ispn" scaled

After a few seconds, the interactive terminal shows a new view containing 10 members:

** view: [infinispan-749052960-pl066-27417|9] (10) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319, ispn-1255975377-6191p-9724, ispn-1255975377-1g2kx-5547, ispn-1255975377-333rl-13052, ispn-1255975377-57zgl-28575, ispn-1255975377-j8ckh-35528, ispn-1255975377-lgvmt-32173]

We also see a minor inconvenience when looking at the pods:

[belasmac] /Users/bela/jgroups-docker$ lubectl get pods -o wide
NAME                         READY     STATUS    RESTARTS   AGE       IP          NODE
infinispan-749052960-pl066   1/1       Running   0          13m   gke-ispn-cluster-default-pool-59ed0e14-5sn9
ispn-1255975377-1g2kx        1/1       Running   0          1m   gke-ispn-cluster-default-pool-59ed0e14-k797
ispn-1255975377-333rl        1/1       Running   0          1m   gke-ispn-cluster-default-pool-59ed0e14-9lmz
ispn-1255975377-57zgl        1/1       Running   0          1m   gke-ispn-cluster-default-pool-59ed0e14-q80q
ispn-1255975377-6191p        1/1       Running   0          1m   gke-ispn-cluster-default-pool-59ed0e14-5sn9
ispn-1255975377-hm785        1/1       Running   0          10m   gke-ispn-cluster-default-pool-59ed0e14-r96s
ispn-1255975377-j8ckh        1/1       Running   0          1m   gke-ispn-cluster-default-pool-59ed0e14-j646
ispn-1255975377-jx70d        1/1       Running   0          10m   gke-ispn-cluster-default-pool-59ed0e14-1zdb
ispn-1255975377-lgvmt        1/1       Running   0          1m   gke-ispn-cluster-default-pool-59ed0e14-33pk
ispn-1255975377-xf9r8        1/1       Running   0          10m   gke-ispn-cluster-default-pool-59ed0e14-3t95

The infinispan pod  and one of the ispn pods have been created on the same GCE node gke-ispn-cluster-default-pool-59ed0e14-5sn9. The reason is that they are different deployments, and so GKE deploys them in an unrelated manner. Had all pods been created in the same depoyment, Kubernetes would have assigned pods to nodes in a round-robin fashion.

This could be fixed by using labels, but I didn't want to complicate the demo. Note that running more that one pod per GCE node will harm performance slightly...

Now we can run the performance test by populating the cluster (grid) with key/value pairs ([p]) and then running the test ([1]). This inserts 100'000 key/value pairs into the grid and executes read operations on every node for 1 minute (for details on IspnPerfTest consult the Github URL given earlier):

 Running test for 60 seconds:
1: 47,742 reqs/sec (286,351 reads 0 writes)
2: 56,854 reqs/sec (682,203 reads 0 writes)
3: 60,628 reqs/sec (1,091,264 reads 0 writes)
4: 62,922 reqs/sec (1,510,092 reads 0 writes)
5: 64,413 reqs/sec (1,932,427 reads 0 writes)
6: 65,050 reqs/sec (2,341,846 reads 0 writes)
7: 65,517 reqs/sec (2,751,828 reads 0 writes)
8: 66,172 reqs/sec (3,176,344 reads 0 writes)
9: 66,588 reqs/sec (3,595,839 reads 0 writes)
10: 67,183 reqs/sec (4,031,168 reads 0 writes)

done (in 60020 ms)

all: get 1 / 1,486.77 / 364,799.00, put: 0 / 0.00 / 0.00

======================= Results: ===========================
ispn-1255975377-1g2kx-51998: 100,707.90 reqs/sec (6,045,193 GETs, 0 PUTs), avg RTT (us) = 988.79 get, 0.00 put
ispn-1255975377-xf9r8-34843: 95,986.15 reqs/sec (5,760,705 GETs, 0 PUTs), avg RTT (us) = 1,036.78 get, 0.00 put
ispn-1255975377-jx70d-20952: 103,935.58 reqs/sec (6,239,149 GETs, 0 PUTs), avg RTT (us) = 961.14 get, 0.00 put
ispn-1255975377-j8ckh-11479: 100,869.08 reqs/sec (6,054,263 GETs, 0 PUTs), avg RTT (us) = 987.95 get, 0.00 put
ispn-1255975377-lgvmt-26968: 104,007.33 reqs/sec (6,243,144 GETs, 0 PUTs), avg RTT (us) = 960.05 get, 0.00 put
ispn-1255975377-6191p-15331: 69,004.31 reqs/sec (4,142,053 GETs, 0 PUTs), avg RTT (us) = 1,442.04 get, 0.00 put
ispn-1255975377-57zgl-58007: 92,282.75 reqs/sec (5,538,903 GETs, 0 PUTs), avg RTT (us) = 1,078.14 get, 0.00 put
ispn-1255975377-333rl-8583: 99,130.95 reqs/sec (5,949,542 GETs, 0 PUTs), avg RTT (us) = 1,004.08 get, 0.00 put
infinispan-749052960-pl066-18029: 67,166.91 reqs/sec (4,031,358 GETs, 0 PUTs), avg RTT (us) = 1,486.77 get, 0.00 put
ispn-1255975377-hm785-48520: 79,616.70 reqs/sec (4,778,196 GETs, 0 PUTs), avg RTT (us) = 1,254.87 get, 0.00 put

Throughput: 91,271 reqs/sec/node (91.27MB/sec) 912,601 reqs/sec/cluster
Roundtrip:  gets min/avg/max = 0/1,092.14/371,711.00 us (50.0=938 90.0=1,869 95.0=2,419 99.0=5,711 99.9=20,319 [percentile at mean: 62.50]),
            puts n/a

As suspected, instances infinispan-749052960-pl066-18029 and ispn-1255975377-6191p-15331 show lower performance than the other nodes, as they are co-located on the same GCE node.

The Kubernetes integration in JGroups is done by KUBE_PING which interacts with the Kubernetes master (API server) to fetch the IP addresses of the pods that have been started by Kubernetes.

The KUBE_PING protocol is new, so direct problems, issues, configuration questions etc to the JGroups mailing list.


Tuesday, February 21, 2017

JGroups 4.0.0.Final

I'm happy to announce that JGroups 4.0.0.Final is out!

With 120+ issues, the focus of this version is API changes, the switch to Java 8 and thus the use of new language features (streams, lambdas) and optimizations.

API changes

Fluent configuration

JChannel ch=new JChannel("config.xml").name("A").connect("cluster");

Removed deprecated classes and methods

Removed support for plain string-based channel configuration


Use of Java 8 features

E.g. replace Condition with Predicate etc

Removed classes

MessageDispatcher / RpcDispatcher changes

Use of CompletableFuture instead of NotifyingFuture

TCP: remove send queues

New features

Deliver message batches 


Receiver (ReceiverAdapter) now has an additional callback receive(MessageBatch batch). This allows JGroups to pass an entire batch of messages to the application rather than passing them up one by one.



Plus fixed security issues in the refactored code. Removed ENCRYPT.

Measure round-trip times for RPCs via probe


Keys 'rpcs' and 'rpcs-details' dump information about average RTTs between individual cluster members

Change message bundler at runtime


Message bundlers can be changed at runtime via probe. This is useful to see the effect of different bundlers on performance, even in the same test run.


DELIVERY_TIME: new protocol to measure delivery time in the application


Exposes stats via JMX and probe

RELAY2: sticky site masters


When we have multiple site masters, messages from the same member should always be handled by the same site master. This prevents reordering of messages at the receiver.

Multiple elements in bind_addr





This tries to bind to eth2 first, then to, then to an interface that starts with en0, and

finally to loopback.

Useful when running in an environment where the IP addresses and/or interfaces are not known before.

Multiple receiver threads in UDP


Multiple threads can receive (and process) messages from a datagram socket, preventing queue buildups.




RpcDispatcher: don't copy the first anycast


Reduction of memory size of classes

Remove one buffer copy in COMPRESS


Replace Java serialization with JGroups marshalling


Some internal classes still used Java serialization, which opens up security holes

(google 'java serialization vulnerability').

Faster marshalling / unmarshalling of messages

TCP: reduce blocking


Message bundler improvements


E.g. removal of SingletonAddress: not needed anymore as shared transports have been removed, too.

Protocol: addition of up(Message) and down(Message) callbacks


This massively reduces the number of Event creations.

TransferQueueBundler: remove multiple messages


Instead of removing messages one-by-one, the remover thread now removes as many messages as are in the queue (contention) into a local queue (no contention) and then creates and sends message batches off of the local queue.

Single thread pool


There's only a single thread pool for all typs of messages, reducing the maintenance overhead of 4 thread pools and the configuration required.

The internal thread pool is still available (size to the number of cores), but not configurable.
A ForkJoinPool can be used instead of the thread pool (which can be disabled as well).

Timer: tasks can indicate that they will not block


If a task calls execute() with an argument blocking==false, the task will be executed by the timer's main thread, and not be passed on to the timer's thread pool. This reduces the number of threads needed and therefore the number of context switches.

Headers are resized unnecessarily


ByteArrayDataOutputStream: expand more conservatively


Reading ints and longs creates unnecessary buffers


Ints and longs are read into a byte[] array first, then parsed. This was changed to read the values and add them to the resulting ints or longs.

Should reduce overall memory allocation as ints and longs are used a lot in headers.

Table.removeMany() creates unneeded temp list


This method is used a lot in NAKACK2 and UNICAST3. The change was to read messages directly into the resulting MessageBatch instead of a temp list and from there into the batch.

Reduce in-memory size of UnicastHeader3


Reduced size from 40 -> 32 bytes.

Cache result of log.isTraceEnabled()


This was done mainly for protocols where log.isTraceEnabled() was used a lot, such as TP, NAKACK2 or UNICAST3.

Note that the log level can still be changed at runtime.

Added MessageProcessingPolicy to define assigning of threads to messages or batches


This only applies only to regular (not OOB or internal) messages. Make sure that only one message per member is processed at a given time by the thread pool.

This reduces the number of threads needed.

UNICAST3 / NAKACK2: more efficient adding and removing of messages / batches to/from tables


Simpler algorithm and removal of one lock (= less contention)

Bug fixes

GMS sometimes ignores view bundling timeout


UFC and MFC headers get mixed up


Although indepent protocols, the protocol ID was assigned by the superclass, so replenish and credit messages would get mixed up, leading to stuttering in the sending of credits.

Flow control: replenish credits after message delivery


MERGE3: merge is never triggered


This is an edge case that was not covered before: every subgroup coordinator has some other member as coord:


MPING: restart fails


UNICAST3 drops all messages until it receives one with first==true


This caused a bug in ASYM_ENCRYPT.

ASYM_ENCRYPT: message batches are not handled correctly


SYM_ENCRYPT: allow for other keystores besides JCEKS


E.g. pcks#12 or jks

ASYM_ENCRYPT encrypts an empty buffer into a null buffer


This caused an NPE.


On Sourceforge:,
or via Maven:

The manual is at

The complete list of features and bug fixes can be found at

Bela Ban, Kreuzlingen, Switzerland
March 2017

Thursday, September 08, 2016

Removing thread pools in JGroups 4.0

JGroups 3.x has 4 thread pools:
  • Regular thread pool: used for regular messages (default has a queue)
  • OOB thread pool: used for OOB messages (no queue)
  • Internal thread pool: used for JGroups internal messages only. The main raison d'etre for this pool was that internal messages such as heartbeats or credits should never get queued up behind other messages, and get processed immediately.
  • Timer thread pool: all tasks in a timer need to be executed by a thread pool as they can potentially block
Over time, I found out that - with most configurations - I had the queue in the regular thread pool disabled, as I wanted the pool to dynamically create new threads (up to max-threads) when required, and terminate them again after some idle time.

Hence the idea to club regular and OOB thread pools into one.

When I further thought about this, I realized that incoming messages could also be handled by a queue-less thread pool: by handling RejectedExecutionException thrown when the pool is full and simply spawning a new thread to process the internal message, so it wouldn't get dropped.

The same goes for timer tasks: a timer task (e.g. a message retransmission task) cannot get dropped, or retransmission would cease. However, by using the same mechanism as for internal messages, namely spawning a new thread when the thread pool is full, this can be solved.

Therefore, in 4.0 there's only a single thread pool handling regular, OOB and internal messages, plus timer tasks.

The new thread pool has no queue, or else it would not throw a RejectedExecutionException when a task is added, but simply queue it, which is not what we want for internal messages or timer tasks.

It also has a default rejection policy of "abort" which cannot be changed (only by substituting the thread pool with a custom pool).

This dramatically reduces configuration complexity: from 4 to 1 pools, and the new thread pool only exposes min-threads, max-threads, idle-time and enabled as configuration options.

Here's an example of a 3.x configuration:





and here's the 4.0 configuration:


Nice, isn't it?


Tuesday, August 23, 2016

JGroups 4.0.0.Alpha1 released

Howdy folks,

I just released a first alpha of JGroups 4.0.0 to SourceForge and maven. There are 12+ issues left, and I expect a final release in October.

The major version number was incremented because there are a lot of API changes and removal of deprecated classes/methods.

I'd be happy to get feedback on the API; there's still time to make a few API changes until the final release.

Major API changes

New features 


Deliver message batches

Ability to handle message batches in addition to individual messages: receive(MessageBatch) was added to ReceiverAdapter:

Encryption and authentication

ENCRYPT was removed and replaced by SYM_ENCRYPT (keystore-based) and ASYM_ENCRYPT (asymmetric key exchange):

ENCRYPT had a couple of security flaws, and since the code hasn't been touched for almost a decade, I took this chance and completely refactored it into the 2 protocols above.

The code is now much more readable and maintainable.



Removal of Java serialization

JGroups marshalling falls back to Java serialization if a class is not primitive or implements Streamable. This was completely removed for JGroups-internal classes:

MessageBundler improvements

The performance of existing bundlers has been improved, and new bundlers (RingBufferBundler, NoBundler) have been added. Since the shared transport feature was removed, SingletonAddress used as key into a bundler's hashmap could also be removed.

For instance, TransferQueueBundler now removes chunks of message from the queue, rather than one-by-one:

I also added the ability to switch bundlers at runtime (via (e.g.) op=TCP.bundler["rb"]): /

Reduction of event creation

For every message sent up or down, a new Event (wrapping the message) had to be created. Adding 2 separate up(Message) and down(Message) callbacks removed that unneeded creation:

Faster header marshalling and creation

Headers used to do a hashmap lookup to get their magic-id when getting marshalled. Now they carry the magic ID directly in the header. This saves a hashmap lookup per header (usually we have 3-4 headers per message):

Header creation was getting the class using a lookup with the magic ID, and then creating an instance. This was replaced by each header class implementing a create() method, resulting in header creation getting from 300ns down to 25ns.

Bug fixes

MFC and UFC headers getting mixed up

This is a regression that led to premature exhaustion of credits, slowing things down:



The manual has not been changed yet, so browse the source code or javadocs for details.

Enjoy! and feedback via the mailing list, please!


Bela Ban