Friday, March 11, 2011

A quick update on performance of JGroups 2.12.0.Final

I forgot to add performance data to the release announcement of 2.12.0.Final, so here it is.

Caveat: this is a quick check for performance regressions, which I run routinely before a release, and by no means a comprehensive performance test !

I ran this both on my home cluster and our internal lab.


This test is described in detail in [1]. It forms a cluster of 4 nodes, and every node sends 1 million messages of varying size (1K, 5K, 20K). We measure how long it takes for every node to receive all 4 million messages, and compute the per-node message rate and throughput per second.
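For concreteness, this is how throughput is derived from those parameters. The class and method below are illustrative helpers of mine, not part of JGroups or the actual test harness, and the elapsed time in main is a made-up placeholder:

```java
public class Throughput {

    // Throughput in MBytes/sec/node, given the total number of messages a node
    // received, the message size in bytes, and the elapsed time in seconds
    static double mbytesPerSec(long totalMsgs, int msgSizeBytes, double elapsedSecs) {
        double totalBytes = (double) totalMsgs * msgSizeBytes;
        return totalBytes / elapsedSecs / (1024 * 1024);
    }

    public static void main(String[] args) {
        long totalMsgs = 4L * 1000000;  // 4 senders x 1 million messages each
        int msgSize = 1024;             // the 1K case
        double elapsed = 30.0;          // hypothetical elapsed time, for illustration only
        System.out.printf("%.1f MBytes/sec/node%n",
                          mbytesPerSec(totalMsgs, msgSize, elapsed));
    }
}
```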

This is my home cluster and consists of 4 HP ProLiant DL380G5 quad core servers (ca 3700 bogomips), connected to a GB switch, and running Linux 2.6. The JDK is 1.6 and the heap size is 600M. I ran 1 process on every box. The configuration used was udp.xml (using IP multicasting) shipped with JGroups.

  •   1K message size: 140 MBytes / sec / node
  •   5K message size: 153 MBytes / sec / node
  • 20K message size: 154 MBytes / sec / node
This shows that GB ethernet is saturated. The reason every node receives more than the GB ethernet limit (~ 125 MBytes/sec) is that every node loops back its own traffic, which therefore doesn't have to share the wire with the other incoming packets. In theory, the max throughput should therefore be 4/3 * 125 ~= 166 MBytes/sec. The numbers above are not far from this.
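The 166 MBytes/sec figure follows from a back-of-the-envelope calculation: each node receives 3 remote streams over the wire (bounded by the link speed) plus its own loopback stream at the same per-sender rate. A sketch of that arithmetic (class and variable names are mine, not from the test):

```java
public class MaxThroughput {
    public static void main(String[] args) {
        double link = 125.0;  // GB ethernet payload limit, MBytes/sec
        int nodes = 4;

        // If all senders send at rate r, each node's wire carries (nodes-1)*r,
        // so r is bounded by link / (nodes-1)
        double perSender = link / (nodes - 1);   // ~41.7 MBytes/sec

        // Each node receives (nodes-1) wire streams plus its own loopback,
        // i.e. nodes * r in total: 4/3 * 125 ~= 166 MBytes/sec
        double perNode = perSender * nodes;

        System.out.printf("%.1f MBytes/sec/node%n", perNode);
    }
}
```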


This test mimics the way Infinispan's DIST mode works.

Again, we form a cluster of between 1 and 9 nodes. Every node is on a separate machine. The test then has every node invoke 2 unicast RPCs on randomly selected nodes. With a chance of 80% the RPCs are reads, and with a chance of 20% they're writes. The writes carry a payload of 1K, and the reads return a payload of 1K. Every node makes 20'000 RPCs.
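A minimal sketch of the request-selection logic this describes, assuming a seeded java.util.Random. The class and method names are illustrative only; this is neither the actual test harness nor JGroups API:

```java
import java.util.Random;

public class RpcWorkload {

    // Simulates one node's workload: numRpcs requests, each picking two
    // distinct random targets (other than self) and classifying the request
    // as a read (80%) or a write (20%). Returns {reads, writes}.
    static int[] run(int clusterSize, int self, int numRpcs, long seed) {
        Random rnd = new Random(seed);
        int reads = 0, writes = 0;
        for (int i = 0; i < numRpcs; i++) {
            int t1 = pickTarget(rnd, clusterSize, self, -1);
            int t2 = pickTarget(rnd, clusterSize, self, t1);  // second, distinct target
            if (rnd.nextInt(100) < 80) reads++; else writes++;
            // a real run would now invoke unicast RPCs on t1 and t2,
            // writes carrying and reads returning a 1K payload
        }
        return new int[] {reads, writes};
    }

    static int pickTarget(Random rnd, int clusterSize, int self, int exclude) {
        int t;
        do { t = rnd.nextInt(clusterSize); } while (t == self || t == exclude);
        return t;
    }

    public static void main(String[] args) {
        int[] counts = run(4, 0, 20000, 42L);
        System.out.println("reads=" + counts[0] + " writes=" + counts[1]);
    }
}
```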

The hardware is a bit more powerful than my home cluster; every machine has 5300 bogomips, and all machines are connected with GB ethernet.

  • 1 node:   50'000 requests / sec / node
  • 2 nodes: 23'000 requests / sec / node
  • 3 nodes: 20'000 requests / sec / node
  • 4 nodes: 20'000 requests / sec / node
  • 5 nodes: 20'000 requests / sec / node
  • 6 nodes: 20'000 requests / sec / node
  • 7 nodes: 20'000 requests / sec / node
  • 8 nodes: 20'000 requests / sec / node
  • 9 nodes: 20'000 requests / sec / node
As can be seen, the number of requests per second per node stays constant from 2-3 nodes onward. The 1 node scenario is somewhat contrived, as no network communication is involved.

This is actually good news, as it shows that aggregate performance grows linearly with cluster size. As a matter of fact, with increasing cluster size, the chance of more than 2 nodes picking the same target decreases, so performance degradation due to (write) access conflicts is likely to decrease as well.
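A simplified model illustrates why conflicts become rarer: if two distinct nodes each pick one target uniformly from the other n-1 nodes, they collide with probability (n-2)/(n-1)^2, which shrinks as n grows. A sketch (class and method names are mine, and the model deliberately ignores multi-target requests):

```java
public class ConflictOdds {

    // Probability that two distinct nodes, each picking one target uniformly
    // at random from the other n-1 nodes, pick the same target.
    // There are n-2 targets neither picker is excluded from, each chosen
    // with probability 1/(n-1) by both: (n-2)/(n-1)^2.
    static double collisionProb(int n) {
        return (double) (n - 2) / ((n - 1) * (n - 1));
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 9; n++)
            System.out.printf("n=%d  P(collision) = %.3f%n", n, collisionProb(n));
    }
}
```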

Caveat: I haven't tested this on a larger cluster yet, but the current performance is already very promising.



  1. Interesting post. 2 questions from me:

    What's the CPU usage like at these levels, say on your home setup?

    How do these results compare to the previous version?



  2. Unfortunately, I didn't measure CPU load.

    Some previous results are at [1]


  3. I can re-run the perf test if you want, and measure CPU load. Email me (belaban at yahoo dot com) or #jgroups @