Tuesday, October 21, 2014

JGroups 3.6.0.Final released

I just released 3.6.0.Final to SourceForge [1] and Nexus. It contains a few new features, plus a number of optimizations and bug fixes. It's a small release before work starts on the big 4.0.

A quick look over what was changed:


New features


CENTRAL_LOCK: lock owner can be the node rather than individual threads

[https://issues.jboss.org/browse/JGRP-1886]

Added an option to make the node the lock owner rather than the combo node:thread. This was needed by the JGroups clustering plugin for vert.x.
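
For illustration, this is what the option could look like in the protocol stack (the attribute name is an assumption based on the issue, not verified against the shipped schema):

<CENTRAL_LOCK use_thread_id_for_lock_owner="false" num_backups="1"/>

With the node as owner, a lock acquired by one thread can be released by another thread of the same node, which is what an event-loop model like vert.x's needs.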


RSVP: option to not block caller

[https://issues.jboss.org/browse/JGRP-1851]

This way, a caller knows that its message will get resent until it's acked, but it doesn't have to block. Code was also added to skip RSVP for unicast messages when UNICAST3 is present in the stack; RSVP is then only needed for multicast messages.
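
A minimal sketch of the non-blocking variant (assuming the new flag is Message.Flag.RSVP_NB, as per the issue):

import org.jgroups.JChannel;
import org.jgroups.Message;

public class RsvpNbDemo {
    public static void main(String[] args) throws Exception {
        JChannel ch = new JChannel(); // default stack; RSVP must be present in it
        ch.connect("rsvp-demo");
        Message msg = new Message(null, null, "payload"); // null dest: multicast to the cluster
        msg.setFlag(Message.Flag.RSVP_NB); // resent until acked, but send() returns immediately
        ch.send(msg);
        ch.close();
    }
}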


Docker image for JGroups

[https://issues.jboss.org/browse/JGRP-1840]

mcast: new multicast diagnosis tool

[https://issues.jboss.org/browse/JGRP-1832]
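
To check whether IP multicasting works between hosts, the tool can be run on each of them; something like the following (assuming the class lives at org.jgroups.tests.mcast; options not verified):

java -cp jgroups.jar org.jgroups.tests.mcast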




Optimizations


UNICAST3 / NAKACK2: limit the max size of retransmission requests

[https://issues.jboss.org/browse/JGRP-1868]

When a retransmission request in UNICAST3 or NAKACK2 covers a large number of messages, the retransmit message may become larger than what the transport can handle and is therefore dropped. This leads to endless retransmissions.

The fix here is to retransmit only the oldest N messages, such that the retransmit message doesn't exceed the max bundling size of the transport, and to use a bitmap to identify the missing messages to be retransmitted.

Also, the bitmaps used by SeqnoList reduce the size of a retransmission message.
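
As a sketch of the idea (the constructor signature is an assumption): a fixed-size bitmap plus an offset can encode many missing seqnos in a handful of bytes, instead of listing every seqno individually:

import org.jgroups.util.SeqnoList;

public class SeqnoListDemo {
    public static void main(String[] args) {
        // Assumed ctor: (capacity in seqnos, offset). Request retransmission of 5..7 and 10:
        SeqnoList missing = new SeqnoList(16, 4);
        missing.add(5, 7); // a range of missing seqnos
        missing.add(10);   // a single missing seqno
        System.out.println(missing);
    }
}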


Channel creation has unneeded synchronization

[https://issues.jboss.org/browse/JGRP-1887]

This synchronization slowed down the parallel creation of many channels; it was removed.


UNICAST3: sync receiver table with sender table

[https://issues.jboss.org/browse/JGRP-1875]

In some edge cases, a receiver would not sync correctly with a sender.



Bug fixes


JMX: unregistering a channel which didn't register any protocols issues warnings

[https://issues.jboss.org/browse/JGRP-1894]


UDP: ip_ttl was ignored and was always 1

[https://issues.jboss.org/browse/JGRP-1880]


MERGE3: in some cases, the information about subgroups is incorrect

[https://issues.jboss.org/browse/JGRP-1876]

The actual MergeView was always correct, but the information about subgroups wasn't.


RELAY2 doesn't work with FORK

[https://issues.jboss.org/browse/JGRP-1875]


Enjoy !


[1] http://sourceforge.net/projects/javagroups/files/JGroups/3.6.0.Final/

Monday, October 06, 2014

JGroups and Docker

I've uploaded an image with JGroups and a few demos to DockerHub [2].

The image is based on Fedora 20 and contains JGroups, plus a few scripts which run a chat demo, a distributed lock demo and a distributed counter demo.

To run this, it's as simple as executing

docker run -it belaban/jgroups

This will drop you into a bash shell and you're ready to run any of the three demos.

Start multiple containers and you have a cluster in which you can try things out, e.g. one node acquiring a lock, a second node trying to acquire it and blocking, the first node crashing, and the second node finally acquiring the lock.
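
The lock demo is presumably built on JGroups' LockService API; a minimal sketch of what each node would run (the config file name is a placeholder, and the stack must contain CENTRAL_LOCK):

import org.jgroups.JChannel;
import org.jgroups.blocks.locking.LockService;
import java.util.concurrent.locks.Lock;

public class LockDemo {
    public static void main(String[] args) throws Exception {
        JChannel ch = new JChannel("config-with-central-lock.xml"); // placeholder config
        ch.connect("lock-cluster");
        Lock lock = new LockService(ch).getLock("my-lock"); // same name on all nodes
        lock.lock(); // blocks until the cluster-wide lock is acquired
        try {
            System.out.println("got the lock");
        } finally {
            lock.unlock();
        }
        ch.close();
    }
}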

Note that this currently runs a cluster on one physical box. I'll still need to investigate what needs to be done to run Docker containers on different hosts and cluster them [3].

The Dockerfile is at [1]; it can be used to build the image locally, or to derive another image from it.

[1] https://github.com/belaban/jgroups-docker

[2] https://registry.hub.docker.com/u/belaban/jgroups/

[3] https://issues.jboss.org/browse/JGRP-1840

Friday, August 29, 2014

JGroups 3.5.0.Final released

I'm happy to announce that JGroups 3.5.0.Final has been released !

It's quite a big release with 137 issues resolved.

Downloads at the usual place (SourceForge) or via Maven:

<dependency>
    <groupId>org.jgroups</groupId>
    <artifactId>jgroups</artifactId>
    <version>3.5.0.Final</version>
</dependency>


The release notes are here.

Enjoy !

Bela Ban

Wednesday, July 30, 2014

New record for a large JDG cluster: 500 nodes

Today I ran a 500 node JDG (JBoss Data Grid) cluster on Google Compute Engine, topping the previous record of 256 nodes. The test used is IspnPerfTest [2] and it does the following:
  • The data grid has keys [1..20000], values are 1KB byte buffers
  • The cache is DIST-SYNC with 2 owners for every key
  • Every node invokes 20'000 requests on random keys
  • A request is a read (with an 80% chance) or a write (with a 20% chance)
    • A read returns a 1KB buffer
    • A write takes a 1KB buffer and sends it to the primary node and the backup node
  • The test invoker collects the results from all 500 nodes and computes an average which it prints to stdout (a sketch of the per-node loop follows this list).
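
Hypothetically, the per-node loop boils down to something like this (illustrative only, not IspnPerfTest's actual code; Cache is Infinispan's interface):

import java.util.Random;
import org.infinispan.Cache;

public class PerfLoopSketch {
    // Hypothetical sketch of the per-node test loop, not IspnPerfTest's actual code
    static void runTest(Cache<Integer, byte[]> cache) {
        Random rnd = new Random();
        byte[] value = new byte[1024]; // 1KB payload
        for (int i = 0; i < 20_000; i++) {
            int key = rnd.nextInt(20_000) + 1; // random key in [1..20000]
            if (rnd.nextInt(100) < 80)
                cache.get(key);        // 80% reads: returns the 1KB buffer
            else
                cache.put(key, value); // 20% writes: sent to primary and backup owner
        }
    }
}
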
But see for yourself; the YouTube video is here: [1].

In terms of cluster size, I could probably have gone higher: it was pretty effortless to go to 500 instances... any takers ? :-)


[1] http://youtu.be/Cz3CDr31EDU
[2] https://github.com/belaban/IspnPerfTest

Wednesday, July 23, 2014

Running a JGroups cluster in Google Compute Engine: part 3 (1000 nodes)

A quick update on my GCE perf tests:

I successfully ran a 1000-node UPerf cluster on Google Compute Engine; the YouTube video is at http://youtu.be/Imn1M7EUTGE. The perf tests start at around 14'13".

I also managed to get 2286 nodes running, but didn't record it... :-)

Monday, April 07, 2014

Running JGroups on Google Compute Engine


I recently started looking into Google Compute Engine (GCE) [1], an IaaS service similar to Amazon's EC2. I had looked into EC2 a few years ago and created S3_PING back then to simplify running JGroups clusters on EC2.

As GCE seems to be taking off, I wanted to be able to provide a simple way to run JGroups clusters on GCE, too. Of course, this also means it's simple to run Infinispan/JDG and WildFly(JBoss EAP) clusters on GCE.

So the first step was creating a discovery protocol named GOOGLE_PING [2].

Turns out this was surprisingly easy: only 27 lines of code, as I could more or less reuse S3_PING. Google provides an S3 compatibility mode which only requires a change of the cloud storage host name !
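
A sketch of the idea (method and constructor signatures are assumptions, not the verbatim 27 lines):

package org.jgroups.protocols; // assumed: same package as S3_PING, to reuse its helpers

// S3_PING already implements discovery via bucket reads/writes; GOOGLE_PING only
// needs to point the connection at Google Cloud Storage's S3-compatible endpoint.
public class GOOGLE_PING extends S3_PING {
    @Override
    protected AWSAuthConnection createConnection() {
        return new AWSAuthConnection(access_key, secret_access_key, use_ssl,
                                     "storage.googleapis.com");
    }
}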

Next, I wanted to run a number of UPerf instances on GCE and measure performance.

UPerf mimics Infinispan's partial replication, in which every node picks a backup for its data. Every node invokes 20'000 synchronous RPCs; 80% of them are READs and 20% are WRITEs. A random destination is picked for every request, and when all members are done, the times to invoke the 20'000 requests are collected from all members, averaged and printed to the console.

The nice thing about UPerf is that parameters (such as the payload of each request, the number of invoker threads, the number of RPCs etc.) can be changed dynamically, in the form of a CONFIG RPC. All members apply the changes and therefore don't need to be restarted. (New members acquire the entire configuration from the coordinator.)

I made 2 videos showing how this is done. Part 1 [3] shows how to set up and configure JGroups to run on a 2-node cluster, and part 2 [4] shows a 100-node cluster and measures the performance of UPerf. I hope to add a part 3 later, when I have a quota to run 1000 cores...


Enjoy !

[1] https://cloud.google.com/products/compute-engine/

[2] https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/GOOGLE_PING.java

[3] http://youtu.be/xq7JxeIQTrU

[4] http://youtu.be/fokCUvB6UNM

Friday, January 24, 2014

JGroups: status and outlook

For our clustering team meeting in Palma, I wrote a presentation on the status and future of JGroups. As there isn't anything confidential in it, I thought I might as well share it.
Feedback to the users mailing list, please.
Cheers,

https://github.com/belaban/workshop/blob/master/slides/status2014.adoc