Belas Blog: October 2010

Friday, October 29, 2010

JGroups 2.11 final released

FYI,

2.11.0.final can be downloaded here. Its main features, optimizations and bug fixes are listed below.

I hope that 2.12 will be the last release before finally going to 3.0 !

2.12 should be very small, currently it contains only 8 issues (mainly optimizations).

However, I also moved RELAY from 3.x to 2.12.

RELAY allows for connecting geographically separate clusters into a large virtual cluster. This will be interesting to apps which need to provide geographic failover. More on this in the next couple of weeks...

Meanwhile ... enjoy 2.11 !

Bela, Vladimir & Richard

Release Notes JGroups 2.11
==========================

Version: $Id: ReleaseNotes-2.11.txt,v 1.2 2010/10/29 11:45:35 belaban Exp $
Author: Bela Ban

JGroups 2.11 is API-backwards compatible with previous versions (down to 2.2.7).

Below is a summary (with links to the detailed description) of the major new features.

New features
============

AUTH: pattern matching to prevent unauthorized joiners
------------------------------------------------------
[https://jira.jboss.org/browse/JGRP-996]

New plugin for AUTH which can use pattern matching against regular expressions to prevent unauthorized
IP addresses to join a cluster.

Blog: http://belaban.blogspot.com/2010/09/cluster-authentication-with-pattern.html

DAISYCHAIN: implementation of daisy chaining
--------------------------------------------
[https://jira.jboss.org/browse/JGRP-1021]

Daisy chaining sends messages around in a ring, improving throughput for non IP multicast networks.

Blog: http://belaban.blogspot.com/2010/08/daisychaining-in-clouds.html

New flow control protocols for unicast (UFC) and multicast (MFC) messages
-------------------------------------------------------------------------
[https://jira.jboss.org/browse/JGRP-1154]

MFC and UFC replace FC. They can be used independently, and performance is faster than that of FC only.

API for programmatic creation of channel
----------------------------------------
[https://jira.jboss.org/browse/JGRP-1245]

Allows for programmatic creation of a JChannel, no need for XML config file.

Blog: http://belaban.blogspot.com/2010/10/programmatic-creation-of-channel.html

S3: new features
----------------
[https://jira.jboss.org/browse/JGRP-1234] Allow use of public buckets (no credentials need to be sent)
[https://jira.jboss.org/browse/JGRP-1235] Pre-signed URLs

STOMP: new protocol to allows STOMP clients to talk to a JGroups node
---------------------------------------------------------------------
[https://jira.jboss.org/browse/JGRP-1248]

Blog: http://belaban.blogspot.com/2010/10/stomp-for-jgroups.html

Optimizations
=============

NAKACK: simplify and optimize handling of OOB messages
------------------------------------------------------
[https://jira.jboss.org/browse/JGRP-1104]

Discovery: reduce number of discovery responses sent in a large cluster
-----------------------------------------------------------------------
[https://jira.jboss.org/browse/JGRP-1181]

A new propery (max_rank) determines who will and who won't send discovery responses.

New timer implementations
-------------------------
[https://jira.jboss.org/browse/JGRP-1051]

Way more effecient implementations of the timer (TimeScheduler).

Bug fixes
=========

ENCRYPT: encrypt entire message when length=0
---------------------------------------------
[https://jira.jboss.org/browse/JGRP-1242]

ENCRYPT would not encrypt messages whose length = 0

FD_ALL: reduce number of messages sent on suspicion
---------------------------------------------------
[https://jira.jboss.org/browse/JGRP-1241]

FILE_PING: empty files stop discovery
-------------------------------------
[https://jira.jboss.org/browse/JGRP-1246]

Manual
======

The manual is online at http://www.jgroups.org/manual/html/index.html

The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.

Bela Ban, Kreuzlingen, Switzerland
Vladimir Blagojevic, Toronto, Canada
Richard Achmatowicz, Toronto, Canada

Nov 2010

Wednesday, October 27, 2010

STOMP for JGroups

FYI,

I've written a new JGroups protocol STOMP, which implements the STOMP protocol. This allows for STOMP clients to connect to any JGroups server node (which has the JGroups STOMP protocol in its configuration).

The benefits of this are:

Clients can be written in any language. For example, I've used stomppy, a Python client, to connect to JGroups server nodes, and successfully subscribed to destinations, and sent and received messages.
Sometimes, clients don't want to be peers, ie. they don't want to join a cluster and become full members. These (light-weight) clients could also be in a different geographic location, and not be able to use IP multicasting.
Clients are started and stopped frequently, and there might be many of them. Frequently starting and stopping a full-blown JGroups server node has a cost, and is not recommended. Besides, a high churn rate might move the cluster coordinator around quite a lot, preventing it from doing real work.
We can easily scale to a large number of clients. Although every client requires 1 thread on the server side, we can easily support hundreds of clients. Note though that I wouldn't use the current JGroups STOMP protocol to connect thousands of clients...

Let's take a quick look: I started an instance of JGroups with STOMP on the top of the protocol stack (on 192.168.1.5). Then I connected to it with the JGroups client:

JGroups STOMP client

As can be seen, the first response the client received was an INFO with information about the available endpoints (STOMP instances) in the cluster. This is actually used by the StompConnection client to failover to a different server node should the currently connected to server fail.
Next, we subscribe to destination /a using the simplified syntax of the JGroups STOMP client.

Then, a telnet session to 192.168.1.5:8787 was started:

Telnet STOMP client

We get the INFO response with the list of endpoints too here. Then we subscribe to the /a destination. Note that the syntax used here is compliant with the STOMP protocol spec: first is the verb (SUBSCRIBE), then an optional bunch of headers (here just one, defining the destination to subscribe to), a newline and finally the body, terminated with a 0 byte. (SUBSCRIBE does not have a body).

Next, we send a message to all clients subscribed to /a. This is the telnet session itself, as evidenced by the reception of MESSAGE. If you look at the JGroups STOMP client, the message is also received there.

Next the JGroups client also sends a message to destination /a, which is received by itself and the telnet client.

JGroups 2.11.0.Beta2 also ships with a 'stompified' Draw demo, org.jgroups.demos.StompDraw, which is a stripped down version of Draw, using the STOMP protocol to send updates to the cluster.

Let me know what you think of this; feature requests, feedback etc appreciated (preferably on one of the JGroups mailing lists) !

The new protocol is part of JGroups 2.11.0.Beta2, which can be downloaded here.

Documentation is here.

Enjoy !

Wednesday, October 20, 2010

Programmatic creation of a channel

I've committed code which provides programmatic creation of channels. This is a way of creating a channel without XML config files. So instead of writing

JChannel ch=new JChannel("udp.xml");

, I can construct the channel programmatically:

JChannel ch=new JChannel(false);                 // 1
ProtocolStack stack=new ProtocolStack(); // 2
ch.setProtocolStack(stack);              // 3
stack.addProtocol(new UDP().setValue("ip_ttl", 8));
     .addProtocol(new PING())
     .addProtocol(new MERGE2())
     .addProtocol(new FD_SOCK())
     .addProtocol(new FD_ALL().setValue("timeout", 12000));
     .addProtocol(new VERIFY_SUSPECT())
     .addProtocol(new BARRIER())
     .addProtocol(new NAKACK())
     .addProtocol(new UNICAST2())
     .addProtocol(new STABLE())
     .addProtocol(new GMS())
     .addProtocol(new UFC())
     .addProtocol(new MFC())
     .addProtocol(new FRAG2());       // 4
stack.init();                         // 5

First, a JChannel is created (1). The 'false' argument means that the channel must not create its own protocol stack, because we create it (2) and stick it into the channel (3).

Next, all protocols are created and added to the stack (4). This needs to happen in the order in which we want the protocols to be, so the first protocol added is the transport protocol (UDP in the example).

Note that we can use Protocol.setValue(String attr_name, Object attr_value) to configure each protocol instance. We can also use regular setters if available.

Finally, we call init() (5), which connects the protocol list correctly and calls init() on every instance. This also handles shared transports correctly. For an example of how to create a shared transport with 2 channels on top see ProgrammaticApiTest.

I see mainly 3 use cases where programmatic creation of a channel is preferred over declarative creation:

Someone hates XML (I'm not one of them) :-)
Unit tests
Projects consuming JGroups might have their own configuration mechanism (e.g. GUI, properties file, different XML configuration etc) and don't want to use the XML cofiguration mechanism shipped with JGroups.

Let me know what you think about this API ! I deliberately kept it simple and stupid, and maybe there are things people like to see changed. I'm open to suggestions !

Cheers,

Friday, October 01, 2010

Confessions of a serial protocol designer

I have a confession to make.

I'm utterly disgusted by my implementation of FD_ALL, and thanks to David Forget for pointing this out !

What's bad about FD_ALL ? It will not scale at all ! After having written several dozen protocols, I thought an amateurish mistake like the one I'm about to show would certainly not happen to me anymore. Boy, was I wrong !

FD_ALL is about detecting crashed nodes in a cluster, and the protocol then lets GMS know so that the crashed node(s) can be excluded from the view.

Let's take a look at the design.

Every node periodically multicasts a HEARTBEAT
This message is received by everyone in the cluster and a hashmap of nodes and timestamps is updated; for a node P, P's timestamp is set to the current time
Another task run at every node periodcially iterates through the timestamps and checks if any timestamps haven't been updated for a given time. If that's the case, the members with outdated timestamps are suspected
A suspicion of P results in a SUSPECT(P) multicast
On reception of SUSPECT(P), every node generates a SUSPECT(P) event and passes it up the stack
VERIFY_SUSPECT catches SUSPECT(P) and sends an ARE_YOU_DEAD message to P
If P is still alive, it'll respond with a I_AM_NOT_DEAD message
If the sender doesn't get this message for a certain time, it'll pass the SUSPECT(P) event further up the stack (otherwise it'll drop it), and GMS will exclude P from the view, but if and only if that given node is the coordinator (first in the view)

Can anyone see the flaw in this design ? Hint: it has to do with the number of messages generated...

OK, so let's see what happens if we have a cluster of 100 nodes:

Say node P is temporarily slow; it doesn't send HEARTBEATs because a big garbage collection is going on, or the CPU is crunching at 90%
99 nodes multicast a SUSPECT(P) message
Every node Q therefore receives 99 SUSPECT(P) messages

Q (via VERIFY_SUSPECT) sends a ARE_YOU_DEAD message to P
P (if it can) responds with an I_AM_NOT_DEAD back to Q
So the total number of messages generated by a single node is 99 * 2

This is done on every node, so the total number of messages is 99 * 99 * 2 = 19'602 messages !

Can you imagine what happens to P, which is a bit overloaded and cannot send out HEARTBEATs in time when it receives 19'602 messages ?

It it aint dead yet, it will die !

Isn't it ironic: by asking a node if it is still alive, we actually kill it !

This is an example of where the effects of using IP multicasts were not taken into account: if we multicast M, and everybody who receives M sends 2 messages, I neglected to see that the number of messages sent is a function of the cluster size !

So what's the solution ? Simple, elegant and outlined in [1].

Everybody sends a HEARTBEAT multicast periodically
Every member maintains a suspect list
This list is adjusted on view changes
Reception of a SUSPECT(P) message adds P to the list
When we suspect P because we haven't received a HEARTBEAT (or traffic if enabled):

The set of eligible members is computed as: members - suspected members
If we are the coordinator (first in the list):

Pass a SUSPECT(P) event up the stack, this runs the VERIFY_SUSPECT protocol and eventually passes the SUSPECT(P) up to GMS, which will exclude P from the view

The cost of running the suspicion protocol is (excluding the periodic heartbeat multicasts):

1 ARE_YOU_DEAD unicast to P
A potential response (I_AM_NOT_DEAD) from P to the coordinator

TOTAL COST in a cluster of 100: 2 messages (this is always constant), compared to 19'602 messages before !

This is way better than the previous implementation !

[1] https://jira.jboss.org/browse/JGRP-1241

Belas Blog