Belas Blog: 2012

Friday, November 16, 2012

Persisting discovery responses with TCPPING

I've added a nifty little feature to JGroups which helps people who use TCPPING but can't list all of the cluster nodes in the static list.

So far I've always said that if someone needs dynamic discovery, they should use a dynamic discovery protocol such as PING / MPING (which require IP multicasting), TCPGOSSIP (requires an external GossipRouter process), FILE_PING (requires a shared file system), S3_PING / AS_PING / SWIFT_PING / RACKSPACE_PING (require running in a cloud) or JDBC_PING (requires a database).

I always said that TCPPING is for static clusters, i.e. clusters where the membership is finite and always known beforehand.

However, there are cases where it makes sense to add a little dynamicity to TCPPING, and this is what PDC (Persistent Discovery Cache) does.

PDC is a new protocol that should be placed somewhere between the transport and the discovery protocol, e.g.

    <TCP />
    <PDC cache_dir="/tmp/jgroups" />
    <TCPPING timeout="2000" num_initial_members="20"
            initial_hosts="192.168.1.5[7000]"
            port_range="0" return_entire_cache="true"
            use_disk_cache="true" />

Here, PDC is placed above TCP and below TCPPING. Note that we need to set use_disk_cache to true in the discovery protocol for it to use the persistent cache.

What PDC does is actually very simple: it intercepts discovery responses and persists them to disk. Whenever a discovery request is received, it also intercepts that request and adds its own results from disk to the response set.
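The intercept-and-persist idea can be pictured with a small sketch. This is not the PDC source: the class DiscoveryCacheSketch, the file name members.txt and the use of plain strings as addresses are all assumptions made for illustration. It only shows the two steps described above, i.e. writing every discovery response to disk and adding the persisted entries to the response set of a later discovery request.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of a persistent discovery cache; NOT the real PDC implementation. */
class DiscoveryCacheSketch {
    private final Path cacheFile;

    DiscoveryCacheSketch(Path cacheDir) {
        try {
            Files.createDirectories(cacheDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.cacheFile = cacheDir.resolve("members.txt"); // hypothetical file name
    }

    /** Helper for demos: creates a fresh temporary cache directory. */
    static Path tempCacheDir() {
        try {
            return Files.createTempDirectory("pdc-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 1: a discovery response passes by, so remember the responder on disk. */
    synchronized void onDiscoveryResponse(String address) {
        Set<String> known = load();
        if (known.add(address))
            save(known);
    }

    /** Step 2: a discovery request passes by, so add the persisted entries to the live responses. */
    synchronized Set<String> onDiscoveryRequest(Set<String> liveResponses) {
        Set<String> result = new LinkedHashSet<>(liveResponses);
        result.addAll(load());
        return result;
    }

    private Set<String> load() {
        if (!Files.exists(cacheFile))
            return new LinkedHashSet<>();
        try {
            return new LinkedHashSet<>(Files.readAllLines(cacheFile, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void save(Set<String> addresses) {
        try {
            Files.write(cacheFile, addresses, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the entries live on disk, a second DiscoveryCacheSketch created on the same directory (think of a restarted node) still sees the addresses recorded by the first one.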

Let's take a look at a use case (with TCPPING) that PDC solves:

The membership is {A,B,C}
TCPPING.initial_hosts="A"
A is started, the cluster is A|0={A}
B is started, the cluster is A|1={A,B}
C is started, the cluster is A|2={A,B,C}
A is killed, the cluster is B|3={B,C}
C leaves, the cluster is B|4={B}
C joins again

Without PDC, C doesn't get a response from A (which is the only node listed in TCPPING.initial_hosts), and forms a singleton cluster C|0={C}
With PDC, C discovers A and B and sends a discovery request to both of them. B replies and therefore the new view is B|5={B,C}
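The last step of this use case can be replayed with a few lines of Java. This is a toy model only, not JGroups code: the class and method names are made up, nodes are plain strings, and the only rule modelled is that a node replies if it is contacted and still running.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Toy replay of C rejoining; names and logic are illustrative assumptions. */
class TcppingScenarioSketch {

    /** Nodes that answer a discovery: those the joiner contacts that are actually running. */
    static Set<String> responders(Set<String> initialHosts, Set<String> diskCache, Set<String> running) {
        Set<String> contacted = new TreeSet<>(initialHosts);
        contacted.addAll(diskCache);   // empty when PDC is not in the stack
        contacted.retainAll(running);  // killed nodes never reply
        return contacted;
    }

    /** No responder means the joiner becomes its own coordinator. */
    static String outcome(Set<String> responders) {
        return responders.isEmpty()
                ? "C forms a singleton cluster"
                : "C joins the cluster of " + responders;
    }

    public static void main(String[] args) {
        Set<String> initialHosts = Collections.singleton("A");       // TCPPING.initial_hosts="A"
        Set<String> running = Collections.singleton("B");            // A was killed, C is rejoining
        Set<String> cached = new TreeSet<>(Arrays.asList("A", "B")); // what C persisted earlier

        Set<String> withoutPdc = responders(initialHosts, Collections.<String>emptySet(), running);
        Set<String> withPdc = responders(initialHosts, cached, running);

        System.out.println("without PDC: " + outcome(withoutPdc));
        System.out.println("with PDC:    " + outcome(withPdc));
    }
}
```

Without the cache the set of responders is empty, because the only contacted node (A) is dead; with the cache, B is contacted as well and replies.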

The directory in which PDC stores its information is configured with PDC.cache_dir. If multiple cluster nodes are running on the same physical box, they can share that directory.

Feedback appreciated on the mailing list !
Cheers,
Bela

Friday, October 19, 2012

JGroups 3.2.0.Final released

I've released JGroups 3.2.0.Final. The most important features are:

RELAY2

Used for cross-site replication in Infinispan.
Compared to RELAY, RELAY2 can be connected to more than one site. In 3.3, we're planning to add hierarchical routing, i.e. to allow for tree-like site composition.
Multicasting across sites is also supported.
[http://www.jgroups.org/manual-3.x/html/user-advanced.html#Relay2Advanced]

Internationalized logging

The most important user-facing warnings and error messages (e.g. configuration errors) have been internationalized.
Error/warning translations are in jg-messages.properties. If someone wants to translate these into a different language, e.g. French, just copy jg-messages.properties into jg-messages_fr.properties and translate the messages. The new file then only needs to be added to the classpath; no changes to JGroups are needed !
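To see why dropping a jg-messages_fr.properties file on the classpath is enough, here is a small sketch using the standard java.util.ResourceBundle lookup. That JGroups resolves its messages exactly this way is an assumption on my side here; the key ExampleWarning and both message texts are invented for the demo, and a temporary directory stands in for the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

/** Demonstrates locale-specific bundle lookup; key and messages are hypothetical. */
class MessageBundleSketch {

    /** Writes a base bundle and a French copy into a temp directory acting as the classpath. */
    static Path writeDemoBundles() {
        try {
            Path dir = Files.createTempDirectory("jg-i18n-demo");
            write(dir.resolve("jg-messages.properties"),
                  "ExampleWarning=message from a different cluster\n");
            write(dir.resolve("jg-messages_fr.properties"),
                  "ExampleWarning=message provenant d'un autre cluster\n");
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void write(Path file, String content) throws IOException {
        Files.write(file, content.getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Looks up a key in the bundle "jg-messages" for the given locale. */
    static String lookup(Path classpathDir, Locale locale, String key) {
        try (URLClassLoader loader = new URLClassLoader(new URL[] {classpathDir.toUri().toURL()})) {
            return ResourceBundle.getBundle("jg-messages", locale, loader).getString(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = writeDemoBundles();
        System.out.println(lookup(dir, Locale.ROOT, "ExampleWarning"));   // base bundle
        System.out.println(lookup(dir, Locale.FRENCH, "ExampleWarning")); // jg-messages_fr
    }
}
```

The code that asks for the message never changes; only the set of .properties files visible to the class loader does.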

Reduction of error/warn messages

Sometimes there are a lot of recurring warnings or error messages, e.g. warnings about messages received from different clusters, or warnings about messages from members with different JGroups versions.
These can now be suppressed for a certain time, e.g. we can configure that there's only *one* warning every 60 seconds about messages from different clusters.
[https://issues.jboss.org/browse/JGRP-1518]
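The suppression idea boils down to remembering, per kind of warning, when it was last logged. The sketch below is my own minimal illustration, not the code from JGRP-1518: the class name, the string keys and the explicit clock parameter are assumptions chosen to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Lets at most one message per key through in each interval; illustrative only. */
class SuppressingLogSketch {
    private final long intervalMs;
    private final Map<String, Long> lastLogged = new HashMap<>();

    SuppressingLogSketch(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /** Returns true if a warning for this key should be emitted at time nowMs. */
    synchronized boolean shouldLog(String key, long nowMs) {
        Long last = lastLogged.get(key);
        if (last != null && nowMs - last < intervalMs)
            return false;            // still inside the quiet period: suppress
        lastLogged.put(key, nowMs);  // remember when we last let this one through
        return true;
    }

    public static void main(String[] args) {
        SuppressingLogSketch log = new SuppressingLogSketch(60_000);
        long[] arrivals = {0, 10, 59_999, 60_000, 60_001};
        for (long t : arrivals) {
            if (log.shouldLog("different-cluster", t))
                System.out.println(t + " ms: WARN message from a different cluster");
        }
    }
}
```

With a 60 second interval, only the warnings arriving at 0 ms and 60000 ms are printed; a caller would pass System.currentTimeMillis() instead of the fixed timestamps used here.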

A full list of features and bug fixes is here.

The manual can be found at http://www.jgroups.org/manual-3.x/html/index.html.

Questions and feedback as usual on the mailing lists.

Enjoy !

Bela Ban
Kreuzlingen, Oct 2012

Friday, July 06, 2012

JGroups 3.0.11 and 3.1.0 released

I'm happy to announce that I've released JGroups versions 3.0.11 and 3.1.0 !

3.0.11 is the 3.0.x branch which is used by the newly released EAP 6 / JBoss 7.x application server. It consists mainly of bug fixes (and one or two performance enhancements) backported from the 3.1 branch.

The 3.1.0 release has 90+ issues which were resolved (some of them backported to 3.0.x).

Here's a short list of the major issues resolved in 3.1.0; for details consult [2]:

NAKACK, UNICAST and NAKACK2 now use a new internal data structure for message delivery and retransmission, which reduces the memory needed by JGroups
MERGE3: a new merge protocol for large clusters
RSVP: blocks the sender until a given message has been received by all members of the target set
A new Total Order Anycast (TOA) protocol needed by the next version of Infinispan to deliver messages to a cluster subset in total order
New discovery protocols for mod-cluster (not yet completely done), Rackspace and OpenStack
MPerf / UPerf: dynamic multicast and unicast performance tests
Concurrent joins to a non-existing cluster are faster, and there's less chance of a merge happening (optimization)
TCP: socket creation doesn't block sending of regular messages (optimization)

Both JGroups 3.0.11 and 3.1.0 can be downloaded from [1]. The updated documentation can be found at [3].

As usual, use the mailing lists or fora for questions.

Enjoy !

[1] https://sourceforge.net/projects/javagroups/files/JGroups/

[2] https://github.com/belaban/JGroups/blob/master/doc/ReleaseNotes-3.1.0.txt

[3] http://www.jgroups.org/manual-3.x/html/index.html

Sunday, April 01, 2012

JBoss World 2012

I'm going to be speaking at JBossWorld 2012 (June 29th) on session clustering in EAP 6 (JBoss 7.1.x):
http://www.redhat.com/summit/sessions/best-of.html#18

The talk is a remake of the 2008 talk held by Brian Stansberry and me, and will show how clustering performance has increased between JBoss 4 and 7. However, this is not all; among other things, I'll cover:

Configuration of an EAP 6 cluster
Use of EAP 6 domains to start and stop JBoss instances in a cluster, to deploy applications across the entire cluster, and to disseminate configuration changes
Pros and cons of replication and distribution, and their effect on scalability and performance
Configuration and tuning of Infinispan and JGroups to achieve optimal performance
Setup of mod-cluster to dynamically add and remove JBoss instances and applications
Performance difference between EAP 5 and 6

I'll be in Boston from Tuesday until Friday and hope to meet many users of JGroups/Infinispan/JBoss clustering, get feedback and experience reports on the good, bad and ugly, and in general have many good discussions !

Friday, February 10, 2012

JGroups 3.1.0.Alpha2 released

I'm happy to announce the release of JGroups 3.1.0.Alpha2 !

Don't be put off by the Alpha2 suffix; as a matter of fact, this release is very stable, and I might just go ahead and promote it to "Final" within a short time !

At the time of writing this, I still have a few issues open in 3.1, but because I think the current feature set is great, I might push them into a 3.2.

So what features and enhancements did 3.1 add ? In a nutshell:

A new protocol NAKACK2: this is a successor to NAKACK (which will get phased out over the next couple of releases). The 2 biggest changes are:

A new memory efficient data structure (Table) is used to store messages to be retransmitted. It can grow and shrink dynamically, and replaces NakReceiverWindow.
There is no Retransmitter associated with each table, and we don't create an entry *per* missing sequence number (seqno) or seqno range. Instead, we have *one* single retransmission task, which periodically (xmit_interval ms) scans through *all* tables, identifies gaps and triggers retransmission for missing messages. This is a significant code simplification and brings memory consumption down when we have missing messages.
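The single retransmission task can be sketched as one scan over all per-sender tables. This is a simplified illustration, not the NAKACK2 code: ReceiveTable is a toy stand-in for Table (a sorted set of received seqnos, starting at 1), and senders are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** One scan of a single retransmission task over all tables; illustrative only. */
class RetransmitScanSketch {

    /** Toy stand-in for Table: the seqnos received from one sender. */
    static final class ReceiveTable {
        private final SortedSet<Long> received = new TreeSet<>();

        void add(long seqno) {
            received.add(seqno);
        }

        /** Seqnos between 1 and the highest received one that have not arrived yet. */
        List<Long> missing() {
            List<Long> gaps = new ArrayList<>();
            if (received.isEmpty())
                return gaps;
            long highest = received.last();
            for (long seqno = 1; seqno < highest; seqno++) {
                if (!received.contains(seqno))
                    gaps.add(seqno);
            }
            return gaps;
        }
    }

    /** Scans all tables and returns, per sender, the seqnos to ask retransmission for. */
    static Map<String, List<Long>> scan(Map<String, ReceiveTable> tables) {
        Map<String, List<Long>> requests = new TreeMap<>();
        for (Map.Entry<String, ReceiveTable> entry : tables.entrySet()) {
            List<Long> gaps = entry.getValue().missing();
            if (!gaps.isEmpty())
                requests.put(entry.getKey(), gaps);
        }
        return requests;
    }

    public static void main(String[] args) {
        Map<String, ReceiveTable> tables = new TreeMap<>();
        ReceiveTable fromA = new ReceiveTable();
        for (long s : new long[] {1, 2, 5}) fromA.add(s);   // 3 and 4 are missing
        ReceiveTable fromB = new ReceiveTable();
        for (long s : new long[] {1, 2, 3}) fromB.add(s);   // no gaps
        tables.put("A", fromA);
        tables.put("B", fromB);
        System.out.println(scan(tables));
    }
}
```

No state is kept per missing seqno: gaps are recomputed on every scan. In a running system, scan() would be invoked periodically (every xmit_interval ms), for example from a ScheduledExecutorService.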

Changes to UNICAST2 and UNICAST: in both cases, we switch from NakReceiverWindow / AckSenderWindow / AckReceiverWindow to Table and instead of a retransmitter per member, we now have *one* retransmitter task for *all* members.
The changes in NAKACK2, UNICAST2 and UNICAST have several benefits:

Code simplification: having only one data structure (Table) instead of several (NakReceiverWindow, AckSenderWindow, AckReceiverWindow), plus removing all Retransmitter implementations, leads to simpler code.
Code reduction: several classes can be removed, making the code base simpler to understand, and reducing complexity
Better maintainability: Table is now an important core data structure, and improvements to it will affect many parts of JGroups
Smaller memory footprint: having less per-member data (e.g. retransmission tasks) should lead to better scalability, especially in large clusters (e.g. 1000 nodes).
Smooth transition: we'll leave NAKACK (and therefore NakReceiverWindow and Retransmitter) in JGroups for some releases. NAKACK / NakReceiverWindow have served JGroups well for over a decade, and are battle-tested. If there is an issue with NAKACK2 / Table in production, we can always fall back to NAKACK. I intend to phase out NAKACK after some releases and a good amount of time spent in production around the world, to be sure NAKACK2 works well.

MERGE3: merging is frequent in large clusters. MERGE3 handles merging in large clusters better by

preventing (or reducing the chances of) concurrent merges
reducing traffic caused by merging
disseminating {UUID/physical address/logical name} information, so every node has this information, reducing the number of times we need to ask for it explicitly.
MERGE3 was written with UDP as transport in mind (which is the transport recommended for large clusters anyway), but it also works with TCP.

Synchronous messages: they block the sender until the receiver or receivers have ack'ed delivery of the message. This allows for 'partial flushing' in the sense that all messages sent by a member P prior to a synchronous message M will get delivered at all receivers before M is delivered.
This is related to FLUSH, but less costly and can be done per message. For example, if a unit of work is done, a sender could send an RSVP tagged message M and would be sure that - after the send() returns - all receivers have delivered M.
To send an RSVP-marked message, Message.setFlag(Message.Flag.RSVP) has to be used.
A new protocol (RSVP) needs to be added to the stack. See the documentation (link below) for details.
A new rackspace-based discovery protocol
Concurrent joining to a non-existing cluster is faster
Elimination (or reduction of) "no physical address for X; dropping message" warnings
Elimination of global JGroups ThreadGroup leaks
Elimination of socket leaks with TCPPING
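To illustrate the blocking semantics of the synchronous (RSVP) messages mentioned in the list above, here is a stand-alone sketch using only java.util.concurrent. It does not use the JGroups API (there, the message is flagged with Message.setFlag(Message.Flag.RSVP) and the RSVP protocol does the work); the class RsvpSketch, the thread pool acting as "network" and the lambda receivers are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Mimics "send blocks until all receivers have acked"; not JGroups code. */
class RsvpSketch {

    /** Hands msg to every receiver and blocks until all have acked or the timeout expires. */
    static boolean sendAndAwaitAcks(String msg, List<Consumer<String>> receivers,
                                    ExecutorService pool, long timeoutMs) {
        CountDownLatch acks = new CountDownLatch(receivers.size());
        for (Consumer<String> receiver : receivers) {
            pool.execute(() -> {
                receiver.accept(msg);  // the receiver delivers the message
                acks.countDown();      // and acks it afterwards
            });
        }
        try {
            return acks.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Sends one message to three toy receivers; returns how many had delivered it when send returned. */
    static int demo() {
        List<String> delivered = new CopyOnWriteArrayList<>();
        List<Consumer<String>> receivers = new ArrayList<>();
        for (int i = 0; i < 3; i++)
            receivers.add(delivered::add);
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            boolean acked = sendAndAwaitAcks("unit of work done", receivers, pool, 5000);
            return acked ? delivered.size() : -1;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("delivered at " + demo() + " receivers when send() returned");
    }
}
```

The point is the ordering guarantee for the caller: once sendAndAwaitAcks() returns true, every receiver has delivered the message, which is what the sender of a unit of work wants to know.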

The full list of changes is at [1], the manual can be found at [2] and 3.1 can be downloaded from [3].

Feedback is appreciated on the mailing lists, enjoy !

[1] https://github.com/belaban/JGroups/blob/master/doc/ReleaseNotes-3.1.0.txt
[2] http://www.jgroups.org/manual-3.x/html/index.html
[3] https://sourceforge.net/projects/javagroups/files/JGroups/3.1.0.Alpha2/
