Belas Blog: 2015

Tuesday, November 03, 2015

Talk at Berlin JUG Nov 19

For those of you living in Berlin, mark your calendars: there's an event [1] held by the JUG Berlin-Brandenburg on Nov 19, with talks on

JGroups (yours truly)
New features of Infinispan 8 (Galder Zamarreno)
Infinispan (Tristan Tarrant) and
Wildfly clustering (Paul Ferraro)

Free food and beverages will be provided, and - because we're having our clustering team meeting the same week - most clustering devs will be present to mingle with after the talks... :-)

Hope to see many of you there !

[1] http://www.jug-berlin-brandenburg.de/

Wednesday, September 09, 2015

JGroups 3.6.6.Final released

I don't like releasing a week after I released 3.6.5, but the Infinispan team found 2 critical bugs in TCP_NIO2:

Messages would get corrupted as they were sent asynchronously and yet the buffer was reused and modified while the send was in transit (JGRP-1961)
TCP_NIO2 could start dropping messages because selection key registration was not thread safe (JGRP-1963)

But these bugs affect TCP_NIO2 only, and no other protocols.
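The first bug (JGRP-1961) is an instance of a general hazard with asynchronous sends: if the sender only keeps a reference to the caller's buffer, any later modification of that buffer changes what ends up on the wire. The following is a minimal sketch of that hazard and of the usual remedy (copying before queueing). It is not the JGroups code; AsyncSendSketch and its methods are hypothetical names, and the queue merely stands in for a write that completes later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Hypothetical sender that queues buffers and writes them out later
class AsyncSendSketch {
    private final Queue<ByteBuffer> pending = new ArrayDeque<>();

    // Unsafe: the queued ByteBuffer shares the caller's array, so the caller
    // can still change the bytes before they are actually written
    void sendNoCopy(byte[] buf, int offset, int length) {
        pending.add(ByteBuffer.wrap(buf, offset, length));
    }

    // Safe: the queued ByteBuffer owns a private copy of the bytes
    void sendWithCopy(byte[] buf, int offset, int length) {
        byte[] copy = Arrays.copyOfRange(buf, offset, offset + length);
        pending.add(ByteBuffer.wrap(copy));
    }

    // Stands in for the deferred write: returns what would go on the wire
    String flushNext() {
        ByteBuffer b = pending.remove();
        return StandardCharsets.US_ASCII.decode(b).toString();
    }
}
```

Queueing the same array with both methods and then overwriting its first byte shows the difference: the uncopied message changes, the copied one does not.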

So, there it is: 3.6.6.Final ! :-)

Enjoy (and find more bugs in TCP_NIO2) !

Thursday, September 03, 2015

JGroups 3.6.5 released

I'm happy to announce that 3.6.5 has been released !

One more patch release (3.6.6) is planned, and then I'll start working on 4.0 which will require Java 8. I'm looking forward to finally also being able to start using functional programming ! :-) (Note that I wrote my diploma thesis in Common Lisp back in the day...)

The major feature of 3.6.5 is certainly support for non-blocking TCP, based on NIO.2. While I don't usually add features to a patch release, I didn't want to create a 3.7.0, and I wanted users to be able to still use Java 7, and not require 8 in order to use the NIO stuff.

Here's a summary of the more important changes in 3.6.5:

TCP_NIO2: new non-blocking transport based on NIO.2

[https://issues.jboss.org/browse/JGRP-886]

This new transport is based on NIO.2 and non-blocking, i.e. no reads or writes will ever block. The biggest advantage compared to TCP is that we moved from the 1-thread-per-connection model to the 1-selector-for-all-connections model.
This means that we use 1 thread for N connections in TCP_NIO2, while TCP used N threads.
To use this, new classes TcpClient / NioClient and TcpServer / NioServer have been created.
More details at http://belaban.blogspot.ch/2015/07/a-new-nio.html.

Fork channels now support state transfer

[https://issues.jboss.org/browse/JGRP-1941]

Fork channels used to throw an exception on calling ForkChannel.getState(). This is now supported; details in the JIRA issue.

GossipRouter has been reimplemented using NIO

[https://issues.jboss.org/browse/JGRP-1943]

GossipRouter can now use a blocking (TcpServer) or a non-blocking (NioServer) implementation. On the client side, RouterStub (TUNNEL and TCPGOSSIP) can do the same, using TcpClient or NioClient.
Which implementation is used is governed by the -nio flag when starting the router, or in the configuration of TUNNEL / TCPGOSSIP (use_nio).
Blocking clients can interact with a non-blocking GossipRouter, and vice versa.

Retransmissions use the INTERNAL flag

[https://issues.jboss.org/browse/JGRP-1940]

Retransmissions use the INTERNAL flag: when a retransmitted message was a request, a potential response was also flagged as internal. This flag is now cleared on reception of a request.

Lock.tryLock() can wait forever

[https://issues.jboss.org/browse/JGRP-1949]

Caused by a conversion from nanos to millis.
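A sketch of how a nanos-to-millis conversion can turn a short timeout into an endless wait. This illustrates the general mechanism only and is an assumption about the kind of bug involved; the actual cause and fix are in the JIRA issue. TimeoutConversionSketch and its methods are hypothetical names. The relevant facts are that TimeUnit.NANOSECONDS.toMillis() truncates, and that Object.wait(0) means "wait until notified".

```java
import java.util.concurrent.TimeUnit;

class TimeoutConversionSketch {

    // Truncating conversion: any timeout below 1 ms becomes 0, and a
    // timeout of 0 passed to Object.wait() means "wait forever"
    static long truncatedMillis(long nanos) {
        return TimeUnit.NANOSECONDS.toMillis(nanos);
    }

    // Defensive conversion: a positive timeout never collapses to 0.
    // Callers should treat a result of 0 as "don't wait at all"
    // rather than passing it on to Object.wait()
    static long safeMillis(long nanos) {
        if (nanos <= 0)
            return 0;
        return Math.max(1, TimeUnit.NANOSECONDS.toMillis(nanos));
    }
}
```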

TCPPING: access initial_hosts in the defined order

[https://issues.jboss.org/browse/JGRP-1959]

Was not the case as we used a HashSet which reordered elements.
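To illustrate the difference: a HashSet gives no guarantee about iteration order, whereas a LinkedHashSet iterates in insertion order while still removing duplicates. The sketch below parses a comma-separated host list that way. OrderedHostsSketch is a hypothetical helper, not the TCPPING code, and whether the actual fix uses a LinkedHashSet is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrderedHostsSketch {

    // Splits a list such as "hostA[7800],hostB[7800]" into its entries,
    // dropping duplicates but keeping the order in which they were defined
    static List<String> parse(String initialHosts) {
        Set<String> hosts = new LinkedHashSet<>();
        for (String host : initialHosts.split(",")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty())
                hosts.add(trimmed);
        }
        return new ArrayList<>(hosts);
    }
}
```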

SWIFT_PING: support JSON

[https://issues.jboss.org/browse/JGRP-1954]

Request/response format has changed from application/xml to application/json in the Identity API.

The manual is at http://www.jgroups.org/manual/index.html.

The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.

Enjoy !

Bela Ban, Kreuzlingen, Switzerland, Sept 2015

Monday, July 27, 2015

A new NIO.2 based transport

I'm happy to announce a new transport based on NIO.2: TCP_NIO2 !

The new transport is completely non-blocking, so it - contrary to TCP - never blocks on a socket connect, read or write.

The big advantage of TCP_NIO2 over TCP is that it doesn't need to create one reader thread per connection (and possibly a writer thread as well, if send queues are enabled).

With a cluster of 1000 nodes, in TCP every node would have 999 reader threads and 999 connections. While we still have 999 TCP connections open (max), in TCP_NIO2 we only have a single selector thread servicing all connections. When data is available to be read, we read as much data as we can without blocking, and then pass the read message(s) off to the regular or OOB thread pools for processing.

This makes TCP_NIO2 a more scalable and non-blocking alternative to TCP.
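The single-selector model can be sketched with plain java.nio. The code below is not the TCP_NIO2 implementation; SingleSelectorSketch, readAll() and demo() are hypothetical names. One thread accepts connections and reads from all of them, draining each channel as far as possible without blocking. The hand-off of received messages to a thread pool is left out; the sketch only counts bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

class SingleSelectorSketch {

    // Services all connections on the calling thread until expectedBytes
    // have been received; returns the number of bytes read
    static int readAll(ServerSocketChannel server, int expectedBytes) throws IOException {
        int total = 0;
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try (Selector selector = Selector.open()) {
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            while (total < expectedBytes) {
                selector.select(); // the only place where this thread waits
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel ch = server.accept();
                        if (ch != null) {
                            ch.configureBlocking(false);
                            ch.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel ch = (SocketChannel) key.channel();
                        int n;
                        buf.clear();
                        // read as much as is available without blocking
                        while ((n = ch.read(buf)) > 0) {
                            total += n;
                            buf.clear();
                        }
                        if (n < 0)
                            ch.close(); // peer closed the connection
                    }
                }
            }
            // close the accepted connections before the selector goes away
            for (SelectionKey key : selector.keys())
                if (key.channel() instanceof SocketChannel)
                    key.channel().close();
        }
        return total;
    }

    // Opens the given number of loopback connections, sends 4 bytes on each
    // and lets a single thread read everything
    static int demo(int clients) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();
            SocketChannel[] conns = new SocketChannel[clients];
            for (int i = 0; i < clients; i++) {
                conns[i] = SocketChannel.open(addr);
                conns[i].write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}));
            }
            int total = readAll(server, clients * 4);
            for (SocketChannel c : conns)
                c.close();
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling SingleSelectorSketch.demo(3) opens 3 connections and reads all their data on one thread; with thread-per-connection this would take 3 reader threads.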

Performance

I ran the UPerf and MPerf tests [3] on a 9 node cluster (8-core boxes with ~5300 bogomips and 1 GB networking) and got the following results:

UPerf (500'000 requests/node, 50 invoker threads/node):
TCP: 62'858 reqs/sec/node, TCP_NIO2: 65'387 reqs/sec/node

MPerf (1 million messages/node, 50 sender threads/node):
TCP: 69'799 msgs/sec/node, TCP_NIO2: 77'126 msgs/sec/node

So TCP_NIO2 was better in both cases, which surprised me a bit as there have been reports claiming that the BIO approach was faster.

I therefore recommend running the tests in your own environment, with your own application, to get numbers that are meaningful in your system.

The documentation is here: [1], and the source code is at [2].
Cheers,

[1] http://www.jgroups.org/manual/index.html#TCP_NIO2

[2] https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/TCP_NIO2.java

[3] http://www.jgroups.org/manual/index.html#PerformanceTests

Friday, May 15, 2015

Release of jgroups-raft 0.2

I'm happy to announce the first usable release of jgroups-raft [1] !

Compared to 0.1, which was a mere prototype, 0.2 has a lot more features and is a lot more robust. Besides fixing quite a few bugs and adding unit tests to prevent future regressions, I

switched to Java 8
implemented dynamic addition and removal of servers
wrote the manual, and
wrote a consensus based replicated counter

The full list is at [2]. For questions, feedback etc use the mailing list [3].
Cheers,

[1] http://belaban.github.io/jgroups-raft

[2] https://github.com/belaban/jgroups-raft/issues?q=milestone%3A0.2+is%3Aclosed

[3] https://groups.google.com/forum/#!forum/jgroups-raft

Wednesday, April 29, 2015

JGroups workshops in New York and Mountain View

I'm happy to announce that we're offering 2 JGroups trainings in the US: in New York and Mountain View in Sept 2015 !

The workshop will be interactive and is for medium to advanced developers. I'm teaching both workshops, so I should be able to answer all JGroups related questions ... :-)

An overview of what we'll be doing over the 4.5 days is here:
https://github.com/belaban/workshop/blob/master/slides/toc.adoc.

To get more info and to register visit http://www.jgroups.org/workshops.html.

Registration is now open. The class size is limited to 20 each.

Hope to see some of you at a workshop this year !

Tuesday, March 17, 2015

Everything you always wanted to know about JGroups (but were afraid to ask): JGroups workshop in Berlin

I'm happy to announce a JGroups workshop in Berlin June 1-5 2015 !

This is your chance to learn everything you always wanted to know about JGroups... and more :-)

This is the second in a series of 4 workshops I'll teach this year; 2 in Europe and 2 in the US (NYC and Mountain View, more on the US workshops to be announced here soon).

Rome is unfortunately already sold out, but Berlin's a nice place, too...

The workshop is 5 days and attendees will learn the following [1]:

Monday: API [introductory]
Tuesday: Building blocks (RPCs, distributed locks, counters etc) [medium]
Wednesday/Thursday: advanced topics and protocols [advanced]
Friday: admin stuff [medium]

I've written some nice labs and I'm trying to make this as interactive and hands-on as possible. Be aware though that the workshop (especially the middle part) is not for the faint of heart and complete JGroups newbies are not going to benefit as much as people who've already used JGroups...

The price is 1'500 EUR (early bird: 1'000 EUR). This gets you a week of total immersion into JGroups and beers in the evening with me (not sure this is a good thing though :-))...

Registration [2] is now open (15 tickets only because I want to have a max of 20 attendees - 5 already registered). There's an early bird registration rate (500 EUR off) valid until April 10. Use code JGRP2015 to get the early bird.

The recommended hotel is nhow Berlin [3]. Workshop attendees will get a special rate; check here again in a few days (end of March at the latest) on how to book a room at a discounted rate.

Hope to see some of you in Berlin in June !
Cheers,

[1] https://github.com/belaban/workshop/blob/master/slides/toc.adoc

[2] http://www.amiando.com/JGroupsWorkshopBerlin

[3] http://www.nh-hotels.de/hotel/nhow-berlin

Thursday, January 15, 2015

JGroups workshop

I'm happy to announce that I'm putting the finishing touches to a JGroups workshop [1].

It consists of 4 modules with labs:

Using JGroups: API (beginner level, 1 day)
Using JGroups: building blocks (beginner level, 1 day)
Advanced (medium to advanced level, 2 days)
Admin (medium level, 1 day)

The modules can be mixed and matched, but I think that a public workshop will present them in this order. Beginners may wish to attend only the first 2 days, while others may want to skip the first 2 days and only attend the Advanced and Admin parts.

We're also thinking about offering a consulting package which includes selected modules and a few consulting days. Also, a combined JDG and JGroups workshop is being discussed. But this is all up for discussion at our Berlin meeting this February.

The first workshop will probably be a Red Hat internal one somewhere in EMEA.

As for public workshops, I'm shooting for 2 in Europe and 2 in the US (East and West coast) this year.

If you have suggestions regarding locations and dates, please send me an email (belaban at yahoo dot com).

Registration is not yet open, but if you want to pre-register, send me an email and you'll get a notification when it opens. I promise that you won't get any marketing emails, and I'll delete that list after sending that one email... :-)
Cheers,

[1] https://github.com/belaban/workshop/blob/master/slides/toc.adoc

Tuesday, January 13, 2015

RAFT consensus in JGroups

I'm happy to announce the first alpha release of jgroups-raft, which is an implementation of RAFT [2,3] in JGroups. The jgroups-raft project is currently a separate project on GitHub [1], but may be integrated into JGroups at a later stage.

The functionality includes leader election (section 5.2 in [3]), log replication (5.3), snapshotting and log compaction (7). Cluster membership changes (6) have not yet been implemented; the system currently requires a static membership.

The persistent log is implemented using LevelDB (MapDB support is not complete yet). Also, leader election based on the log commit status (and length) (5.4.1) has not been implemented.

The code quality is alpha at best, and the functionality hasn't been tested with unit tests. Use at your own risk.

So what can jgroups-raft currently be used for ?

Mainly to experiment with RAFT consensus in JGroups. The system comes with a demo of a replicated state machine (replicated hashmap) which can be used to update state in a fixed-size cluster with consensus. The majority (RAFT.majority) is 2, so no more than 3 instances should be started.
Start the 3 instances like this:

bin/demo.sh -name B -follower
bin/demo.sh -name C -follower
bin/demo.sh -name A

The -follower flag is optional, but it skips leader election for a quick startup (and avoids issues caused by the missing implementation of 5.4.1).

Note that the -name flag is used as both the logical name of a member and the name of the log. So, after starting the 3 instances, the temp directory will contain logs A.log, B.log and C.log (using LevelDB).

If we kill B and start it again as B, then B.log will be used again. If we start a member D, then this is considered a new member and a log D.log will be created.
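The limit of 3 instances for a majority of 2 follows from the usual quorum arithmetic: a majority is more than half of the cluster, so that any two majorities overlap in at least one member. In the demo the value is set via RAFT.majority; the sketch below only shows the arithmetic, and MajoritySketch is a hypothetical name.

```java
class MajoritySketch {

    // Smallest number of members that is more than half of the cluster
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // An entry counts as committed once a majority has acknowledged it
    static boolean committed(int acks, int clusterSize) {
        return acks >= majority(clusterSize);
    }
}
```

With 4 instances the majority would have to be 3; keeping it at 2 would allow two disjoint pairs of members to each reach "consensus" on their own.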

Here's the output at C after adding a new entry foo=bar and printing the log:
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=3, commit-index=3, log size=55b

1
key: foo
value: bar
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=3, log size=70b

-- put(foo, bar) -> null
5

index (term): command
---------------------
1 (1): put(name, Bela)
2 (1): put(id, 322649)
3 (1): put(name, Bela Ban)
4 (7): put(foo, bar)

[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=4, log size=70b

4
{foo=bar, name=Bela Ban, id=322649}
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=4, log size=70b

We can see that the state consists of 3 entries and the log has 4 elements (name was changed twice).

When a node is killed and restarted, the state machine is reinitialized from the log:

[mac] /Users/bela/jgroups-raft$ bin/demo.sh -name C -follower
LOG is existent, must not be initialized
777 [DEBUG] RAFT: set last_applied=4, commit_index=4, current_term=7
778 [DEBUG] RAFT: snapshot /tmp/C.snapshot not found, initializing state machine from persistent log
781 [DEBUG] RAFT: applied 3 log entries (2 - 4) to the state machine

-------------------------------------------------------------------
GMS: address=C, cluster=rsm, physical address=192.168.1.3:54886
-------------------------------------------------------------------
-- view change: [B|4] (3) [B, A, C]
[1] add [2] get [3] remove [4] show all [5] dump log [6] snapshot [x] exit
first-applied=1, last-applied=4, commit-index=4, log size=70b

4
{foo=bar, name=Bela Ban, id=322649}

We can see that the state machine was initialized from the persistent log.
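Rebuilding the state from the log amounts to re-applying the committed commands in index order. The sketch below does this for the 4 log entries dumped above and ends up with the same 3-entry map. It is not the jgroups-raft code: LogReplaySketch, Entry and replay() are hypothetical names, and only put() commands are modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LogReplaySketch {

    // One log entry: index, term and a put(key, value) command
    static final class Entry {
        final int index;
        final int term;
        final String key;
        final String value;

        Entry(int index, int term, String key, String value) {
            this.index = index;
            this.term = term;
            this.key = key;
            this.value = value;
        }
    }

    // Applies all entries up to and including commitIndex to an empty map
    static Map<String, String> replay(List<Entry> log, int commitIndex) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Entry e : log) {
            if (e.index > commitIndex)
                break; // uncommitted entries must not be applied
            state.put(e.key, e.value);
        }
        return state;
    }

    // The log as shown in the dump above
    static List<Entry> sampleLog() {
        List<Entry> log = new ArrayList<>();
        log.add(new Entry(1, 1, "name", "Bela"));
        log.add(new Entry(2, 1, "id", "322649"));
        log.add(new Entry(3, 1, "name", "Bela Ban"));
        log.add(new Entry(4, 7, "foo", "bar"));
        return log;
    }
}
```

Replaying with commit-index=4 yields name=Bela Ban, id=322649 and foo=bar; replaying with commit-index=3 leaves out foo.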

If a member is down for a considerable amount of time, and then started again, it may be out of sync, and - if a snapshot was taken at the leader - the first log entry of the leader might be higher than the last committed log entry at the member. In this case, the leader will transfer its snapshot to the restarted member first, and then the usual algorithm is used to bring the restarted member up to date.

What's next ?

We're currently experimenting with an implementation of etcd [5] over jgroups-raft. Also, we're looking into how to use RAFT consensus in Infinispan [6].

I'm currently putting the finishing touches on a JGroups workshop (more on this soon), and will return to work on jgroups-raft after that. The next work items include

unit tests and code reviews
leader election comparing logs (5.4.1)
alternative ELECTION protocol using the JGroups built-in features (reduces code)
cluster membership changes
consistent reads; reads are currently dirty (section 8 has not yet been implemented)

Please use the mailing list [4] for feedback, questions and discussions.

Cheers,
Bela

[1] https://github.com/belaban/jgroups-raft
[2] http://raftconsensus.github.io/
[3] http://ramcloud.stanford.edu/raft.pdf
[4] https://groups.google.com/forum/#!forum/jgroups-raft
[5] https://github.com/redhat-italy/jgroups-etcd
[6] http://www.infinispan.org
