Friday, March 05, 2010

Status report: performance of JGroups 2.10.0.Alpha2

I've already improved (mainly unicast) performance in Alpha1, a short list is:

  • BARRIER: moved lock acquired by every up-message out of the critical path
  • IPv6: just running a JGroups channel without any system props (e.g. now works, as IPv4 addresses are mapped to IP4-mapped IPv6 addresses under IPv6
  • NAKACK and UNICAST: streamlined marshalling of headers, drastically reducing the number of bytes streamed when marshalling headers
  • TCPGOSSIP: Vladimir fixed a bug in RouterStub which caused GossipRouters to return incorrect membership lists, resulting in JOIN failures
  • TP.Bundler:
    • Provided a new bundler implementation, which is faster than the default one (the new *is* actually the default in 2.10)
    • Sending of message lists (bundling): we don't ship the dest and src address for each message, but only ship them *once* for the entire list
  • AckReceiverWindow (used by UNICAST): I made this almost lock-free, so concurrent messages to the same recipient don't compete for the same lock. Should be a nice speedup for multiple unicasts to the same sender (e.g. OOB messages)
The complete list of features is at [1].

In 2.10.0.Alpha2 (that's actually the current CVS trunk), I replaced strings as header names with IDs [2]. This means that for each header, instead of marshalling "UNICAST" as a moniker for the UnicastHeader, we marshal a short.

The string (assuming a single-byte charset) uses up 9 bytes, whereas the short uses 2 bytes. We usually have 3-5 headers per message, so that's an average of 20-30 bytes saved per message. If we send 10 million messages, those saving accumulate !

Not only does this change make the marshalled message smaller, it also means that a message kept in memory has a smaller footprint: as messages are kept in memory until they're garbage collected by STABLE (or ack'ed by UNICAST), the savings are really nice...

The downside ? It's an API change for protocol implementers: methods getHeader(), putHeader() and putHeaderIfAbsent() in Message changed from taking a string to taking a short. Plus, if you implement headers, you have to register them in jg-magic-map.xml / jg-protocol-ids.xml and implement Streamable...

Now for some performance numbers. This is a quick and dirty benchmark, without many data points...

perf.Test (see [3] for details) has N senders send M messages of S size to all cluster nodes. This exercises the NAKACK code.

On my home cluster (4 blades with 4 cores each), 1GB ethernet, sending 1000-byte messages:
  • 4 senders, JGroups 2.9.0.GA:         128'000 messages / sec / member
  • 4 senders, JGroups 2.10.0.Alpha2: 137'000 messages / sec / member
  • 6 senders, JGroups 2.10.0.Alpha2: 100'000 messages / sec /member
  • 8 senders, JGroups 2.10.0.Alpha2:  78'000 messages / sec / member
2.10.0.Alpha2 is ca 7% faster for 4 members.

There is also a stress test for unicasts, UnicastTestRpcDist. It mimicks DIST mode of Infinispan and has every member invoke 20'000 requests on 2 members; 80% of those requests are GETs (simple RPCs) and 20% are PUTs (2 RPCs in parallel). All RPCs are synchronous, so the caller always waits for the result and thus blocks for the roud trip time. Every member has 25 threads invoking the RPCs concurrently.

On my home network, I got the following numbers:
  • 4 members, JGroups 2.9.0.GA:         4'500 requests / sec / member
  • 4 members, JGroups 2.10.0.Alpha2: 5'700 requests / sec / member
  • 6 members, JGroups 2.9.0.GA:         4'000 requests / sec / member
  • 6 members, JGroups 2.10.0.Alpha2: 5'000 requests / sec / member
  • 8 members, JGroups 2.9.0.GA:         3'800 requests / sec / member
  • 8 members, JGroups 2.10.0.Alpha2: 4'300 requests / sec / member

In our Atlanta lab (faster boxes), I got (unfortunately only for 2.10.0.Alpha2):

  • 4 members, JGroups 2.10.0.Alpha2: 10'900 requests / sec / member
  • 6 members, JGroups 2.10.0.Alpha2: 10'900 requests / sec / member
  • 8 members, JGroups 2.10.0.Alpha2: 10'900 requests / sec / member
Since the focus of the first half of 2.10.0 was on improving unicast performance, the numbers above are already pretty good and show (at least for up to 8 members) linear scalability.


No comments:

Post a Comment