Sorry for the short release cycle between 4.0.16 and 4.0.17, but 4.0.17 contains an important feature that I wanted to release as soon as possible: JGRP-2293.
This has better support for concurrent graceful leaves of multiple members (also coordinators), which is important in cloud environments, where pods are started and stopped dynamically by Kubernetes.
Without this enhancement, a member could leave gracefully but the leave would sometimes not be recognized, so failure detection had to kick in to exclude that member and install the correct views. This slowed things down, and the installation of new views was governed by the timeouts in the failure detection protocols (FD_ALL, FD_SOCK). On top of this, in some edge cases MERGE3 had to kick in and fix the views, slowing things down further.
There's a unit test [1] which exercises the various aspects of concurrent leaving, e.g. all coordinators leaving sequentially, multiple members leaving concurrently, etc.
I recommend installing this version as soon as possible, especially if you run in the cloud. Questions / problems etc -> [2]
Cheers,
[1] https://github.com/belaban/JGroups/blob/master/tests/junit-functional/org/jgroups/tests/LeaveTest.java
[2] https://groups.google.com/forum/#!forum/jgroups-dev
Wednesday, February 13, 2019
Monday, January 21, 2019
JGroups 4.0.16
I just released 4.0.16. The most important features/bug fixes are (for a comprehensive list see [1]):
- Better mechanism to detect whether IPv4 or IPv6 addresses need to be used. This eliminates the need to use java.net.preferIPv4Stack (unless forcing the use of IPv4 addresses)
- Due to a regression, MULTI_PING didn't work any longer. This was fixed in https://issues.jboss.org/browse/JGRP-2319
- Bug fixes in DNS_PING (used for running in Kubernetes/Openshift clouds):
- https://issues.jboss.org/browse/JGRP-2296
- https://issues.jboss.org/browse/JGRP-2300
- https://issues.jboss.org/browse/JGRP-2314
- https://issues.jboss.org/browse/JGRP-2316
- https://issues.jboss.org/browse/JGRP-2309
- ASYM_ENCRYPT
- Pooled ciphers were not correctly re-initialized: https://issues.jboss.org/browse/JGRP-2279
- Concurrent access to pooled ciphers can lead to decryption errors: https://issues.jboss.org/browse/JGRP-2315
- FILE_PING (and all subclasses, e.g. S3_PING) had an issue with merging: https://issues.jboss.org/browse/JGRP-2288
- slf4j doesn't work: https://issues.jboss.org/browse/JGRP-2307
- LazyThreadFactory leaks threads: https://issues.jboss.org/browse/JGRP-2312
- TransferQueueBundler (default): view changes led to messages to non-members being dropped, causing issues such as delaying the leaving of a member: https://issues.jboss.org/browse/JGRP-2324
[1] https://issues.jboss.org/projects/JGRP/versions/12339241
Wednesday, January 03, 2018
Injecting a split brain into a JGroups cluster
Thanks to a contribution by Andrea Tarocchi, we can now inject arbitrary views into a JGroups cluster, generating split brain scenarios.
This is done by a new protocol INJECT_VIEW, which can be placed anywhere in the stack, e.g. like this:
...
<MERGE3 max_interval="30000"
min_interval="10000"/>
<FD_SOCK/>
<FD_ALL/>
<VERIFY_SUSPECT timeout="500" />
<INJECT_VIEW/>
<BARRIER />
...
The protocol exposes a method injectView(), which can be called by probe.sh. Let's say we have a cluster {A,B,C}: we can now inject view {A} into A, and {B,C} into B and C, as follows:
probe.sh op=INJECT_VIEW.injectView["A=A;B=B/C;C=B/C"]
(Note that '/' is used to separate node names; this is needed because ',' is used by default to separate arguments in probe.)
The result is that node A is now its own singleton cluster (not seeing B and C), and B and C only see each other (but not A).
Note that the member names are the logical names given to individual nodes.
Once the split has been injected, MERGE3 should remedy it by merging A, B and C back into a single cluster after 10-30 seconds (with the above configuration).
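If INJECT_VIEW is part of the stack, the split can also be injected programmatically, e.g. from a test. Below is a minimal sketch; the config file name is a placeholder and the exact signature of injectView() is assumed from the probe usage above:

// Sketch: inject the view string on the local node, assuming
// INJECT_VIEW.injectView(String) as suggested by the probe call above
import org.jgroups.JChannel;
import org.jgroups.protocols.INJECT_VIEW;

public class InjectSplitExample {
    public static void main(String[] args) throws Exception {
        try(JChannel ch=new JChannel("inject.xml").name("A")) { // config with INJECT_VIEW in the stack
            ch.connect("split-demo");
            INJECT_VIEW iv=ch.getProtocolStack().findProtocol(INJECT_VIEW.class);
            iv.injectView("A=A;B=B/C;C=B/C"); // same syntax as with probe: '/' separates node names
        }
    }
}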
Thanks to Andrea for creating this nifty tool to experiment with cluster splits!
Monday, August 28, 2017
JGroups workshops in Rome and Berlin in November
I'm happy to announce that I've revamped the JGroups workshop and I'll be teaching 2 workshops in November: Rome Nov 7-10 and Berlin Nov 21-24.
The new workshop is 4 days (TUE-FRI). I've updated the workshop to use the latest version of JGroups (4.0.x), removed a few sections and added sections on troubleshooting/diagnosis and handling of network partitions (split brain).
Should be a fun and hands-on workshop!
To register and for the table of contents, navigate to [1].
To get the early-bird discount, use code EARLYBIRD.
Cheers,
[1] http://www.jgroups.org/workshops.html
Monday, June 26, 2017
Non-blocking JGroups
I'm happy to announce that I just released JGroups 4.0.4.Final!
Its most prominent feature is non-blocking flow control [1,2].
Flow control makes sure that a fast sender cannot overwhelm a slow receiver in the long run (short spikes are tolerated) by adjusting the send rate to the receive rate. This is done by giving each sender credits (number of bytes) that it is allowed to send until it has to block.
A receiver sends new credits to the sender once it has processed a message.
When no credits are available anymore, the sender blocks until it gets new credits from the receiver(s). This is done in UFC (unicast flow control) and MFC (multicast flow control).
Flow control and TCP have been the only protocols that could block (UDP and TCP_NIO2 are non-blocking).
Non-blocking flow control now adds protocols UFC_NB and MFC_NB. These can replace their blocking counterparts.
Instead of blocking sender threads, the non-blocking flow control protocols queue messages when not enough credits are available to send them, allowing the sender threads to return immediately.
When fresh credits are received, the queued messages will be sent.
The queues are bounded to prevent heap exhaustion: attribute max_queue_size (bytes) caps how many bytes can be queued, and once it is exceeded, subsequent attempts to queue block until more space is available. Of course, setting max_queue_size to a large value effectively makes the queues unbounded.
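To illustrate the mechanism (this is a conceptual sketch, not the actual UFC_NB/MFC_NB code), a non-blocking credit-based sender queues a message when credits are exhausted and drains the queue when credits are replenished:

import java.util.ArrayDeque;
import java.util.Deque;

// Conceptual sketch of non-blocking flow control: queue instead of block
public class NonBlockingCreditSender {
    protected long credits;                  // bytes we may still send
    protected long queued_bytes;             // bytes currently queued
    protected final long max_queue_size;     // corresponds to max_queue_size (bytes)
    protected final Deque<byte[]> queue=new ArrayDeque<>();

    public NonBlockingCreditSender(long initial_credits, long max_queue_size) {
        this.credits=initial_credits;
        this.max_queue_size=max_queue_size;
    }

    /** Sends or queues the message; returns false only if the bounded queue is full */
    public synchronized boolean send(byte[] msg) {
        if(msg.length <= credits) {
            credits-=msg.length;
            transmit(msg);                   // enough credits: send right away
            return true;
        }
        if(queued_bytes + msg.length > max_queue_size)
            return false;                    // queue full: the only case where a caller would have to wait
        queue.add(msg);                      // not enough credits: queue instead of blocking
        queued_bytes+=msg.length;
        return true;
    }

    /** Called when the receiver sends new credits; queued messages are sent first */
    public synchronized void replenish(long new_credits) {
        credits+=new_credits;
        while(!queue.isEmpty() && queue.peek().length <= credits) {
            byte[] msg=queue.poll();
            queued_bytes-=msg.length;
            credits-=msg.length;
            transmit(msg);
        }
    }

    protected void transmit(byte[] msg) { /* hand the message to the transport */ }
}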
Using MFC_NB / UFC_NB with a transport of UDP or TCP_NIO2 (which also never block) provides a completely non-blocking stack, where JChannel.send(Message) never blocks. If RpcDispatcher is used, there are non-blocking methods to invoke (even synchronous) RPCs, i.e. the ones that return a CompletableFuture.
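As a small sketch of the latter (the dispatcher, the target address and the invoked method add(int,int) are assumptions made for this example):

import java.util.concurrent.CompletableFuture;
import org.jgroups.Address;
import org.jgroups.blocks.MethodCall;
import org.jgroups.blocks.RequestOptions;
import org.jgroups.blocks.RpcDispatcher;

public class NonBlockingRpcExample {
    // Invokes a sync RPC without blocking the caller thread
    static void invoke(RpcDispatcher disp, Address target) throws Exception {
        MethodCall call=new MethodCall("add", new Object[]{1, 2}, new Class[]{int.class, int.class});
        CompletableFuture<Integer> future=
            disp.callRemoteMethodWithFuture(target, call, RequestOptions.SYNC());
        future.whenComplete((result, ex) -> {   // runs when the response arrives
            if(ex != null)
                System.err.println("RPC failed: " + ex);
            else
                System.out.println("result=" + result);
        });
    }
}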
Other features of 4.0.4 include a new message bundler and a few bug fixes (e.g. the internal thread pool was not shut down). For the list of all JIRAs see [3].
Cheers,
[1] https://issues.jboss.org/browse/JGRP-2172
[2] http://www.jgroups.org/manual4/index.html#NonBlockingFlowControl
[3] https://issues.jboss.org/projects/JGRP/versions/12334674
Monday, May 08, 2017
Running an Infinispan cluster with Kubernetes on Google Container Engine (GKE)
In this post, I'm going to show the steps needed to get a 10 node Infinispan cluster up and running on Google Container Engine (GKE).
The test we'll be running is IspnPerfTest and the corresponding docker image is belaban/ispn_perf_test on dockerhub.
All that's needed to run this yourselves is a Google Compute Engine account, so head on over there now and create one if you want to reproduce this demo! :-)
Alternatively, the cluster could be run locally in minikube, but for this post I chose GKE instead.
Ready? Then let's get cracking...
First, let's create a 10-node cluster in GKE. The screen shot below shows the form that needs to be filled out to create a 10 node cluster in GKE. This results in 10 nodes getting created in Google Compute Engine (GCE):
As shown, we'll use 10 16-vCPU instances with 14GB of memory each.
Press "Create" and the cluster is being created:
If you logged into your Google Compute Engine console, it would show the 10 nodes that are getting created.
When the cluster has been created, click on "Connect" and execute the "gcloud" command that's shown as a result:
gcloud container clusters get-credentials ispn-cluster \
    --zone us-central1-a --project ispnperftest
We can now see the 10 GCE nodes:
[belasmac] /Users/bela/kubetest$ kubectl get nodes
NAME STATUS AGE VERSION
gke-ispn-cluster-default-pool-59ed0e14-1zdb Ready 4m v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-33pk Ready 4m v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-3t95 Ready 4m v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-5sn9 Ready 4m v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-9lmz Ready 4m v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-j646 Ready 4m v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-k797 Ready 4m v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-q80q Ready 4m v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-r96s Ready 4m v1.5.7
gke-ispn-cluster-default-pool-59ed0e14-zhdj Ready 4m v1.5.7
Next we'll run an interactive instance of IspnPerfTest and 3 non-interactive (no need for a TTY) instances, forming a cluster of 4. First, we start the interactive instance. Note that it might take a while until Kubernetes has downloaded image belaban/ispn_perf_test from dockerhub. When done, the following is shown:
[belasmac] /Users/bela/kubetest$ kubectl run infinispan --rm=true -it --image=belaban/ispn_perf_test kube.sh
If you don't see a command prompt, try pressing enter.
-------------------------------------------------------------------
GMS: address=infinispan-749052960-pl066-27417, cluster=default, physical address=10.40.0.4:7800
-------------------------------------------------------------------
-------------------------------------------------------------------
GMS: address=infinispan-749052960-pl066-18029, cluster=cfg, physical address=10.40.0.4:7900
-------------------------------------------------------------------
created 100,000 keys: [1-100,000]
[1] Start test [2] View [3] Cache size [4] Threads (100)
[5] Keys (100,000) [6] Time (secs) (60) [7] Value size (1.00KB) [8] Validate
[p] Populate cache [c] Clear cache [v] Versions
[r] Read percentage (1.00)
[d] Details (true) [i] Invokers (false) [l] dump local cache
[q] Quit [X] Quit all
Now we'll start 3 more instances, the definition of which is taken from a Yaml file (ispn.yaml):
## Creates a number of pods on kubernetes running IspnPerfTest
## Run with: kubectl create -f ispn.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ispn
  namespace: default
spec:
  replicas: 3
  template:
    metadata:
      labels:
        run: ispn-perf-test
    spec:
      hostNetwork: false
      containers:
      - args:
        - kube.sh
        - -nohup
        name: ispn-perf-test
        image: belaban/ispn_perf_test
        # imagePullPolicy: IfNotPresent
The YAML file creates 3 instances (replicas: 3):
belasmac] /Users/bela/kubetest$ kubectl create -f ispn.yaml
deployment "ispn" created
Running "kubectl get pods -o wide" shows:
[belasmac] /Users/bela/kubetest$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
infinispan-749052960-pl066 1/1 Running 0 4m 10.40.0.4 gke-ispn-cluster-default-pool-59ed0e14-5sn9
ispn-1255975377-hm785 1/1 Running 0 46s 10.40.2.4 gke-ispn-cluster-default-pool-59ed0e14-r96s
ispn-1255975377-jx70d 1/1 Running 0 46s 10.40.4.3 gke-ispn-cluster-default-pool-59ed0e14-1zdb
ispn-1255975377-xf9r8 1/1 Running 0 46s 10.40.5.3 gke-ispn-cluster-default-pool-59ed0e14-3t95
This shows 1 infinispan instance (interactive terminal) and 3 ispn instances (non-interactive).
We can now exec into one of the instances and run probe.sh to verify that a cluster of 4 has formed:
[belasmac] /Users/bela/kubetest$ kubectl exec -it ispn-1255975377-hm785 bash
bash-4.3$ probe.sh -addr localhost
-- sending probe request to /10.40.5.3:7500
-- sending probe request to /10.40.4.3:7500
-- sending probe request to /10.40.2.4:7500
-- sending probe request to /10.40.0.4:7500
#1 (287 bytes):
local_addr=ispn-1255975377-xf9r8-60537
physical_addr=10.40.5.3:7800
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)
#2 (287 bytes):
local_addr=ispn-1255975377-hm785-39319
physical_addr=10.40.2.4:7800
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)
#3 (284 bytes):
local_addr=ispn-1255975377-jx70d-16
physical_addr=10.40.4.3:7800
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)
#4 (292 bytes):
local_addr=infinispan-749052960-pl066-27417
physical_addr=10.40.0.4:7800
view=[infinispan-749052960-pl066-27417|3] (4) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)
4 responses (4 matches, 0 non matches)
The 'view' item shows the same view (list of cluster members) across all 4 nodes, so the cluster has formed successfully. Also, if we look at our interactive instance, pressing [2] shows the cluster has 4 members:
-- local: infinispan-749052960-pl066-18029
-- view: [infinispan-749052960-pl066-18029|3] (4) [infinispan-749052960-pl066-18029, ispn-1255975377-xf9r8-34843, ispn-1255975377-jx70d-20952, ispn-1255975377-hm785-48520]
This means we have the fourth view (view ID 3, counting from 0) and 4 cluster members (the number in parentheses).
Next, let's scale the cluster to 10 members. To do this, we'll tell Kubernetes to scale the ispn deployment to 9 instances (from 3):
[belasmac] /Users/bela/IspnPerfTest$ kubectl scale deployment ispn --replicas=9
deployment "ispn" scaled
After a few seconds, the interactive terminal shows a new view containing 10 members:
** view: [infinispan-749052960-pl066-27417|9] (10) [infinispan-749052960-pl066-27417, ispn-1255975377-xf9r8-60537, ispn-1255975377-jx70d-16, ispn-1255975377-hm785-39319, ispn-1255975377-6191p-9724, ispn-1255975377-1g2kx-5547, ispn-1255975377-333rl-13052, ispn-1255975377-57zgl-28575, ispn-1255975377-j8ckh-35528, ispn-1255975377-lgvmt-32173]
We also see a minor inconvenience when looking at the pods:
[belasmac] /Users/bela/jgroups-docker$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE
infinispan-749052960-pl066 1/1 Running 0 13m 10.40.0.4 gke-ispn-cluster-default-pool-59ed0e14-5sn9
ispn-1255975377-1g2kx 1/1 Running 0 1m 10.40.7.4 gke-ispn-cluster-default-pool-59ed0e14-k797
ispn-1255975377-333rl 1/1 Running 0 1m 10.40.9.3 gke-ispn-cluster-default-pool-59ed0e14-9lmz
ispn-1255975377-57zgl 1/1 Running 0 1m 10.40.1.4 gke-ispn-cluster-default-pool-59ed0e14-q80q
ispn-1255975377-6191p 1/1 Running 0 1m 10.40.0.5 gke-ispn-cluster-default-pool-59ed0e14-5sn9
ispn-1255975377-hm785 1/1 Running 0 10m 10.40.2.4 gke-ispn-cluster-default-pool-59ed0e14-r96s
ispn-1255975377-j8ckh 1/1 Running 0 1m 10.40.6.4 gke-ispn-cluster-default-pool-59ed0e14-j646
ispn-1255975377-jx70d 1/1 Running 0 10m 10.40.4.3 gke-ispn-cluster-default-pool-59ed0e14-1zdb
ispn-1255975377-lgvmt 1/1 Running 0 1m 10.40.8.3 gke-ispn-cluster-default-pool-59ed0e14-33pk
ispn-1255975377-xf9r8 1/1 Running 0 10m 10.40.5.3 gke-ispn-cluster-default-pool-59ed0e14-3t95
The infinispan pod and one of the ispn pods have been created on the same GCE node gke-ispn-cluster-default-pool-59ed0e14-5sn9. The reason is that they belong to different deployments, so GKE schedules them independently of each other. Had all pods been created in the same deployment, Kubernetes would have assigned pods to nodes in a round-robin fashion.
This could be fixed by using labels, but I didn't want to complicate the demo. Note that running more than one pod per GCE node will harm performance slightly...
Now we can run the performance test by populating the cluster (grid) with key/value pairs ([p]) and then running the test ([1]). This inserts 100,000 key/value pairs into the grid and executes read operations on every node for 1 minute (for details on IspnPerfTest consult the GitHub URL given earlier):
Running test for 60 seconds:
1: 47,742 reqs/sec (286,351 reads 0 writes)
2: 56,854 reqs/sec (682,203 reads 0 writes)
3: 60,628 reqs/sec (1,091,264 reads 0 writes)
4: 62,922 reqs/sec (1,510,092 reads 0 writes)
5: 64,413 reqs/sec (1,932,427 reads 0 writes)
6: 65,050 reqs/sec (2,341,846 reads 0 writes)
7: 65,517 reqs/sec (2,751,828 reads 0 writes)
8: 66,172 reqs/sec (3,176,344 reads 0 writes)
9: 66,588 reqs/sec (3,595,839 reads 0 writes)
10: 67,183 reqs/sec (4,031,168 reads 0 writes)
done (in 60020 ms)
all: get 1 / 1,486.77 / 364,799.00, put: 0 / 0.00 / 0.00
======================= Results: ===========================
ispn-1255975377-1g2kx-51998: 100,707.90 reqs/sec (6,045,193 GETs, 0 PUTs), avg RTT (us) = 988.79 get, 0.00 put
ispn-1255975377-xf9r8-34843: 95,986.15 reqs/sec (5,760,705 GETs, 0 PUTs), avg RTT (us) = 1,036.78 get, 0.00 put
ispn-1255975377-jx70d-20952: 103,935.58 reqs/sec (6,239,149 GETs, 0 PUTs), avg RTT (us) = 961.14 get, 0.00 put
ispn-1255975377-j8ckh-11479: 100,869.08 reqs/sec (6,054,263 GETs, 0 PUTs), avg RTT (us) = 987.95 get, 0.00 put
ispn-1255975377-lgvmt-26968: 104,007.33 reqs/sec (6,243,144 GETs, 0 PUTs), avg RTT (us) = 960.05 get, 0.00 put
ispn-1255975377-6191p-15331: 69,004.31 reqs/sec (4,142,053 GETs, 0 PUTs), avg RTT (us) = 1,442.04 get, 0.00 put
ispn-1255975377-57zgl-58007: 92,282.75 reqs/sec (5,538,903 GETs, 0 PUTs), avg RTT (us) = 1,078.14 get, 0.00 put
ispn-1255975377-333rl-8583: 99,130.95 reqs/sec (5,949,542 GETs, 0 PUTs), avg RTT (us) = 1,004.08 get, 0.00 put
infinispan-749052960-pl066-18029: 67,166.91 reqs/sec (4,031,358 GETs, 0 PUTs), avg RTT (us) = 1,486.77 get, 0.00 put
ispn-1255975377-hm785-48520: 79,616.70 reqs/sec (4,778,196 GETs, 0 PUTs), avg RTT (us) = 1,254.87 get, 0.00 put
Throughput: 91,271 reqs/sec/node (91.27MB/sec) 912,601 reqs/sec/cluster
Roundtrip: gets min/avg/max = 0/1,092.14/371,711.00 us (50.0=938 90.0=1,869 95.0=2,419 99.0=5,711 99.9=20,319 [percentile at mean: 62.50]),
puts n/a
As suspected, instances infinispan-749052960-pl066-18029 and ispn-1255975377-6191p-15331 show lower performance than the other nodes, as they are co-located on the same GCE node.
The Kubernetes integration in JGroups is done by KUBE_PING which interacts with the Kubernetes master (API server) to fetch the IP addresses of the pods that have been started by Kubernetes.
The KUBE_PING protocol is new, so direct problems, issues, configuration questions etc to the JGroups mailing list.
Tuesday, February 21, 2017
JGroups 4.0.0.Final
I'm happy to announce that JGroups 4.0.0.Final is out!
With 120+ issues resolved, the focus of this version is on API changes, the switch to Java 8 (and thus the use of new language features such as streams and lambdas), and optimizations.
API changes
[https://issues.jboss.org/browse/JGRP-1605]
Fluent configuration
Example:
JChannel ch=new JChannel("config.xml").name("A").connect("cluster");
Removed deprecated classes and methods
Removed support for plain string-based channel configuration
[https://issues.jboss.org/browse/JGRP-2019]
Use of Java 8 features
[https://issues.jboss.org/browse/JGRP-2007]
E.g. replace Condition with Predicate etc
Removed classes
- Shared transport [https://issues.jboss.org/browse/JGRP-1844]
- TCP_NIO
- UNICAST, UNICAST2
- NAKACK
- PEER_LOCK
- FD_PING
- MERGE2
- FC
- SCOPE
- MuxRpcDispatcher
- ENCRYPT
MessageDispatcher / RpcDispatcher changes
[https://issues.jboss.org/browse/JGRP-1620]
Use of CompletableFuture instead of NotifyingFuture
TCP: remove send queues
[https://issues.jboss.org/browse/JGRP-1994]
New features
Deliver message batches
[https://issues.jboss.org/browse/JGRP-2003]
Receiver (ReceiverAdapter) now has an additional callback receive(MessageBatch batch). This allows JGroups to pass an entire batch of messages to the application rather than passing them up one by one.
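A minimal receiver handling whole batches could look like this:

import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;
import org.jgroups.util.MessageBatch;

public class BatchReceiver extends ReceiverAdapter {
    @Override
    public void receive(MessageBatch batch) {   // called once per batch
        for(Message msg: batch)                 // MessageBatch is iterable
            System.out.printf("from %s: %d bytes%n", msg.getSrc(), msg.getLength());
    }

    @Override
    public void receive(Message msg) {          // still called for single messages
        System.out.printf("from %s: %d bytes%n", msg.getSrc(), msg.getLength());
    }
}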
Refactored ENCRYPT into SYM_ENCRYPT and ASYM_ENCRYPT
[https://issues.jboss.org/browse/JGRP-2021]
Plus fixed security issues in the refactored code. Removed ENCRYPT.
Measure round-trip times for RPCs via probe
[https://issues.jboss.org/browse/JGRP-2049]
Keys 'rpcs' and 'rpcs-details' dump information about average RTTs between individual cluster members
Change message bundler at runtime
[https://issues.jboss.org/browse/JGRP-2058]
Message bundlers can be changed at runtime via probe. This is useful to see the effect of different bundlers on performance, even in the same test run.
Probe
- Sort attributes for better legibility: https://issues.jboss.org/browse/JGRP-2066
- Removal of protocols via regular expressions
DELIVERY_TIME: new protocol to measure delivery time in the application
[https://issues.jboss.org/browse/JGRP-2101]
Exposes stats via JMX and probe
RELAY2: sticky site masters
[https://issues.jboss.org/browse/JGRP-2112]
When we have multiple site masters, messages from the same member should always be handled by the same site master. This prevents reordering of messages at the receiver.
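Conceptually, picking a sticky site master boils down to a deterministic choice based on the original sender, along these lines (an illustrative sketch, not the actual RELAY2 code):

import java.util.List;
import org.jgroups.Address;

public class SiteMasterPicker {
    /** Maps the same original sender to the same site master every time */
    static Address pickSiteMaster(List<Address> site_masters, Address original_sender) {
        int index=Math.abs(original_sender.hashCode() % site_masters.size());
        return site_masters.get(index);
    }
}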
Multiple elements in bind_addr
[https://issues.jboss.org/browse/JGRP-2113]
E.g.
<TCP
bind_addr="match-interface:eth2,10.5.5.5,match-interface:en*,127.0.0.1"
...
/>
This tries to bind to eth2 first, then to 10.5.5.5, then to an interface whose name starts with "en", and finally to loopback.
Useful when running in an environment where the IP addresses and/or interfaces are not known before.
Multiple receiver threads in UDP
[https://issues.jboss.org/browse/JGRP-2146]
Multiple threads can receive (and process) messages from a datagram socket, preventing queue buildups.
FRAG3
[https://issues.jboss.org/browse/JGRP-2154]
Optimizations
RpcDispatcher: don't copy the first anycast
[https://issues.jboss.org/browse/JGRP-2010]
Reduction of memory size of classes
- Rsp (32 -> 24 bytes): https://issues.jboss.org/browse/JGRP-2011
- Request: https://issues.jboss.org/browse/JGRP-2012
Remove one buffer copy in COMPRESS
[https://issues.jboss.org/browse/JGRP-2017]
Replace Java serialization with JGroups marshalling
[https://issues.jboss.org/browse/JGRP-2033]
Some internal classes still used Java serialization, which opens up security holes
(google 'java serialization vulnerability').
Faster marshalling / unmarshalling of messages
- Writing of headers: [https://issues.jboss.org/browse/JGRP-2042]
- Reading of headers: [https://issues.jboss.org/browse/JGRP-2043]
TCP: reduce blocking
[https://issues.jboss.org/browse/JGRP-2053]
Message bundler improvements
[https://issues.jboss.org/browse/JGRP-2057]
E.g. removal of SingletonAddress: not needed anymore as shared transports have been removed, too.
Protocol: addition of up(Message) and down(Message) callbacks
[https://issues.jboss.org/browse/JGRP-2067]
This massively reduces the number of Event creations.
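With these callbacks, a custom protocol handles messages directly, without an Event wrapper. A minimal sketch (COUNT is a hypothetical example protocol, not part of JGroups):

import java.util.concurrent.atomic.LongAdder;
import org.jgroups.Message;
import org.jgroups.stack.Protocol;

public class COUNT extends Protocol {
    protected final LongAdder sent=new LongAdder(), received=new LongAdder();

    @Override
    public Object down(Message msg) {   // no Event is created for messages anymore
        sent.increment();
        return down_prot.down(msg);
    }

    @Override
    public Object up(Message msg) {
        received.increment();
        return up_prot.up(msg);
    }
}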
TransferQueueBundler: remove multiple messages
[https://issues.jboss.org/browse/JGRP-2076]
Instead of removing messages one-by-one, the remover thread now removes as many messages as are in the queue (contention) into a local queue (no contention) and then creates and sends message batches off of the local queue.
Single thread pool
[https://issues.jboss.org/browse/JGRP-2099]
There's only a single thread pool for all types of messages, reducing the maintenance overhead of 4 thread pools and the configuration required.
The internal thread pool is still available (size to the number of cores), but not configurable.
A ForkJoinPool can be used instead of the thread pool (which can be disabled as well).
Timer: tasks can indicate that they will not block
[https://issues.jboss.org/browse/JGRP-2100]
If a task calls execute() with an argument blocking==false, the task will be executed by the timer's main thread, and not be passed on to the timer's thread pool. This reduces the number of threads needed and therefore the number of context switches.
Headers are resized unnecessarily
[https://issues.jboss.org/browse/JGRP-2120]
ByteArrayDataOutputStream: expand more conservatively
[https://issues.jboss.org/browse/JGRP-2124]
Reading ints and longs creates unnecessary buffers
[https://issues.jboss.org/browse/JGRP-2125]
Ints and longs are read into a byte[] array first, then parsed. This was changed to read the values and add them to the resulting ints or longs.
Should reduce overall memory allocation as ints and longs are used a lot in headers.
Table.removeMany() creates unneeded temp list
[https://issues.jboss.org/browse/JGRP-2126]
This method is used a lot in NAKACK2 and UNICAST3. The change was to read messages directly into the resulting MessageBatch instead of a temp list and from there into the batch.
Reduce in-memory size of UnicastHeader3
[https://issues.jboss.org/browse/JGRP-2127]
Reduced size from 40 -> 32 bytes.
Cache result of log.isTraceEnabled()
[https://issues.jboss.org/browse/JGRP-2130]
This was done mainly for protocols where log.isTraceEnabled() was used a lot, such as TP, NAKACK2 or UNICAST3.
Note that the log level can still be changed at runtime.
Added MessageProcessingPolicy to define assigning of threads to messages or batches
[https://issues.jboss.org/browse/JGRP-2143]
This applies only to regular (not OOB or internal) messages: the policy makes sure that only one message per member is processed at a given time by the thread pool.
This reduces the number of threads needed.
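The idea can be sketched as follows (a conceptual illustration, not the actual MessageProcessingPolicy implementation): messages are queued per sender, and at most one task per sender is handed to the thread pool at a time.

import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Conceptual sketch: regular messages from the same sender are processed one
// at a time, using at most one thread-pool thread per sender
public class PerSenderProcessing<A,M> {
    protected final ExecutorService pool=Executors.newFixedThreadPool(8);
    protected final Map<A,SenderQueue> queues=new ConcurrentHashMap<>();

    public void onMessage(A sender, M msg) {
        queues.computeIfAbsent(sender, s -> new SenderQueue()).add(msg);
    }

    protected void process(M msg) { /* deliver to the protocol above / application */ }

    protected class SenderQueue {
        protected final Queue<M> q=new ArrayDeque<>();
        protected boolean draining;

        protected synchronized void add(M msg) {
            q.add(msg);
            if(!draining) {                    // no thread is working on this sender yet
                draining=true;
                pool.execute(this::drain);
            }
        }

        protected void drain() {               // only one drain() per sender runs at a time
            for(;;) {
                M msg;
                synchronized(this) {
                    msg=q.poll();
                    if(msg == null) { draining=false; return; }
                }
                process(msg);
            }
        }
    }
}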
UNICAST3 / NAKACK2: more efficient adding and removing of messages / batches to/from tables
[https://issues.jboss.org/browse/JGRP-2150]
Simpler algorithm and removal of one lock (= less contention)
Bug fixes
GMS sometimes ignores view bundling timeout
[https://issues.jboss.org/browse/JGRP-2028]
UFC and MFC headers get mixed up
[https://issues.jboss.org/browse/JGRP-2072]
Although independent protocols, the protocol ID was assigned by the superclass, so replenish and credit messages would get mixed up, leading to stuttering in the sending of credits.
Flow control: replenish credits after message delivery
[https://issues.jboss.org/browse/JGRP-2084]
MERGE3: merge is never triggered
[https://issues.jboss.org/browse/JGRP-2092]
This is an edge case that was not covered before: every subgroup coordinator has some other member as coord:
A: BAC
B: CAB
C: ABC
MPING: restart fails
[https://issues.jboss.org/browse/JGRP-2116]
UNICAST3 drops all messages until it receives one with first==true
[https://issues.jboss.org/browse/JGRP-2131]
This caused a bug in ASYM_ENCRYPT.
ASYM_ENCRYPT: message batches are not handled correctly
[https://issues.jboss.org/browse/JGRP-2149]
SYM_ENCRYPT: allow for other keystores besides JCEKS
[https://issues.jboss.org/browse/JGRP-2151]
E.g. PKCS#12 or JKS
ASYM_ENCRYPT encrypts an empty buffer into a null buffer
[https://issues.jboss.org/browse/JGRP-2153]
This caused an NPE.
Downloads
On Sourceforge: https://sourceforge.net/projects/javagroups/files/JGroups/4.0.0.Final,
or via Maven:
<groupId>org.jgroups</groupId>
<artifactId>jgroups</artifactId>
<version>4.0.0.Final</version>
Manual
The manual is at http://www.jgroups.org/manual4/index.html.
The complete list of features and bug fixes can be found at
http://jira.jboss.com/jira/browse/JGRP.
Bela Ban, Kreuzlingen, Switzerland
March 2017
Thursday, September 08, 2016
Removing thread pools in JGroups 4.0
JGroups 3.x has 4 thread pools:
- Regular thread pool: used for regular messages (default has a queue)
- OOB thread pool: used for OOB messages (no queue)
- Internal thread pool: used for JGroups internal messages only. The main raison d'etre for this pool was that internal messages such as heartbeats or credits should never get queued up behind other messages, and get processed immediately.
- Timer thread pool: all tasks in a timer need to be executed by a thread pool as they can potentially block
Hence the idea to club regular and OOB thread pools into one.
When I further thought about this, I realized that incoming messages could also be handled by a queue-less thread pool: by handling RejectedExecutionException thrown when the pool is full and simply spawning a new thread to process the internal message, so it wouldn't get dropped.
The same goes for timer tasks: a timer task (e.g. a message retransmission task) cannot get dropped, or retransmission would cease. However, by using the same mechanism as for internal messages, namely spawning a new thread when the thread pool is full, this can be solved.
Therefore, in 4.0 there's only a single thread pool handling regular, OOB and internal messages, plus timer tasks.
The new thread pool has no queue; if it did, it would not throw a RejectedExecutionException when a task is added but would simply queue it, which is not what we want for internal messages or timer tasks.
It also has a default rejection policy of "abort" which cannot be changed (only by substituting the thread pool with a custom pool).
This dramatically reduces configuration complexity: from 4 to 1 pools, and the new thread pool only exposes min-threads, max-threads, idle-time and enabled as configuration options.
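The spawn-on-reject idea described above can be sketched like this (an illustration of the pattern, not the actual JGroups code; pool sizes are arbitrary):

import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SpawnOnReject {
    // Queue-less pool with "abort" policy: tasks beyond max-threads are rejected
    protected final ThreadPoolExecutor pool=new ThreadPoolExecutor(
        2, 8, 30, TimeUnit.SECONDS,
        new SynchronousQueue<>(),                  // no queue
        new ThreadPoolExecutor.AbortPolicy());     // throws RejectedExecutionException

    /** Tasks that must not be dropped (internal messages, timer tasks) get a
        temporary thread when the pool is full; others may simply be discarded */
    public void execute(Runnable task, boolean can_be_dropped) {
        try {
            pool.execute(task);
        }
        catch(RejectedExecutionException rejected) {
            if(!can_be_dropped)
                new Thread(task, "spawned-on-reject").start();
        }
    }
}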
Here's an example of a 3.x configuration:
timer_type="new3"
timer.min_threads="2"
timer.max_threads="4"
timer.keep_alive_time="3000"
timer.queue_max_size="500"
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="8"
thread_pool.keep_alive_time="5000"
thread_pool.queue_enabled="true"
thread_pool.queue_max_size="10000"
thread_pool.rejection_policy="discard"
internal_thread_pool.enabled="true"
internal_thread_pool.min_threads="1"
internal_thread_pool.max_threads="4"
internal_thread_pool.keep_alive_time="5000"
internal_thread_pool.queue_enabled="false"
internal_thread_pool.rejection_policy="discard"
oob_thread_pool.enabled="true"
oob_thread_pool.min_threads="1"
oob_thread_pool.max_threads="8"
oob_thread_pool.keep_alive_time="5000"
oob_thread_pool.queue_enabled="false"
oob_thread_pool.queue_max_size="100"
oob_thread_pool.rejection_policy="discard"
and here's the 4.0 configuration:
thread_pool.enabled="true"
thread_pool.min_threads="2"
thread_pool.max_threads="8"
thread_pool.keep_alive_time="5000"
Nice, isn't it?
Cheers,
Tuesday, August 23, 2016
JGroups 4.0.0.Alpha1 released
Howdy folks,
I just released a first alpha of JGroups 4.0.0 to SourceForge and maven. There are 12+ issues left, and I expect a final release in October.
The major version number was incremented because there are a lot of API changes and removal of deprecated classes/methods.
I'd be happy to get feedback on the API; there's still time to make a few API changes until the final release.
Major API changes
- Deprecated classes and methods were removed (TCP_NIO, UNICAST, UNICAST2, NAKACK, PEER_LOCK, MERGE, MERGE2, FC, SCOPE etc)
- The shared transport was removed (https://issues.jboss.org/browse/JGRP-1844). Use fork channels instead.
- MuxRpcDispatcher was finally removed (https://issues.jboss.org/browse/JGRP-2008). YEAH!!!
- RpcDispatcher now uses CompletableFuture (https://issues.jboss.org/browse/JGRP-1605)
- Changed code to use new Java 8 features (e.g. streams): https://issues.jboss.org/browse/JGRP-2007
- Removed the Channel interface and merged it into JChannel: https://issues.jboss.org/browse/JGRP-2018
New features
Deliver message batches
Ability to handle message batches in addition to individual messages: receive(MessageBatch) was added to ReceiverAdapter: https://issues.jboss.org/browse/JGRP-2003
Encryption and authentication
ENCRYPT was removed and replaced by SYM_ENCRYPT (keystore-based) and ASYM_ENCRYPT (asymmetric key exchange): https://issues.jboss.org/browse/JGRP-2021.
ENCRYPT had a couple of security flaws, and since the code hasn't been touched for almost a decade, I took this chance and completely refactored it into the 2 protocols above.
The code is now much more readable and maintainable.
Optimizations
Removal of Java serialization
JGroups marshalling falls back to Java serialization if a class is not a primitive type and does not implement Streamable. This was completely removed for JGroups-internal classes: https://issues.jboss.org/browse/JGRP-2033
MessageBundler improvements
The performance of existing bundlers has been improved, and new bundlers (RingBufferBundler, NoBundler) have been added. Since the shared transport feature was removed, SingletonAddress, which was used as a key into a bundler's hashmap, could also be removed.
For instance, TransferQueueBundler now removes chunks of messages from the queue, rather than one by one: https://issues.jboss.org/browse/JGRP-2076.
I also added the ability to switch bundlers at runtime (via (e.g.) probe.sh op=TCP.bundler["rb"]): https://issues.jboss.org/browse/JGRP-2057 / https://issues.jboss.org/browse/JGRP-2058
Reduction of event creation
For every message sent up or down, a new Event (wrapping the message) had to be created. Adding 2 separate up(Message) and down(Message) callbacks removed that unneeded creation: https://issues.jboss.org/browse/JGRP-2067
Faster header marshalling and creation
Headers used to do a hashmap lookup to get their magic ID when getting marshalled. Now they carry the magic ID directly in the header. This saves a hashmap lookup per header (usually we have 3-4 headers per message): https://issues.jboss.org/browse/JGRP-2042.
Header creation used to look up the class via the magic ID and then create an instance. This was replaced by each header class implementing a create() method, bringing header creation from 300ns down to 25ns.
https://issues.jboss.org/browse/JGRP-2043.
Bug fixes
MFC and UFC headers getting mixed up
This is a regression that led to premature exhaustion of credits, slowing things down: https://issues.jboss.org/browse/JGRP-2072
Download
<groupId>org.jgroups</groupId>
<artifactId>jgroups</artifactId>
<version>4.0.0-Alpha1</version>
The manual has not been changed yet, so browse the source code or javadocs for details.
Enjoy, and please send feedback via the mailing list!
Cheers,
Bela Ban
Wednesday, February 24, 2016
JGroups 3.6.8 released
FYI, 3.6.8.Final has been released.
Not a big release; it contains mostly optimizations and a nice probe improvement. The main issues are listed below.
Enjoy!
Bela
New features
Probe improvements
[https://issues.jboss.org/browse/JGRP-2004]
[https://issues.jboss.org/browse/JGRP-2005]
- Proper discarding of messages from a different cluster with the '-cluster' option.
- Less information per cluster member; only the requested information is returned
- Detailed information about RPCs (number of sync and async RPCs, plus timings)
- http://www.jgroups.org/manual/index.html#_looking_at_details_of_rpcs_with_probe
Optimizations
DONT_BUNDLE and OOB: messages are not removed from batch when execution fails
[https://issues.jboss.org/browse/JGRP-2015]
- Messages are not removed from the batch when execution fails
- Rejections are not counted towards num_rejected_msgs
COMPRESS: removed 1 byte[] buffer copy
[https://issues.jboss.org/browse/JGRP-2017]
An unneeded copy of the compressed payload was created when sending and compressing a message. The fix should reduce memory allocation pressure quite a bit.
RpcDispatcher: don't copy the first anycast
[https://issues.jboss.org/browse/JGRP-2010]
When sending an anycast to 3 destinations, JGroups sends a copy of the original message to all 3. However, the first doesn't need to be copied (less memory allocation pressure). For an anycast to a single destination, no copy is needed, either.
Compaction of in-memory size
[https://issues.jboss.org/browse/JGRP-2011]
[https://issues.jboss.org/browse/JGRP-2012]
- Reduced the size of Rsp (used in every RPC) from 32 to 24 bytes
- Request/UnicastRequest/GroupRequest: reduced size
RequestCorrelator.done() is slow
[https://issues.jboss.org/browse/JGRP-2013]
Used by RpcDispatcher. Fixed by eliminating the linear search done previously.
Bug fixes
FILE_PING: consider special characters in file names
[https://issues.jboss.org/browse/JGRP-2014]
Names like "A/web-cluster" would fail on Windows, as the slash char ('/') was treated as a demarcation char in some clouds.
Manual
The manual is at http://www.jgroups.org/manual/index.html.
The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.
Bela Ban, Kreuzlingen, Switzerland, Feb 2016
Wednesday, January 20, 2016
Dump RPC stats with JGroups
When using remote procedure calls (RPCs) across a cluster using RpcDispatcher, it would be interesting to know how many RPCs of which type (unicast, multicast, anycast) are invoked by whom to whom.
I added this feature in 3.6.8-SNAPSHOT [1]. The documentation is here: [2].
As a summary, since this feature is costly, it has to be enabled with
probe.sh rpcs-enable-details (and disabled with rpcs-disable-details).
From now on, invocation times of synchronous (blocking) RPCs will be recorded (async RPCs will be ignored).
RPC stats can be dumped with probe.sh rpcs-details:
[belasmac] /Users/bela/JGroups$ probe.sh rpcs rpcs-details
-- sending probe on /224.0.75.75:7500

#1 (481 bytes):
local_addr=C [ip=127.0.0.1:55535, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync multicast RPCs=0
uperf: async unicast RPCs=0
uperf: async multicast RPCs=0
uperf: sync anycast RPCs=67480
uperf: async anycast RPCs=0
uperf: sync unicast RPCs=189064
rpcs-details=
D: async: 0, sync: 130434, min/max/avg (ms): 0.13/924.88/2.613
A: async: 0, sync: 130243, min/max/avg (ms): 0.11/926.35/2.541
B: async: 0, sync: 63346, min/max/avg (ms): 0.14/73.94/2.221

#2 (547 bytes):
local_addr=A [ip=127.0.0.1:65387, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync multicast RPCs=5
uperf: async unicast RPCs=0
uperf: async multicast RPCs=0
uperf: sync anycast RPCs=67528
uperf: async anycast RPCs=0
uperf: sync unicast RPCs=189200
rpcs-details=
<all>: async: 0, sync: 5, min/max/avg (ms): 2.11/9255.10/4917.072
C: async: 0, sync: 130387, min/max/avg (ms): 0.13/929.71/2.467
D: async: 0, sync: 63340, min/max/avg (ms): 0.13/63.74/2.469
B: async: 0, sync: 130529, min/max/avg (ms): 0.13/929.71/2.328

#3 (481 bytes):
local_addr=B [ip=127.0.0.1:51000, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync multicast RPCs=0
uperf: async unicast RPCs=0
uperf: async multicast RPCs=0
uperf: sync anycast RPCs=67255
uperf: async anycast RPCs=0
uperf: sync unicast RPCs=189494
rpcs-details=
C: async: 0, sync: 130616, min/max/avg (ms): 0.13/863.93/2.494
A: async: 0, sync: 63210, min/max/avg (ms): 0.14/54.35/2.066
D: async: 0, sync: 130177, min/max/avg (ms): 0.13/863.93/2.569

#4 (482 bytes):
local_addr=D [ip=127.0.0.1:54944, version=3.6.8-SNAPSHOT, cluster=uperf, 4 mbr(s)]
uperf: sync multicast RPCs=0
uperf: async unicast RPCs=0
uperf: async multicast RPCs=0
uperf: sync anycast RPCs=67293
uperf: async anycast RPCs=0
uperf: sync unicast RPCs=189353
rpcs-details=
C: async: 0, sync: 63172, min/max/avg (ms): 0.13/860.72/2.399
A: async: 0, sync: 130342, min/max/avg (ms): 0.13/862.22/2.338
B: async: 0, sync: 130424, min/max/avg (ms): 0.13/866.39/2.350
This shows the stats for each member in a given cluster, e.g. number of unicast, multicast and anycast RPCs, per target destination, plus min/max and average invocation times for sync RPCs per target as well.
Probe just became even more powerful! :-)
Enjoy!
[1] https://issues.jboss.org/browse/JGRP-2005
[2] Documentation: http://www.jgroups.org/manual/index.html#_looking_at_details_of_rpcs_with_probe
Monday, January 18, 2016
JGroups workshop in Munich April 4-8 2016
I'm happy to announce another JGroups workshop in Munich April 4-8 2016 !
The registration is now open at [2].
The agenda is at [3] and includes an overview of the basic API, building blocks, advanced topics and an in-depth look at the most frequently used protocols, plus some admin (debugging, tracing, diagnosis) stuff.
We'll be doing demos and looking at code; I always try to make the workshops as hands-on as possible.
I'll be teaching the workshop myself, and I'm looking forward to meeting some of you and having beers in Munich downtown! For attendee feedback on courses last year check out [1].
Note that the exact location in Munich has not yet been picked; I'll update the registration and send out an email to already registered attendees once this is the case (by the end of January at the latest).
The course has a minimum of 5 and a maximum of 15 attendees.
I'm planning to do another course in Boston or New York in the fall of 2016, but plans have not been finalized yet.
Cheers, and I hope to see many of you in Munich!
Bela Ban
[1] http://www.jgroups.org/workshops.html
[2] http://www.amiando.com/WorkshopMunich
[3] https://github.com/belaban/workshop/blob/master/slides/toc.adoc
Tuesday, January 12, 2016
JGroups 3.6.7.Final released
I'm happy to announce that 3.6.7.Final has been released!
This release contains a few bug fixes, but is mainly about optimizations reducing memory consumption and allocation rates.
The other optimization was in TCP_NIO2, which is now as fast as TCP. It is slated to become the successor to TCP, as it uses fewer threads and since it's built on NIO, should be much more scalable.
3.6.7.Final can be downloaded from SourceForge [1] or used via maven (groupId=org.jgroups / artifactId=jgroups, version=3.6.7.Final).
Below is a list of the major issues resolved.
Enjoy!
[1] https://sourceforge.net/projects/javagroups/files/JGroups/3.6.7.Final/
New features
Interoperability between TCP and TCP_NIO2
[https://issues.jboss.org/browse/JGRP-1952]
This allows nodes that have TCP as transport to talk to nodes that have TCP_NIO2 as transport, and vice versa.
Optimizations
Transport: reuse of receive buffers
[https://issues.jboss.org/browse/JGRP-1998]
On message reception, the transport would create a new buffer in TCP and TCP_NIO2 (not in UDP), read the message into that buffer and then pass it to one of the thread pools, copying single messages (not batches).
This was changed to reusing the same buffers in UDP, TCP and TCP_NIO2, by reading the network data into one of those buffers, de-serializing the message (or message batch) and then passing it to one of the thread pools.
The effect is a much lower memory allocation rate.
Message bundling: reuse of send buffers
[https://issues.jboss.org/browse/JGRP-1989]
When sending messages, a new buffer would be created for marshalling every message (or message bundle). This was changed to reuse the same buffer for all messages or message bundles.
The effect is a smaller memory allocation rate on the send path.
TCP_NIO2: copy on-demand when sending messages
[https://issues.jboss.org/browse/JGRP-1991]
If a message sent by TCP_NIO2 cannot be put entirely into the network buffer of the OS, then the remainder of that message is copied. This is needed to implement the reuse of send buffers, see JGRP-1989 above.
TCP_NIO2: single selector slows down writes and reads
[https://issues.jboss.org/browse/JGRP-1999]
This transport used to have a single selector, processing both writes and reads in the same thread. Writes are not expensive, but reads can be, as de-serialization adds up.
We now have a reader thread for every NioConnection which processes reads (using work stealing) separate from the selector thread. When idle for some time, the reader thread terminates and a new thread is created on subsequent data available to be read.
UPerf (4 nodes) showed a perf increase from 15'000 msgs/sec/node to 24'000. TCP_NIO2's speed is now roughly the same as TCP.
Headers: collapse 2 arrays into 1
[https://issues.jboss.org/browse/JGRP-1990]
A Message had a Headers instance which had one array for header IDs and another one for the actual headers. These 2 arrays were collapsed into a single array, and Headers is not a separate class anymore; the array is managed directly inside Message.
This reduces the memory needed for a message by ca 22 bytes!
RpcDispatcher: removal of unneeded field in a request
[https://issues.jboss.org/browse/JGRP-2001]
The request-id was carried in both the Request (UnicastRequest or MulticastRequest) and the header, which was duplicate and a waste. It was removed from Request; rsp_expected was also removed from the header, for total savings of ca. 9 bytes per RPC.
Switched back from DatagramSocket to MulticastSocket for sending of IP multicasts
[https://issues.jboss.org/browse/JGRP-1970]
Using a DatagramSocket caused some issues on MacOS-based systems: when the routing table was not set up correctly, multicasting would not work (nodes wouldn't find each other).
Also, on Windows, IPv6 wouldn't work: https://github.com/belaban/JGroups/wiki/FAQ.
Make the default number of headers in a message configurable
[https://issues.jboss.org/browse/JGRP-1985]
The default was 3 (changed to 4 now) and if we had more headers, then the headers array needed to be resized (unneeded memory allocation).
Message bundling
[https://issues.jboss.org/browse/JGRP-1986]
When the threshold of the send queue was exceeded, the bundler thread would send messages one-by-one, leading to bad performance.
TransferQueueBundler: switch to array from linked list for queue
[https://issues.jboss.org/browse/JGRP-1987]
Less memory allocation overhead.
Bug fixes
SASL now handles merges correctly
[https://issues.jboss.org/browse/JGRP-1967]
FRAG2: message corruption when thread pools are disabled
[https://issues.jboss.org/browse/JGRP-1973]
Discovery leaks responses
[https://issues.jboss.org/browse/JGRP-1983]
Manual
The manual is at http://www.jgroups.org/manual/index.html.
The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.
Tuesday, November 03, 2015
Talk at Berlin JUG Nov 19
For those of you living in Berlin, mark your calendars: there's an event [1] held by the JUG Berlin-Brandenburg on Nov 19, with talks on:
- JGroups (yours truly)
- New features of Infinispan 8 (Galder Zamarreno)
- Infinispan (Tristan Tarrant) and
- Wildfly clustering (Paul Ferraro)
Hope to see many of you there !
[1] http://www.jug-berlin-brandenburg.de/
Wednesday, September 09, 2015
JGroups 3.6.6.Final released
I don't like releasing a week after I released 3.6.5, but the Infinispan team found 2 critical bugs in TCP_NIO2:
- Messages would get corrupted as they were sent asynchronously and yet the buffer was reused and modified while the send was in transit (JGRP-1961)
- TCP_NIO2 could start dropping messages because selection key registration was not thread safe: JGRP-1963
So, there it is: 3.6.6.Final ! :-)
Enjoy (and find more bugs in TCP_NIO2) !
Thursday, September 03, 2015
JGroups 3.6.5 released
I'm happy to announce that 3.6.5 has been released !
One more patch release (3.6.6) is planned, and then I'll start working on 4.0 which will require Java 8. I'm looking forward to finally also being able to start using functional programming ! :-) (Note that I wrote my diploma thesis in Common Lisp back in the days...)
The major feature of 3.6.5 is certainly support for non-blocking TCP, based on NIO.2. While I don't usually add features to a patch release, I didn't want to create a 3.7.0, and I wanted users to be able to still use Java 7, and not require 8 in order to use the NIO stuff.
Here's a summary of the more important changes in 3.6.5:
TCP_NIO2: new non-blocking transport based on NIO.2
[https://issues.jboss.org/browse/JGRP-886]
This new transport is based on NIO.2 and non-blocking, i.e. no reads or writes will ever block. The biggest advantage compared to TCP is that we moved from the 1-thread-per-connection model to the 1-selector-for-all-connections model.
This means that we use 1 thread for N connections in TCP_NIO2, while TCP used N threads.
To use this, new classes TcpClient / NioClient and TcpServer / NioServer have been created.
More details at http://belaban.blogspot.ch/2015/07/a-new-nio.html.
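As a rough sketch of how the server side of these classes might be used (this assumes the org.jgroups.blocks.cs API with NioServer, ReceiverAdapter and the fluent receiver() setter, as described in the linked post; the port and printout are made up):

import java.net.InetAddress;

import org.jgroups.Address;
import org.jgroups.blocks.cs.NioServer;
import org.jgroups.blocks.cs.ReceiverAdapter;

public class NioReceiveExample {
    public static void main(String[] args) throws Exception {
        InetAddress bind = InetAddress.getLoopbackAddress();

        NioServer server = new NioServer(bind, 7800);      // non-blocking server endpoint
        server.receiver(new ReceiverAdapter() {
            @Override
            public void receive(Address sender, byte[] buf, int offset, int length) {
                // Called whenever a complete buffer has been read off a connection
                System.out.printf("received %d bytes from %s%n", length, sender);
            }
        });
        server.start();                                    // starts accepting and reading, without blocking
    }
}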
Fork channels now support state transfer
[https://issues.jboss.org/browse/JGRP-1941]
Fork channels used to throw an exception on calling ForkChannel.getState(). This is now supported; details are in the JIRA issue.
GossipRouter has been reimplemented using NIO
[https://issues.jboss.org/browse/JGRP-1943]
GossipRouter can now use a blocking (TcpServer) or a non-blocking (NioServer) implementation. On the client side, RouterStub (TUNNEL and TCPGOSSIP) can do the same, using TcpClient or NioClient.
Which implementation is used is governed by the -nio flag when starting the router, or in the configuration of TUNNEL / TCPGOSSIP (use_nio).
Blocking clients can interact with a non-blocking GossipRouter, and vice versa.
Retransmissions use the INTERNAL flag
[https://issues.jboss.org/browse/JGRP-1940]
Retransmissions carry the INTERNAL flag: when a retransmitted message was a request, a potential response was also (incorrectly) flagged as internal. The flag is now cleared on reception of a request.
Lock.tryLock() can wait forever
[https://issues.jboss.org/browse/JGRP-1949]
Caused by a conversion from nanos to millis.
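For context, this is how a bounded lock acquisition typically looks; a minimal sketch assuming a stack that contains a locking protocol such as CENTRAL_LOCK (the config file name "locking.xml" and the lock name are made up):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

import org.jgroups.JChannel;
import org.jgroups.blocks.locking.LockService;

public class TryLockExample {
    public static void main(String[] args) throws Exception {
        JChannel ch = new JChannel("locking.xml");   // hypothetical config with a locking protocol
        ch.connect("lock-demo");

        Lock lock = new LockService(ch).getLock("my-lock");
        // With JGRP-1949 fixed, a bounded tryLock() should return within ~2s
        // instead of potentially waiting forever
        if (lock.tryLock(2, TimeUnit.SECONDS)) {
            try {
                System.out.println("got the lock");
            } finally {
                lock.unlock();
            }
        } else {
            System.out.println("could not acquire the lock within 2s");
        }
        ch.close();
    }
}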
TCPPING: access initial_hosts in the defined order
[https://issues.jboss.org/browse/JGRP-1959]
This was not the case, as we used a HashSet, which reordered the elements.
SWIFT_PING: support JSON
[https://issues.jboss.org/browse/JGRP-1954]
Request/response format has changed from application/xml to application/json in the Identity API.
The manual is at http://www.jgroups.org/manual/index.html.
The complete list of features and bug fixes can be found at http://jira.jboss.com/jira/browse/JGRP.
Enjoy !
Bela Ban, Kreuzlingen, Switzerland, Sept 2015
Monday, July 27, 2015
A new NIO.2 based transport
I'm happy to announce a new transport based on NIO.2: TCP_NIO2 !
The new transport is completely non-blocking, so - contrary to TCP - never blocks on a socket connect, read or write.
The big advantage of TCP_NIO2 over TCP is that it doesn't need to create one reader thread per connection (and possibly a writer thread as well, if send queues are enabled).
With a cluster of 1000 nodes, in TCP every node would have 999 reader threads and 999 connections. While we still have 999 TCP connections open (max), in TCP_NIO2 we only have a single selector thread servicing all connections. When data is available to be read, we read as much data as we can without blocking, and then pass the read message(s) off to the regular or OOB thread pools for processing.
This makes TCP_NIO2 a more scalable and non-blocking alternative to TCP.
Performance
I ran the UPerf and MPerf tests [3] on a 9 node cluster (8-core boxes with ~5300 bogomips and 1 GB networking) and got the following results:
UPerf (500'000 requests/node, 50 invoker threads/node):
TCP: 62'858 reqs/sec/node, TCP_NIO2: 65'387 reqs/sec/node
MPerf (1 million messages/node, 50 sender threads/node):
TCP: 69'799 msgs/sec/node, TCP_NIO2: 77'126 msgs/sec/node
So TCP_NIO2 was better in both cases, which surprised me a bit as there have been reports claiming that the BIO approach was faster.
I therefore recommend running the tests in your own environment, with your own application, to get numbers that are meaningful for your system.
The documentation is here: [1].
Cheers,
[1] http://www.jgroups.org/manual/index.html#TCP_NIO2
[2] https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/TCP_NIO2.java
[3] http://www.jgroups.org/manual/index.html#PerformanceTests
Friday, May 15, 2015
Release of jgroups-raft 0.2
I'm happy to announce the first usable release of jgroups-raft [1] !
Compared to 0.1, which was a mere prototype, 0.2 has a lot more features and is a lot more robust. Besides fixing quite a few bugs and adding unit tests to prevent future regressions, I
- switched to Java 8
- implemented dynamic addition and removal of servers
- wrote the manual, and
- wrote a consensus based replicated counter
Cheers,
[1] http://belaban.github.io/jgroups-raft
[2] https://github.com/belaban/jgroups-raft/issues?q=milestone%3A0.2+is%3Aclosed
[3] https://groups.google.com/forum/#!forum/jgroups-raft
Wednesday, April 29, 2015
JGroups workshops in New York and Mountain View
I'm happy to announce that we're offering 2 JGroups trainings in the US: in New York and Mountain View in Sept 2015 !
The workshop will be interactive and is aimed at intermediate to advanced developers. I'm teaching both workshops, so I should be able to answer all JGroups-related questions ... :-)
An overview of what we'll be doing over the 4.5 days is here:
https://github.com/belaban/workshop/blob/master/slides/toc.adoc.
To get more info and to register visit http://www.jgroups.org/workshops.html.
Registration is now open. The class size is limited to 20 each.
Hope to see some of you at a workshop this year !
Tuesday, March 17, 2015
Everything you always wanted to know about JGroups (but were afraid to ask): JGroups workshop in Berlin
I'm happy to announce a JGroups workshop in Berlin June 1-5 2015 !
This is your chance to learn everything you always wanted to know about JGroups... and more :-)
This is the second in a series of 4 workshops I'll teach this year; 2 in Europe and 2 in the US (NYC and Mountain View, more on the US workshops to be announced here soon).
Rome is unfortunately already sold out, but Berlin's a nice place, too...
The workshop is 5 days and attendees will learn the following [1]:
- Monday: API [introductory]
- Tuesday: Building blocks (RPCs, distributed locks, counters etc) [medium]
- Wednesday/Thursday: advanced topics and protocols [advanced]
- Friday: admin stuff [medium]
The price is 1'500 EUR (early bird: 1'000 EUR). This gets you a week of total immersion into JGroups and beers in the evening with me (not sure this is a good thing though :-))...
Registration [2] is now open (15 tickets only because I want to have a max of 20 attendees - 5 already registered). There's an early bird registration rate (500 EUR off) valid until April 10. Use code JGRP2015 to get the early bird.
The recommended hotel is nhow Berlin [3]. Workshop attendees will get a special rate; check here again in a few days (end of March the latest) on how to book a room at a discounted rate.
Hope to see some of you in Berlin in June !
Cheers,
[1] https://github.com/belaban/workshop/blob/master/slides/toc.adoc
[2] http://www.amiando.com/JGroupsWorkshopBerlin
[3] http://www.nh-hotels.de/hotel/nhow-berlin