Thursday, August 06, 2020

JGroups 5.0.0.Final released

I'm happy to announce that JGroups 5.0.0.Final has been released!

The new features are described in [1]. Below is a list of the major JIRAs:
  • https://issues.redhat.com/browse/JGRP-2218: this is the most important change in 5.0.0: it changes Message into an interface and allows for different implementations of Message
  • https://issues.redhat.com/browse/JGRP-2450: support for virtual threads (fibers). If the JDK (probably 16 or higher) supports virtual threads, they can be enabled by setting use_fibers to true in the transport (see the config sketch after this list). This effectively bypasses the thread pool(s) and uses virtual threads instead. See [2] for details.
  • https://issues.redhat.com/browse/JGRP-2451: FD_ALL3 is a more efficient failure detection protocol: it counts regular messages received from a member P as heartbeats, and P suppresses heartbeats while it is sending messages. This should reduce traffic on the network
  • https://issues.redhat.com/browse/JGRP-2462: implementation of the Random Early Drop (RED) protocol, which starts dropping messages on the send side as the send queue fills up. This prevents message storms (caused by unneeded retransmission requests for messages that were never received) and/or blocking
  • https://issues.redhat.com/browse/JGRP-2402: new protocol SOS, which captures vital stats and dumps them to a file periodically
  • https://issues.redhat.com/browse/JGRP-2401: versioned configuration. A stack won't start if the versions of JGroups and the configuration differ (micro versions are ignored). This prevents the use of old/outdated configurations with a newer JGroups release
  • https://issues.redhat.com/browse/JGRP-2476: more efficient marshalling of classes, reducing the size of RPCs in RpcDispatcher
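To illustrate where some of this shows up in a stack, here's a minimal config sketch. The use_fibers attribute comes straight from JGRP-2450; the positions and attributes of RED and FD_ALL3 are assumptions based on the descriptions above, so consult the manual [3] for the authoritative configuration:

<config xmlns="urn:org:jgroups">
    <!-- JGRP-2450: use virtual threads instead of thread pools, if the JDK supports them -->
    <UDP use_fibers="true"/>
    <!-- JGRP-2462: drops messages on the send side as the transport's queue fills up -->
    <RED/>
    <PING/>
    <!-- JGRP-2451: regular messages received from a member count as heartbeats -->
    <FD_ALL3/>
    <!-- remaining protocols (NAKACK2, UNICAST3, STABLE, GMS, ...) as in udp.xml -->
</config>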
The documentation can be found at [3].
Enjoy!




Friday, July 17, 2020

Double your performance: virtual threads (fibers) and JDK 15/16!

If you use UDP as transport and want to double your performance: read on!

If you use TCP, your performance won't change much. You might still be interested in what more recent JDKs and virtual threads (formerly called 'fibers') bring to the table.

Virtual threads


Virtual threads are lightweight threads, similar in concept to the old Green Threads, and are managed by the JVM rather than the kernel. Many virtual threads can map to the same OS native (carrier) thread (only one at a time, of course), so we can have millions of virtual threads.

Virtual threads are implemented with continuations, but that's a detail. What's important is that all blocking calls in the JDK (LockSupport.park() etc) have been modified to yield rather than block. This means that we don't waste the precious native carrier thread; the virtual thread simply moves to a non-runnable state (e.g. WAITING). When the blocking operation completes, the thread is marked as RUNNABLE again and the scheduler resumes its continuation where it left off.
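As a concrete sketch (this uses Thread.ofVirtual(), the API that eventually shipped in the JDK; the Loom early-access builds used a slightly different builder API), a million parked virtual threads are cheap, because each one releases its carrier while blocked:

import java.util.ArrayList;
import java.util.List;

public class VirtualThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            // sleep() parks the virtual thread and frees its carrier thread
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ignored) {
                }
            }));
        }
        for (Thread t : threads)
            t.join();
    }
}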

Main advantages:
  • Blocking calls don't need to be changed, e.g. into reactive calls
  • No need for thread pools: simply create a virtual thread
  • Fewer context switches (reduced/eliminated blocking calls)
  • We can have lots of virtual threads
It will be a while until virtual threads show up in your JDK, but JGroups has already added support for them: just set use_fibers="true" in the transport. If the JVM supports virtual threads, they will be used; otherwise, we fall back to regular native threads.


UDP: networking improvements 

While virtual threads bring advantages to JGroups, the other performance increase can be had by trying a more recent JDK.

Starting with JDK 15, the implementations of DatagramSocket and MulticastSocket have been changed to delegate to DatagramChannel and MulticastChannel [2]. In addition, virtual threads are supported.

This increases the performance of JGroups' UDP transport, whose sockets now benefit from the faster channel-based implementation.

The combination of networking code improvements and virtual threads leads to astonishing results for UDP; read on below.

Performance

I used UPerf for testing on a cluster of 8 (physical) boxes (1 GBit ethernet), with JDKs 11, 16-ea5 and 16-loom+2-14. The former two use native threads, the latter uses virtual threads.

As can be seen in [1], UDP's performance goes from 44'691 on JDK 11 to 81'402 on JDK 16-ea5; that's a whopping 82% increase! Enabling virtual threads increases the performance between 16-ea5 and 16-loom+2-14 to 88'252, that's another 8%!

The performance difference between JDK 11 and 16-loom is 97%!

The difference in TCP's performance is minuscule; I guess because the TCP code was already optimized in JDK 11.

Running on JDK 16-loom+2-14 shows that UDP's performance is now on par with TCP's; as a matter of fact, UDP is even 3% faster than TCP!

If you want to try it for yourself: head over to the JGroups GitHub repo and create the JAR (ant jar), as shown below. Or wait a bit: I will soon release 5.0.0.Final, which contains the changes.
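For example (the clone URL is the main JGroups repo; the JAR's exact output directory is an assumption):

git clone https://github.com/belaban/JGroups.git
cd JGroups
ant jar   # the JAR should end up under ./dist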

Not sure if I want to backport the changes to the 4.x branch...

Enjoy!

[1] https://drive.google.com/file/d/1Ars1LOM7cEf6AWpPwZHeIfu_kKLa9gv0/view?usp=sharing

[2] http://openjdk.java.net/jeps/373

Tuesday, June 30, 2020

New Netty transport

I'm happy to announce that Baizel Mathew and Steven Wong have written a new transport protocol using Netty!

Read Baizel's announcement here: [1]; for the code look here: [2].

I anticipate that the Netty transport will replace TCP_NIO2 over time.

The maven coordinates are:

<dependency>
  <groupId>org.jgroups</groupId>
  <artifactId>jgroups-netty</artifactId>
  <version>1.0.0.Alpha2</version>
</dependency>


Thanks, Baizel and Steven, for your contribution!


[1] https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/jgroups-dev/R3yxmfhcqMk/ugked7zaAgAJ
[2] https://github.com/jgroups-extras/jgroups-netty

Monday, April 20, 2020

Hybrid clouds with JGroups and Skupper

This is a follow-up post on [1], which showed how to connect two Kubernetes-based hybrid clouds (Google GKE and AWS EKS) with JGroups' TUNNEL and GossipRouter.

Meanwhile, I've discovered Skupper [2], which (1) simplifies this task and (as a bonus) (2) encrypts the data exchanged between different clouds.

In this post, I'm going to provide step-by-step instructions on how to connect a Google Kubernetes Engine (GKE) cluster with a cluster running on my local box.

To run the demo yourself, you must have Skupper installed and a GKE account. However, any other cloud provider works, too.

For the local cloud, I'm using docker-desktop. Alternatively, minikube could be used.

So let's get cracking and start the GKE cluster. To avoid having to switch contexts with kubectl all the time, I suggest starting 2 separate shells and setting KUBECONFIG in the public (GKE) shell to a copy of the config:

Shell 1 (GKE): cp .kube/config .kube/gke; export KUBECONFIG=$HOME/.kube/gke

Now start a GKE cluster (in shell 1):
gcloud container clusters create gke --num-nodes 4

NOTE: if you use a different cloud, simply start your cluster and set kubectl's context to point to your cluster. The rest of the instructions below apply regardless of the specific cloud.

Creating the cluster also sets the Kubernetes context, which we can verify (shell 1):
kubectl config current-context
gke_ispnperftest_us-central1-a_gke


In shell 2, confirm that the context is local:
kubectl config current-context
docker-desktop


This shows that kubectl is pointing to docker-desktop.

Let's now start a GossipRouter in both clouds. To do this, we have to modify the YAML used in [1] slightly:
curl https://raw.githubusercontent.com/belaban/jgroups-docker/master/yaml/gossiprouter.yaml > gossiprouter.yaml

Now comment out lines 42-43:
spec:
#  type: LoadBalancer
#  externalTrafficPolicy: Local


This is needed because Skupper requires the service to be exposed as a ClusterIP rather than a LoadBalancer.

Now deploy it in both shells:
kubectl apply -f gossiprouter.yaml
deployment.apps/gossiprouter created
service/gossiprouter created


Now it is time to initialize Skupper in both shells:
skupper init
Waiting for LoadBalancer IP or hostname...
Skupper is now installed in namespace 'default'.  Use 'skupper status' to get more information.


This installs some pods and services/proxies:
kubectl get po,svc
NAME                                           READY   STATUS    RESTARTS   AGE
pod/gossiprouter-6d6dcd6d79-q9p2f              1/1     Running   0          4m6s
pod/skupper-proxy-controller-dcf99c6bf-whns4   1/1     Running   0          86s
pod/skupper-router-7976948d9f-b58wn            1/1     Running   0          2m50s

NAME                         TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                           AGE
service/gossiprouter         ClusterIP      10.27.252.196   <none>           8787/TCP,9000/TCP,12001/TCP       4m6s
service/kubernetes           ClusterIP      10.27.240.1     <none>           443/TCP                           27m
service/skupper-controller   LoadBalancer   10.27.241.112   35.223.80.171    8080:30508/TCP                    2m49s
service/skupper-internal     LoadBalancer   10.27.243.17    35.192.126.100   55671:30671/TCP,45671:31522/TCP   2m48s
service/skupper-messaging    ClusterIP      10.27.247.95    <none>           5671/TCP                          2m49s


Next, we create a connection token in one of the clouds. This creates a file containing a certificate and keys, which allows a Skupper instance in one cluster to connect to a Skupper instance in another cluster.

Note that this file must be kept secret as it contains the private keys of the (server) Skupper instance!

We only need to connect from one cloud to the other; Skupper will automatically create a bi-directional connection.

Let's pick the public cloud (shell 1):
skupper connection-token gke.secret
Connection token written to gke.secret
 

We now need to copy this file to the other (local) cloud. In my example, I'm using the home directory, but in real life, this would have to be done securely.

The local Skupper instance now uses this file to connect to the Skupper instance in the public cluster and establish an encrypted VPN tunnel:
skupper connect gke.secret
Skupper is now configured to connect to 35.192.126.100:55671 (name=conn1)


Now we have to expose the GossipRouter service in each cloud to Skupper, so that Skupper can create a local proxy of the service that transparently connects to the other cloud via a symbolic name:
Shell 1:
skupper expose deployment gossiprouter --port 12001 --address gossiprouter-1
Shell 2:
skupper expose deployment gossiprouter --port 12001 --address gossiprouter-2

The symbolic names gossiprouter-1 and gossiprouter-2 are now available to any pod in both clusters.
Traffic sent from the local cluster to gossiprouter-1 in the public cluster is transparently forwarded (and encrypted) by Skupper between the sites!

This means we can set TUNNEL_INITIAL_HOSTS (as used in the bridge cluster) to
gossiprouter-1[12001],gossiprouter-2[12001].

This is used in bridge.xml:
<TUNNEL bind_addr="match-interface:eth0,site-local"
        gossip_router_hosts="${TUNNEL_INITIAL_HOSTS:127.0.0.1[12001]}"
...


Let's now run RelayDemo in the public and local clusters. This is the same procedure as in [1].

Shell 1: 
curl https://raw.githubusercontent.com/belaban/jgroups-docker/master/yaml/nyc.yaml > public.yaml

Shell 2:
curl https://raw.githubusercontent.com/belaban/jgroups-docker/master/yaml/sfc.yaml > local.yaml

In both YAML files, change the number of replicas to 3 and the value of TUNNEL_INITIAL_HOSTS to "gossiprouter-1[12001],gossiprouter-2[12001]".
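The edited section should look roughly like this (a sketch; the names and exact structure in the actual files may differ):

spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: sfc        # resp. nyc in public.yaml
          env:
            - name: TUNNEL_INITIAL_HOSTS
              value: "gossiprouter-1[12001],gossiprouter-2[12001]"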

Then start 3 pods in the public (NYC) and local (SFC) clusters:
Shell 1:
kubectl apply -f public.yaml
deployment.apps/nyc created
service/nyc created


Shell 2:
kubectl apply -f local.yaml
deployment.apps/sfc created
service/sfc created


Verify that there are 3 pods running in each cluster.

Let's now run RelayDemo on the local cluster:
Shell 2: 
> kubectl get pods |grep sfc-
sfc-7f448b7c94-6pb9m         1/1     Running   0          2m44s
sfc-7f448b7c94-d7zkp         1/1     Running   0          2m44s
sfc-7f448b7c94-ddrhs         1/1     Running   0          2m44s


> kubectl exec -it sfc-7f448b7c94-6pb9m bash
bash-4.4$ relay.sh -props sfc.xml -name Local

-------------------------------------------------------------------
GMS: address=Local, cluster=RelayDemo, physical address=10.1.0.88:7801
-------------------------------------------------------------------
View: [sfc-7f448b7c94-6pb9m-4056|3]: sfc-7f448b7c94-6pb9m-4056, sfc-7f448b7c94-ddrhs-52643, sfc-7f448b7c94-d7zkp-11827, Local
: hello
: << hello from Local
<< response from sfc-7f448b7c94-6pb9m-4056
<< response from sfc-7f448b7c94-ddrhs-52643
<< response from sfc-7f448b7c94-d7zkp-11827
<< response from Local
<< response from nyc-6b4846f777-g2gqk-7743:nyc
<< response from nyc-6b4846f777-7jm9s-23105:nyc
<< response from nyc-6b4846f777-q2wrl-38225:nyc



We first list all pods, then exec into one of them.

Next, we run RelayDemo and send a message to all members of the local and remote clusters. We get a response from ourselves (Local) and from the other 3 members of the local (SFC) cluster, and we also get responses from the 3 members of the remote public cluster (NYC).

JGroups load-balances messages across the two GossipRouters. Whenever the chosen router is remote, Skupper forwards the traffic transparently over its VPN tunnel to the other site.


[1] http://belaban.blogspot.com/2019/12/spanning-jgroups-kubernetes-based.html
[2] https://skupper.io/


Tuesday, January 28, 2020

First alpha of JGroups 5.0

Howdy folks!

Today I'm very happy to announce the first alpha version of JGroups 5.0!

JGroups 5.0 has major API changes, and I'd like people to try it out and give feedback before the final release.

Note that there might still be more API changes before the first beta.

So what's new in 5?

The biggest change is that Message is now an interface [1] and we have a number of message classes implementing it, e.g.:
  • BytesMessage: this is the replacement for the old 4.x Message class, having a byte array as payload.
  • ObjectMessage: accepts an object as payload.
  • NioMessage: has an NIO ByteBuffer as payload.
  • EmptyMessage: this class has *no* payload at all! Useful when sending around messages that have only headers, e.g. heartbeats. Used mainly by JGroups internally. This class has a smaller memory footprint.
  • CompositeMessage: a message type that carries other messages
The advantage of the different message types is that, rather than having to marshal payloads into a byte array as in 4.x, the payload is now added to the message without marshalling. Marshalling is done only just before sending the message on the network.

This late marshalling saves one memory allocation.
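As a minimal sketch of the new API (assuming a stock udp.xml on the classpath), the String payload below is handed to the ObjectMessage as-is and is only marshalled when the message hits the network:

import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ObjectMessage;

public class MsgDemo {
    public static void main(String[] args) throws Exception {
        JChannel ch = new JChannel("udp.xml").connect("demo");
        Message msg = new ObjectMessage(null, "hello world"); // null dest: send to all members
        ch.send(msg);
        ch.close();
    }
}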

The other advantage is that applications can register their own message types. This means that we can control how a message is created, e.g. using off-heap memory rather than heap memory.

Other changes include:
  • I've removed a lot of deprecated cruft, e.g. several AuthToken implementations, SASL, S3_PING and GOOGLE_PING (they have better replacements).
  • Java 11 is now the baseline. The current Alpha1 still runs under Java 8, but I expect this to change, perhaps only with 5.1. In any case, I reserve the right to use Java 11-specific language features, so be warned :-)
The full list of changes in 5.0 is here: [2].

I still have a few JIRAs to resolve before releasing 5.0.0.Final, and then I'll add new functionality (without API changes) in a bunch of minor releases. I've planned 5.1 - 5.3 so far.

The documentation is here: [3].

For feedback please use the mailing list [4].

Enjoy!



[1] http://www.jgroups.org/manual5/index.html#Message
[2] https://issues.redhat.com/projects/JGRP/versions/12334686
[3] http://www.jgroups.org/manual5/index.html
[4] http://groups.google.com/forum/#!forum/jgroups-dev