Tuesday, November 24, 2020

I hate distributed locks!

I hate distributed locks!

Adding distributed locks to JGroups has been the second biggest mistake that I've made (next to MuxRpcDispatcher). Unfortunately, a lot of people are using them in JGroups...

Distributed locks don't work. Or, at least, don't work the way developers expect them to.

TL;DR

  • Assumption that distributed locks have the same semantics as local locks
  • Multiple cluster members in JGroups can hold the same lock when there is a partition
  • The try-lock-finally-unlock pattern is unsuitable for taking locks away from a holder
  • Some scenarios are better handled by using transactions instead
  • Even consensus-based implementations have their share of problems

Distributed locks

The simplest distributed lock implementation is a (fixed) lock server (DLM) which cluster members contact to acquire or release locks. The server may persist locks in a database, so it still knows which members hold locks on a restart. This may be okay for applications that can tolerate unavailability of the lock server and/or the database. If your application is fine with this, then there's no need to read on!

However, the single lock server may quickly become a bottleneck when many members want to acquire or release locks. It is also a single point of failure. This is not the distributed lock implementation I'm talking about (although it works).

What I'm talking about are clustered distributed locks; in today's distributed systems, lock information is typically replicated to all or a subset of the cluster members. This avoids the database bottleneck, but brings with it its own slew of problems.

The biggest problem is that distributed system can have partitions.

Partitions occur when cluster members are separated from each other; when members falsely suspect each other of having crashed, for example because of a long GC cycle, an exhausted thread pool (so heartbeats are dropped), a flaky NIC, or a router dropping packets.

The example I'll use is a cluster {A,B,C} being partitioned into {A} and {B,C}. A suspects and removes members B and C, because it thinks they died, and members B and C remove A for the same reason. Note that B and C can still talk to each other.

 

Distributed locking in JGroups

How does the JGroups lock service [2] handle partitions? Well, it doesn't!

Let's assume A holds lock L before the partition. C wants to acquire L, but has to wait until A releases it, or crashes. Now partition {A} | {B,C} occurs. B becomes coordinator in {B,C} and C is able to acquire L. A remains coordinator in {A}.

This means that both A and C now hold lock L!

This is because the lock service implementation in JGroups favors availability over consistency (AP, see [1] for details).

This may be acceptable for some applications, but I suspect it's not ok for most. Typically, locks are used to make sure only 1 thread in the cluster accesses a shared resource, or performs an action that modifies some shared state. Unless these actions are idempotent, running them multiple times may wreak havoc.

Even worse, what happens when the partition heals? Now A and C are both holding lock L. The Lock interface mandates code like this:

mylock.lock();

try {

    // do some work 

}

finally {

    mylock.unlock();

}

Because both A and C might be doing some work in the try-clause, the following issues arise:

  1. How do you stop one of the two (the member which is not supposed to hold the lock anymore)?
  2. Do you interrupt the thread? 
  3. What happens to the work that has already been done, ie. state changes?

Point #1 shows that the try-lock-finally-unlock pattern is not a good one when it comes to taking locks away (forcefully) from a member. Actually, I'm not sure a good interface exists at all for forcefully removing locks from a member!

Point #3 is about what should be done with the work done until the point of lock removal? Should it be rolled back? Have other threads seen the changes so far?

The try-lock-finally-unlock pattern does not guarantee ACIDity, so perhaps people who are using the lock service in this manner should replace it with distributed transactions, and roll back the transaction on lock removal (a.k.a transaction abort)?


Consensus to the rescue?

Perhaps we should prevent multiple members from being able to acquire the same lock in the first place, instead of taking away locks? How about we use a consensus based system like jgroups-raft [3]? This is a Raft [4] implementation built on top of JGroups.

Consensus means that a change to the system can only be made when the majority of the members agree. In the case of {A,B,C}, at least 2 members have to agree on a change, otherwise the change won't be applied ('committed' in Raft terms).

In terms of CAP [1], this means that consistency (CP) is favored over availability.

We now introduce a compare-and-swap command to acquire a lock, which acquires lock L only if it is null. When a majority of the members is able to commit the command, then all members will record the outcome either as successful or failed.

For example, if "L" is not set, A is able to acquire the lock by calling compareAndSwap("L", null, "A"). This means atomically set "L" to "A" if "L" is null.

When the partition {A} | {B,C} occurs, A still holds the lock but is not able to release it because the compareAndSwap("L", "A", "null") operation will fail as it doesn't get a majority.

In the other partition {B,C}, B may become leader, but nobody will be able to acquire the lock as the compare-and-swap operation will fail because "L" is not null, but set to "A".

This means that members which want to acquire L have to wait until the partition has healed and A releases the lock. Even worse: if A crashed, it would be to be restarted, in order to release the lock! This is not better than using a DistributedLockManager (DLM) described at the top of this blog!

We have to ask ourselves whether locks make sense in a consensus-based system. If we make changes to a shared state, then - instead of using locks - we should use the consensus-based system directly to make changes. After all, consensus will serialize state changes, which is what locks promise.

 

Conclusion

AP (JGroups lock service) and CP (jgroups-raft) lie at the opposite ends of the reliability spectrum, and applications have to choose between them.

Like choosing between pest and cholera, when using the lock service in JGroups, due to the AP properties, we can have multiple holders of the same lock.

With jgroups-raft and its CP properties, only 1 member holds a lock at any given time, but members trying to acquire a lock may potentially be blocked for a long time.


[1] https://en.wikipedia.org/wiki/CAP_theorem

[2] http://www.jgroups.org/manual5/index.html#LockService

[3] http://belaban.github.io/jgroups-raft/

[4] https://raft.github.io/