Java 21 Virtual Threads – Dude, Where's My Lock?

As Netflix's experience shows, the virtual threads introduced in Java 21 can cause unexpected problems. In this new translation from the Spring AIO team, we take a deep dive into the unusual challenges the Netflix team ran into while integrating virtual threads into Spring Boot applications running on Tomcat.

Let's take a look at how virtual threads can impact system performance and stability.


Introduction

Netflix has a long history of using Java as its primary programming language, covering a huge number of microservices. As we adopt newer versions of Java, our JVM Ecosystem team looks for new language features that can improve the ergonomics and performance of our systems. In a recently published article we detailed how our workloads benefited from switching to generational ZGC as the default garbage collector when migrating to Java 21. Virtual threads are another feature we are excited to adopt as part of this migration.

For those new to virtual threads: they are described as "lightweight threads that dramatically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications." Their power lies in their ability to suspend and resume automatically when a blocking operation occurs, freeing the underlying operating system threads for reuse by other operations. Virtual threads can deliver significant performance gains when used in the right context.

In this article, we discuss one of the unusual cases we encountered on the way to introducing virtual threads in Java 21.

Problem

Netflix engineers filed several independent reports of intermittent timeouts and hangs with the Performance Engineering and JVM Ecosystem teams. On closer inspection we noticed a common set of signs and symptoms. In all cases the affected applications ran Java 21 with Spring Boot 3 and embedded Tomcat serving REST endpoints. The affected instances simply stopped serving traffic even though the JVM on them was still running. One telltale symptom of this issue is a steady increase in the number of sockets in the closeWait state, clearly visible on the socket-state graphs of the affected instances.

Collected diagnostics

Sockets remaining in the closeWait state indicate that the remote peer has closed the socket, but it was never closed on the local host, presumably because the application failed to do so.


Indeed, `close_wait` is the state of a TCP socket in the Linux kernel: the kernel is waiting for our process to close the socket, but it has not done so yet.

Often this can mean that the application is stuck in an abnormal state, in which case application thread dumps can reveal additional important information to us.

To get to the bottom of this problem, we first used our alerting system to catch an instance in this state. Since we periodically collect and persist thread dumps for all of our JVM workloads, we can often reconstruct behavior retrospectively by examining these thread dumps from a given instance. To our surprise, however, all of the thread dumps showed a perfectly idle JVM with no apparent activity. Reviewing recent changes, we found that the affected services had virtual threads enabled, and we knew that virtual thread call stacks do not show up in thread dumps produced by the jstack command. To obtain a more complete thread dump that includes the state of the virtual threads, we used the "jcmd Thread.dump_to_file" command instead. As a last-ditch effort to examine the state of the JVM, we also collected a heap dump from the instance.

Analysis

Thread dumps showed thousands of “empty” virtual threads:

#119821 "" virtual
#119820 "" virtual
#119823 "" virtual
#120847 "" virtual
#119822 "" virtual
...

These are virtual threads for which a Thread object has been created but which have not yet started executing, and therefore have no stack trace. In fact, the number of these empty virtual threads was roughly the same as the number of sockets in the closeWait state. To make sense of what we are seeing, we first need to understand how virtual threads work.

A virtual thread is not mapped one-to-one to a dedicated OS-level thread. Rather, it can be thought of as a task that is scheduled onto a fork-join thread pool. When a virtual thread hits a blocking call, such as waiting for a Future to complete, it releases the OS thread it was occupying and simply sits in memory until it is ready to resume. Meanwhile, the OS thread can be reassigned to run other virtual threads in the same fork-join pool. This lets us multiplex many virtual threads onto a small number of underlying OS threads. In JVM terminology, the underlying OS thread is called the "carrier thread", onto which a virtual thread is "mounted" while it executes and from which it is "unmounted" while it waits. An excellent in-depth description of virtual threads is available in JEP 444.
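To make mounting and unmounting visible, here is a small sketch of our own (not from the original article). A virtual thread's toString() includes the ForkJoinPool worker it is currently mounted on, so after a blocking call a virtual thread may well resume on a different carrier thread:

import java.util.concurrent.CountDownLatch;

public class CarrierThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(3);
        for (int i = 0; i < 3; i++) {
            Thread.ofVirtual().name("vt-" + i).start(() -> {
                // The carrier (ForkJoinPool worker) is visible in toString().
                System.out.println("before blocking: " + Thread.currentThread());
                try {
                    Thread.sleep(100); // blocking call: the virtual thread unmounts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                // After resuming, the virtual thread may be mounted on a different carrier.
                System.out.println("after blocking:  " + Thread.currentThread());
                done.countDown();
            });
        }
        done.await();
    }
}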

In our environment we use a blocking model for Tomcat, which essentially holds a worker thread for the duration of a request. When virtual threads are enabled, Tomcat switches to virtual execution: each incoming request gets a new virtual thread, which is simply scheduled as a task on a VirtualThreadExecutor. You can see how Tomcat creates its VirtualThreadExecutor in the Tomcat source code.
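For context, this is roughly how such wiring is commonly done in a Spring Boot 3 application; the snippet below is an illustrative sketch of the mechanism, not necessarily the exact Netflix configuration. Tomcat's protocol handler is handed a virtual-thread-per-task executor, so every request runs on its own virtual thread:

import java.util.concurrent.Executors;

import org.springframework.boot.web.embedded.tomcat.TomcatProtocolHandlerCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class VirtualThreadTomcatConfig {

    // Hands Tomcat an executor that starts a new virtual thread per request.
    // (Spring Boot 3.2+ can also do this declaratively via spring.threads.virtual.enabled=true.)
    @Bean
    TomcatProtocolHandlerCustomizer<?> virtualThreadProtocolHandlerCustomizer() {
        return protocolHandler ->
                protocolHandler.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
    }
}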

Relating this information back to our issue, the symptoms are consistent with a state in which Tomcat keeps creating a new web worker virtual thread for every incoming request, but there are no OS-level threads available to mount them onto.

Why is Tomcat frozen?

What happened to our OS-level threads, and what were they busy with? As described in JEP 444, a virtual thread gets pinned to its carrier OS thread if it performs a blocking operation while inside a synchronized block or method. This is exactly what is happening here. Below is the relevant fragment of the thread dump taken from the hung instance.

#119515 "" virtual
      java.base/jdk.internal.misc.Unsafe.park(Native Method)
      java.base/java.lang.VirtualThread.parkOnCarrierThread(VirtualThread.java:661)
      java.base/java.lang.VirtualThread.park(VirtualThread.java:593)
      java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
      java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
      java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
      java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
      java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
      java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
      java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
      zipkin2.reporter.internal.CountBoundedQueue.offer(CountBoundedQueue.java:54)
      zipkin2.reporter.internal.AsyncReporter$BoundedAsyncReporter.report(AsyncReporter.java:230)
      zipkin2.reporter.brave.AsyncZipkinSpanHandler.end(AsyncZipkinSpanHandler.java:214)
      brave.internal.handler.NoopAwareSpanHandler$CompositeSpanHandler.end(NoopAwareSpanHandler.java:98)
      brave.internal.handler.NoopAwareSpanHandler.end(NoopAwareSpanHandler.java:48)
      brave.internal.recorder.PendingSpans.finish(PendingSpans.java:116)
      brave.RealSpan.finish(RealSpan.java:134)
      brave.RealSpan.finish(RealSpan.java:129)
      io.micrometer.tracing.brave.bridge.BraveSpan.end(BraveSpan.java:117)
      io.micrometer.tracing.annotation.AbstractMethodInvocationProcessor.after(AbstractMethodInvocationProcessor.java:67)
      io.micrometer.tracing.annotation.ImperativeMethodInvocationProcessor.proceedUnderSynchronousSpan(ImperativeMethodInvocationProcessor.java:98)
      io.micrometer.tracing.annotation.ImperativeMethodInvocationProcessor.process(ImperativeMethodInvocationProcessor.java:73)
      io.micrometer.tracing.annotation.SpanAspect.newSpanMethod(SpanAspect.java:59)
      java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
      java.base/java.lang.reflect.Method.invoke(Method.java:580)
      org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:637)
...

In this stack trace we can see that the synchronization happens in brave.RealSpan.finish(RealSpan.java:134). This virtual thread is effectively pinned: it stays mounted on a real OS-level thread even while it waits to acquire a reentrant lock. (Spring AIO team: we will definitely cover this separately a bit later.) There are three virtual threads in exactly this state, plus another virtual thread identified as " @DefaultExecutor – 46542" that follows the same code path. These four virtual threads are pinned while waiting to acquire a lock. Since the application is deployed on an instance with 4 vCPUs, the fork-join pool that runs the virtual threads also contains 4 OS threads. Now that they are all exhausted, no other virtual thread can make any progress. This explains why Tomcat stopped processing requests and why the number of sockets in the closeWait state kept growing: Tomcat accepts a connection on a socket, creates a request along with a virtual thread, and hands the request/thread pair to the executor for processing. However, the newly created virtual thread cannot be scheduled because all of the OS threads in the fork-join pool are pinned and never released, so these newly created virtual threads get stuck in the queue while still holding their sockets.
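This kind of pinning is easy to observe in isolation. Below is a small sketch of our own (not from the original article): a virtual thread blocks on a ReentrantLock while holding a monitor, which pins it to its carrier. Running it with -Djdk.tracePinnedThreads=full should make the JDK print the stack of the pinned thread, mirroring the parkOnCarrierThread frame in the dump above:

import java.util.concurrent.locks.ReentrantLock;

public class PinnedThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        Object monitor = new Object();

        lock.lock(); // main (a platform thread) holds the lock so the virtual thread has to wait

        Thread vt = Thread.ofVirtual().name("pinned-demo").start(() -> {
            synchronized (monitor) {   // holding a monitor ...
                lock.lock();           // ... and blocking: the virtual thread is pinned to its carrier
                try {
                    System.out.println("acquired on " + Thread.currentThread());
                } finally {
                    lock.unlock();
                }
            }
        });

        Thread.sleep(500); // give the virtual thread time to park while pinned
        lock.unlock();     // let it proceed
        vt.join();
    }
}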

Who has the lock?

Now that we know that virtual threads are waiting to acquire a lock, the next question is: who holds the lock? Answering it is key to understanding what triggered this condition in the first place. Usually a thread dump indicates the lock holder with either a "- locked <0x…> (at …)" frame annotation or a "Locked ownable synchronizers" section, but neither of these shows up in our thread dumps. In fact, no locking/parking/waiting information is included in thread dumps generated by jcmd. This is a limitation in Java 21 and will be addressed in a future release. Carefully combing through the thread dump reveals a total of six threads contending for the same ReentrantLock and its associated Condition. Four of these six threads were described in the previous section. Here is the stack of another of them:

#119516 "" virtual
      java.base/java.lang.VirtualThread.park(VirtualThread.java:582)
      java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
      java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
      java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
      java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
      java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
      java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
      java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
      zipkin2.reporter.internal.CountBoundedQueue.offer(CountBoundedQueue.java:54)
      zipkin2.reporter.internal.AsyncReporter$BoundedAsyncReporter.report(AsyncReporter.java:230)
      zipkin2.reporter.brave.AsyncZipkinSpanHandler.end(AsyncZipkinSpanHandler.java:214)
      brave.internal.handler.NoopAwareSpanHandler$CompositeSpanHandler.end(NoopAwareSpanHandler.java:98)
      brave.internal.handler.NoopAwareSpanHandler.end(NoopAwareSpanHandler.java:48)
      brave.internal.recorder.PendingSpans.finish(PendingSpans.java:116)
      brave.RealScopedSpan.finish(RealScopedSpan.java:64)
      ...

Note that although this thread appears to go through the same code path for finishing a span, it does not pass through a synchronized block. Finally, here is the stack of the sixth thread:

#107 "AsyncReporter <redacted>"
      java.base/jdk.internal.misc.Unsafe.park(Native Method)
      java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
      java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
      java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1761)
      zipkin2.reporter.internal.CountBoundedQueue.drainTo(CountBoundedQueue.java:81)
      zipkin2.reporter.internal.AsyncReporter$BoundedAsyncReporter.flush(AsyncReporter.java:241)
      zipkin2.reporter.internal.AsyncReporter$Flusher.run(AsyncReporter.java:352)
      java.base/java.lang.Thread.run(Thread.java:1583)

This is a regular platform thread, not a virtual one. Paying close attention to the line numbers in this stack trace, we notice something odd: the thread appears to be blocked inside the internal acquire() method after the wait has finished. In other words, this calling thread held the lock when it entered awaitNanos() — we know from the zipkin source that the lock is explicitly acquired before awaitNanos() is called — yet by the time the timed wait completed, it could not reacquire the lock. To summarize our thread dump analysis (a sketch of the Condition pattern involved follows the table):

Thread ID / name                     | Virtual? | "synchronized" block? | Pinned? | Waiting for the lock?
#119513 ""                           | Yes      | Yes                   | Yes     | Yes
#119514 ""                           | Yes      | Yes                   | Yes     | Yes
#119515 ""                           | Yes      | Yes                   | Yes     | Yes
#119517 " @DefaultExecutor – 46542"  | Yes      | Yes                   | Yes     | Yes
#119516 ""                           | Yes      | No                    | No      | Yes
#107 "AsyncReporter"                 | No       | No                    | N/A     | Yes
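For reference, here is a minimal sketch of the standard ReentrantLock/Condition pattern that zipkin's CountBoundedQueue follows (our own simplified illustration, not zipkin's actual code). The important detail is that awaitNanos() releases the lock while waiting and must reacquire it before returning, and that reacquisition is exactly the step our AsyncReporter thread is stuck on:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified bounded queue in the spirit of zipkin's CountBoundedQueue (hypothetical, for illustration).
public class BoundedQueueSketch<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition available = lock.newCondition();
    private final Queue<T> items = new ArrayDeque<>();

    public void offer(T item) {
        lock.lock();                // the call our virtual threads are parked on
        try {
            items.add(item);
            available.signal();     // wake up the flusher thread
        } finally {
            lock.unlock();
        }
    }

    public List<T> drainTo(long timeoutNanos) throws InterruptedException {
        lock.lock();                // the flusher takes the lock before waiting
        try {
            if (items.isEmpty()) {
                // awaitNanos() releases the lock while waiting and MUST reacquire it
                // before returning -- the step the AsyncReporter thread is stuck on.
                available.awaitNanos(timeoutNanos);
            }
            List<T> drained = new ArrayList<>(items);
            items.clear();
            return drained;
        } finally {
            lock.unlock();
        }
    }
}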

There are five virtual threads and one regular platform thread waiting to acquire the lock. Of the five virtual threads, four are pinned to OS threads in the fork-join pool. We still have no information about who actually holds the lock. Since we cannot learn anything more from the thread dump, the logical next step is to look at the heap dump and inspect the state of the lock.

Lock inspection

Finding the lock in the heap dump was relatively straightforward. Using the excellent Eclipse MAT tool, we examined the objects on the stack of the non-virtual AsyncReporter thread to identify the lock object. Reasoning about the current state of the lock was perhaps the trickiest part of our investigation. Most of the relevant code lives in AbstractQueuedSynchronizer.java. We do not claim to fully understand its inner workings, but we reverse-engineered enough of it to compare against what we see in the heap dump. The diagram in the original article illustrates our findings; the numbered markers below refer to it.

First, the exclusiveOwnerThread field is null (2), meaning that the lock is not owned by anyone. Next, there is an "empty" ExclusiveNode (3) at the head of the list (its waiter field is null and its status is cleared), followed by another ExclusiveNode whose waiter points to one of the virtual threads competing for the lock — #119516 (4). The only place we found where the exclusiveOwnerThread field is cleared is inside ReentrantLock.Sync.tryRelease(). That method also sets state = 0, which matches the state we see in the heap dump (1).

With this in mind, we traced the code path the release() method takes to release the lock. After a successful call to tryRelease(), the thread that held the lock tries to signal the next waiter in the list. At this point the releasing thread is still at the head of the list, even though ownership of the lock has effectively been given up. The next node in the list points to the thread that is about to acquire the lock.

To understand how this signaling works, let's look at the lock acquisition path in the AbstractQueuedSynchronizer.acquire() method. In a grossly simplified form, this is an infinite loop where threads try to acquire the lock and then park if the attempt fails:

while (true) {
   if (tryAcquire()) {
      return;   // lock acquired, exit the loop
   }
   park();      // wait until the thread releasing the lock unparks us, then retry
}

When the thread holding the lock releases it and unparks the next waiting thread, the unparked thread goes through the loop again, giving it another chance to acquire the lock. Indeed, our thread dump shows that all of our waiting threads are parked on line 754. Once unparked, the thread that manages to acquire the lock should end up in the block of code that resets the head of the list and clears the reference to the waiting thread.

In short, the thread that owns the lock is referenced by the head node of the list. Releasing the lock signals the next node in the list, while acquiring the lock resets the head of the list to the current node. This means that what we see in the heap dump reflects a state where one thread has already released the lock, and the next one has yet to acquire it. It's a strange intermediate state that should be transient, but our JVM gets stuck in it. We know that thread #119516 has been notified and should be acquiring the lock right now because of the ExclusiveNode state we identified at the head of the list. However, the thread dumps show that thread #119516 is still waiting, as are other threads competing for the same lock. How do we reconcile what we see in the thread dumps and what we see in the heap dumps?

A lock with nowhere to go

Knowing that thread #119516 had indeed been notified, we went back to the thread dump to recheck the state of the threads. Recall that we have six threads waiting to acquire the lock, four of which are virtual threads pinned to OS threads. These four will not release their OS threads until they acquire the lock and exit the synchronized block. #107 "AsyncReporter" is a regular platform thread, so nothing would stop it from proceeding once it acquires the lock. That leaves one last thread: #119516. It is a virtual thread, but it is not pinned to an OS thread. Even though it has been notified to leave the parked state, it cannot continue executing because there are no OS threads left in the fork-join pool to mount it on. And that is exactly what is happening here: although #119516 has been signaled to unpark, it cannot do so because the fork-join pool is fully occupied by four other virtual threads that are all waiting to acquire the same lock, and none of those pinned virtual threads can make progress until they acquire it. This is a variation of the classic deadlock problem, but instead of two locks we have one lock and a semaphore with four permits, which is what the fork-join pool effectively represents.

Now that we knew exactly what had happened, it was easy to put together a reproducible test case.
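The sketch below is our own minimal approximation of such a test case, not the exact Netflix reproducer. It shrinks the carrier pool to a single OS thread via the jdk.virtualThreadScheduler.* system properties, queues one unpinned virtual thread and one pinned virtual thread on the same lock, and then releases the lock so that the signal goes to the unpinned waiter, which no longer has a carrier thread to run on:

import java.util.concurrent.locks.ReentrantLock;

public class VirtualThreadHangRepro {

    static final ReentrantLock lock = new ReentrantLock();
    static final Object monitor = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Assumption: shrink the carrier pool to one OS thread so the hang is easy to hit, e.g.:
        //   java -Djdk.virtualThreadScheduler.parallelism=1 \
        //        -Djdk.virtualThreadScheduler.maxPoolSize=1 VirtualThreadHangRepro

        lock.lock(); // main (a platform thread) holds the lock, like the AsyncReporter flusher

        // 1) A virtual thread that waits for the lock OUTSIDE any synchronized block:
        //    it parks, unmounts, and frees its carrier thread.
        Thread unpinned = Thread.ofVirtual().name("unpinned-waiter").start(() -> {
            lock.lock();
            lock.unlock();
        });
        Thread.sleep(200); // let it queue up first so it will be the one signalled

        // 2) A virtual thread that waits for the lock INSIDE a synchronized block:
        //    blocking while holding a monitor pins it to the only carrier thread.
        Thread pinned = Thread.ofVirtual().name("pinned-waiter").start(() -> {
            synchronized (monitor) {
                lock.lock();
                lock.unlock();
            }
        });
        Thread.sleep(200);

        lock.unlock();   // signals "unpinned-waiter", which now has no carrier to mount on
        unpinned.join(); // hangs: the carrier is pinned by "pinned-waiter", which itself
                         // waits for the lock that "unpinned-waiter" must acquire first
        pinned.join();
    }
}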

Conclusion

Virtual threads are expected to improve performance by reducing the overhead associated with thread creation and context switching. Despite some rough edges in Java 21, virtual threads generally live up to expectations. In our quest for more performant Java applications, we see further adoption of virtual threads as key to achieving that goal. We look forward to Java 23 and beyond, which will bring many improvements and, hopefully, address the integration issues between virtual threads and locking primitives.

This study highlights just one type of problem that performance engineers solve at Netflix. We hope that this insight into our approach to problem solving will be useful to others in their future investigations.

Join Spring AIO, the Russian-speaking Spring Boot developer community on Telegram, to stay up to date with the latest news from the world of Spring Boot development and everything related to it.

Everyone is welcome, come join us!
