Summary
JCQueue stores its per-producer-thread batching inserters in instance-field ThreadLocals:
private final ThreadLocal<BatchInserter> thdLocalBatcher = new ThreadLocal<BatchInserter>();
private final ThreadLocal<DynamicBatchInserter> thdLocalDynamicBatcher = new ThreadLocal<DynamicBatchInserter>();
Each inserter holds a strong reference back to its owning JCQueue (this.queue). This forms a reference cycle that prevents the JCQueue from being garbage collected for as long as any producer thread that ever published to it stays alive -- even after every other reference to the queue is dropped.
Mechanism
ThreadLocalMap holds its keys weakly but its values strongly. The cycle defeats the weak key:
Thread
-> ThreadLocalMap
- key: ThreadLocal (weak ref) [also strongly held as a JCQueue field]
- value: DynamicBatchInserter (strong)
-> queue: JCQueue (strong)
-> thdLocalDynamicBatcher: ThreadLocal (strong field)
The key is therefore strongly reachable via: value -> queue -> field
The normal cleanup path for an instance-field ThreadLocal -- the key becoming weakly-reachable once the owning object dies, which then lets the entry be expunged -- never triggers, because value -> queue -> field keeps the key strongly reachable. As long as the producer thread lives, the JCQueue (and its metrics, batch buffers, etc.) cannot be collected.
Impact
- Production workers: effectively no impact. Storm runs each worker as its own JVM process; producer threads and the
JCQueue share a lifetime and both die when the worker process exits. The retained memory is reclaimed by process teardown.
- Shared/static thread pools: real leak. Where producer threads outlive individual queues -- e.g.
LocalCluster, embedded deployments, or test harnesses that repeatedly start/stop topologies on long-lived threads -- each killed topology can strand its JCQueue instances on those threads' ThreadLocalMaps. Over many start/stop cycles this accumulates.
Mitigations that do NOT work
Two seemingly obvious fixes are ineffective and should be avoided:
-
static ThreadLocal<WeakHashMap<JCQueue, Inserter>> (composite weak key).
The value (Inserter) strongly references the JCQueue that is the map key. This is the classic WeakHashMap value-references-key leak: the key never becomes weakly-reachable, so it is never evicted. The cycle is relocated, not broken.
-
Cleanup in JCQueue.close() via ThreadLocal.remove().
close() runs on whatever thread invokes it (typically the consumer/cleanup thread). ThreadLocal.remove() only clears the calling thread's entry. There is no API to clear other producer threads' ThreadLocalMap entries -- which are exactly the ones retaining the queue. So this is cosmetic for the actual leak.
Summary
JCQueuestores its per-producer-thread batching inserters in instance-fieldThreadLocals:Each inserter holds a strong reference back to its owning
JCQueue(this.queue). This forms a reference cycle that prevents theJCQueuefrom being garbage collected for as long as any producer thread that ever published to it stays alive -- even after every other reference to the queue is dropped.Mechanism
ThreadLocalMapholds its keys weakly but its values strongly. The cycle defeats the weak key:The normal cleanup path for an instance-field
ThreadLocal-- the key becoming weakly-reachable once the owning object dies, which then lets the entry be expunged -- never triggers, becausevalue -> queue -> fieldkeeps the key strongly reachable. As long as the producer thread lives, theJCQueue(and its metrics, batch buffers, etc.) cannot be collected.Impact
JCQueueshare a lifetime and both die when the worker process exits. The retained memory is reclaimed by process teardown.LocalCluster, embedded deployments, or test harnesses that repeatedly start/stop topologies on long-lived threads -- each killed topology can strand itsJCQueueinstances on those threads'ThreadLocalMaps. Over many start/stop cycles this accumulates.Mitigations that do NOT work
Two seemingly obvious fixes are ineffective and should be avoided:
static ThreadLocal<WeakHashMap<JCQueue, Inserter>>(composite weak key).The value (
Inserter) strongly references theJCQueuethat is the map key. This is the classic WeakHashMap value-references-key leak: the key never becomes weakly-reachable, so it is never evicted. The cycle is relocated, not broken.Cleanup in
JCQueue.close()viaThreadLocal.remove().close()runs on whatever thread invokes it (typically the consumer/cleanup thread).ThreadLocal.remove()only clears the calling thread's entry. There is no API to clear other producer threads'ThreadLocalMapentries -- which are exactly the ones retaining the queue. So this is cosmetic for the actual leak.