Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
0.9.0
-
None
Description
The issue will be described in more details because a) it was enthralling to debug and reproduce b) it might be interesting for others to fully understand c) the probability to discover it is quite low, nonetheless it happened during an aggressive integration test.
The symptoms were a NullPointerException in AbstractSession.close() line 335 and a blocking thread which was supposed to shut down a server instance. After looking at the code line 331 was identified as critical section as it was the only place where a null value could have sneaked into the array.
In order to run into this problem two preconditions have to be met: a) the server is currently shutting down and thus trying to close all channels and sessions b) one or more channels are unregistered at the same time.
What happens in line 331:
// Channel[] channelToClose = channels.values().toArray(new Channel[channels.values().size()]);
In case of unlucky timing the (first) call to channels.values() returns a larger set (whose size is used to initialize the array) than the second call, which is used to actually populate the array, so one or more array elements are null.
This single problem illustrates three issues:
- Atomic treatment of non-atomic operations
- Unguarded access to the channels map through (un)registerChannel() etc.
- Incorrect exception handling (line 340 never gets called due to missing latch.decrementAndGet() calls, so the IoSessionCloser is never executed (which then leads to the blocking thread))
If somehow manageable a patch will be provided.