Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Description
This issue tackles the root cause of the severe data loss reported in OAK-7852:
When the input stream of a binary value blocks indefinitely on read, the flush thread of the segment store gets blocked:
"pool-2-thread-1" #15 prio=5 os_prio=31 tid=0x00007fb0f21e3000 nid=0x5f03 waiting on condition [0x000070000a46d000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000076bba62b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at com.google.common.util.concurrent.Monitor.await(Monitor.java:963) at com.google.common.util.concurrent.Monitor.enterWhen(Monitor.java:402) at org.apache.jackrabbit.oak.segment.SegmentBufferWriterPool.safeEnterWhen(SegmentBufferWriterPool.java:179) at org.apache.jackrabbit.oak.segment.SegmentBufferWriterPool.flush(SegmentBufferWriterPool.java:138) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter.flush(DefaultSegmentWriter.java:138) at org.apache.jackrabbit.oak.segment.file.FileStore.lambda$doFlush$8(FileStore.java:307) at org.apache.jackrabbit.oak.segment.file.FileStore$$Lambda$22/1345968304.flush(Unknown Source) at org.apache.jackrabbit.oak.segment.file.TarRevisions.doFlush(TarRevisions.java:237) at org.apache.jackrabbit.oak.segment.file.TarRevisions.flush(TarRevisions.java:195) at org.apache.jackrabbit.oak.segment.file.FileStore.doFlush(FileStore.java:306) at org.apache.jackrabbit.oak.segment.file.FileStore.flush(FileStore.java:318)
The flush thread parked on condition <0x000000076bba62b0> is waiting for the following thread to return its SegmentBufferWriter, which will never happen if InputStream.read(...) does not make progress.
"pool-1-thread-1" #14 prio=5 os_prio=31 tid=0x00007fb0f223a800 nid=0x5d03 runnable [0x000070000a369000 ] java.lang.Thread.State: RUNNABLE at com.google.common.io.ByteStreams.read(ByteStreams.java:833) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.internalWriteStream(DefaultSegmentWriter.java:641) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeStream(DefaultSegmentWriter.java:618) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeBlob(DefaultSegmentWriter.java:577) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeProperty(DefaultSegmentWriter.java:691) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeProperty(DefaultSegmentWriter.java:677) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeNodeUncached(DefaultSegmentWriter.java:900) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.writeNode(DefaultSegmentWriter.java:799) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$SegmentWriteOperation.access$800(DefaultSegmentWriter.java:252) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter$8.execute(DefaultSegmentWriter.java:240) at org.apache.jackrabbit.oak.segment.SegmentBufferWriterPool.execute(SegmentBufferWriterPool.java:105) at org.apache.jackrabbit.oak.segment.DefaultSegmentWriter.writeNode(DefaultSegmentWriter.java:235) at org.apache.jackrabbit.oak.segment.SegmentWriter.writeNode(SegmentWriter.java:79)
This issue is critical: such a misbehaving input stream causes the flush thread to get stuck, preventing transient segments from being flushed and thus causing data loss.
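For illustration, the scenario can be reproduced with a sketch along the following lines. This is a minimal sketch, not a test from the code base; the store directory, the BlockingInputStream class and the timing are assumptions, and it assumes no external BlobStore is configured so the binary is streamed through the SegmentWriter. A blob whose input stream never returns from read() is written on one thread while FileStore.flush() is called from another.

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.apache.jackrabbit.oak.segment.SegmentNodeStoreBuilders;
import org.apache.jackrabbit.oak.segment.file.FileStore;
import org.apache.jackrabbit.oak.spi.state.NodeStore;

import static org.apache.jackrabbit.oak.segment.file.FileStoreBuilder.fileStoreBuilder;

public class BlockingStreamRepro {

    // An input stream that never returns from read(), simulating a stalled binary source.
    static class BlockingInputStream extends InputStream {
        @Override
        public int read() throws IOException {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
            return -1;
        }
    }

    public static void main(String[] args) throws Exception {
        FileStore fileStore = fileStoreBuilder(new File("repro-store")).build();
        NodeStore nodeStore = SegmentNodeStoreBuilders.builder(fileStore).build();

        // Writer thread: stalls while reading the binary, holding on to a
        // SegmentBufferWriter from the pool (second stack trace above).
        Thread writer = new Thread(() -> {
            try {
                nodeStore.createBlob(new BlockingInputStream());
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        writer.start();

        Thread.sleep(1000); // give the writer time to enter the blocking read

        // Flush: blocks in SegmentBufferWriterPool.flush waiting for the stalled
        // thread to return its writer (first stack trace above).
        fileStore.flush();
    }
}

Before the fix, the final flush() call in this sketch never returns, so transient segments are never persisted.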