Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
Description
Currently, we found this error if the #StripeReader.readStripe() have more than one block read failed.
We use the EC policy ec(6+3) in our cluster.
Caused by: org.apache.hadoop.HadoopIllegalArgumentException: No enough valid inputs are provided, not recoverable at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119) at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:47) at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86) at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170) at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:462) at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94) at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:406) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:327) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:420) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:892) at java.base/java.io.DataInputStream.read(DataInputStream.java:149) at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
while (!futures.isEmpty()) { try { StripingChunkReadResult r = StripedBlockUtil .getNextCompletedStripedRead(service, futures, 0); dfsStripedInputStream.updateReadStats(r.getReadStats()); DFSClient.LOG.debug("Read task returned: {}, for stripe {}", r, alignedStripe); StripingChunk returnedChunk = alignedStripe.chunks[r.index]; Preconditions.checkNotNull(returnedChunk); Preconditions.checkState(returnedChunk.state == StripingChunk.PENDING); if (r.state == StripingChunkReadResult.SUCCESSFUL) { returnedChunk.state = StripingChunk.FETCHED; alignedStripe.fetchedChunksNum++; updateState4SuccessRead(r); if (alignedStripe.fetchedChunksNum == dataBlkNum) { clearFutures(); break; } } else { returnedChunk.state = StripingChunk.MISSING; // close the corresponding reader dfsStripedInputStream.closeReader(readerInfos[r.index]); final int missing = alignedStripe.missingChunksNum; alignedStripe.missingChunksNum++; checkMissingBlocks(); readDataForDecoding(); readParityChunks(alignedStripe.missingChunksNum - missing); }
This error can be trigger by #StatefulStripeReader.decode.
The reason is that:
- If there are more than one data block read failed, the #readDataForDecoding will be called multiple times;
- The decodeInputs array will be initialized repeatedly.
- The parity data in decodeInputs array which filled by #readParityChunks previously will be set to null.
Attachments
Issue Links
- links to