Description
pipeline: DN1 -> DN2 -> DN3
stop DN2
pipeline recovery adds a new node DN4 at the 2nd position: DN1 -> DN4 -> DN3
recover RBW

DN4 after recover RBW
2013-04-01 21:02:31,570 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
2013-04-01 21:02:31,570 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
getNumBytes() = 134144
getBytesOnDisk() = 134144
getVisibleLength()= 134144
replica ends exactly on a chunk boundary (134144 / 512 = 262 chunks)
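The chunk arithmetic above can be sketched as follows, assuming the usual HDFS meta-file layout (a 7-byte header, then one 4-byte CRC32 entry per 512-byte data chunk); the helper names are illustrative, not actual HDFS code:

```python
# Chunk / checksum-file arithmetic -- a sketch under the assumed layout:
# 7-byte meta header, one 4-byte CRC32 per 512-byte data chunk.
BYTES_PER_CHECKSUM = 512
CHECKSUM_SIZE = 4
META_HEADER_SIZE = 7

def full_chunks(num_bytes):
    """Number of complete 512-byte chunks covered by num_bytes."""
    return num_bytes // BYTES_PER_CHECKSUM

def meta_end_offset(num_bytes):
    """Expected meta-file end offset, counting a checksum entry for a
    trailing partial chunk as well."""
    chunks = -(-num_bytes // BYTES_PER_CHECKSUM)  # ceiling division
    return META_HEADER_SIZE + chunks * CHECKSUM_SIZE

print(full_chunks(134144))      # DN4's replica ends on a chunk boundary: 262
print(meta_end_offset(134028))  # DN3's replica, partial last chunk: 1055
```

The 1055 value for DN3 matches the meta offset DN3 later rolls back from in the log further down.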
DN3 after recover rbw
2013-04-01 21:02:31,575 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
2013-04-01 21:02:31,575 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
getNumBytes() = 134028
getBytesOnDisk() = 134028
getVisibleLength()= 134028
The client then sends a packet on the recovered pipeline:
offset=133632 len=1008
DN4 after flush
2013-04-01 21:02:31,779 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file offset:134640; meta offset:1063
// the meta end position should be ceil(134640/512)*4 + 7 = 263*4 + 7 = 1059, but it is 1063 (one extra 4-byte checksum entry).
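The expected offset can be checked directly, assuming the standard meta-file layout (7-byte header plus one 4-byte CRC32 per 512-byte chunk); `expected_meta_offset` is an illustrative helper, not HDFS code:

```python
# Expected vs. observed DN4 meta-file end offset after the flush at
# data offset 134640 (assumed layout: 7-byte header + one 4-byte
# CRC32 per 512-byte chunk).
def expected_meta_offset(data_offset):
    chunks = -(-data_offset // 512)   # ceiling division: 263 chunks
    return 7 + chunks * 4

print(expected_meta_offset(134640))         # 1059, as expected
print(1063 - expected_meta_offset(134640))  # 4: exactly one extra checksum entry
```

The 4-byte discrepancy is exactly the size of one checksum entry, which points to a single duplicated chunk checksum.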
DN3 after flush
2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, lastPacketInBlock=false, offsetInBlock=134640, ackEnqueueNanoTime=8817026136871545)
2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing meta file offset of block BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from 1055 to 1051
2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file offset:134640; meta offset:1059
After checking the meta file on DN4, I found that the checksum of chunk 262 is duplicated, but the data is not.
Later, after the block was finalized, DN4's block scanner detected the bad block and reported it to the NameNode. The NameNode then sent a command to delete this block and re-replicate it from another DN in the pipeline to restore the replication factor.
I think this happens because BlockReceiver skips the data bytes that were already written, but does not skip the checksum bytes that were already written. The function adjustCrcFilePosition only handles the last incomplete chunk, not this situation.
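A minimal simulation of the suspected behavior (a sketch, not the real BlockReceiver code; the function and parameter names are hypothetical). On the recovered pipeline DN4 already holds 134144 bytes, so the first chunk of the packet [133632, 134640) is already on disk and checksummed; skipping the data bytes but not the checksum bytes reproduces the observed offsets:

```python
# Sketch of the suspected bug: a checksum entry is appended for every
# chunk in the packet, even for chunks already checksummed on disk.
# Assumed layout: 7-byte meta header + one 4-byte CRC32 per 512-byte chunk.
BYTES_PER_CHECKSUM = 512
CHECKSUM_SIZE = 4
META_HEADER = 7

def meta_after_packet(data_on_disk, pkt_offset, pkt_len, skip_written):
    """Meta-file end offset after a packet that overlaps data already on
    disk. skip_written=False models the suspected bug."""
    end = pkt_offset + pkt_len
    first_chunk = pkt_offset // BYTES_PER_CHECKSUM
    pkt_chunks = -(-end // BYTES_PER_CHECKSUM) - first_chunk  # chunks in packet
    chunks_done = data_on_disk // BYTES_PER_CHECKSUM          # fully checksummed
    if skip_written:
        # correct behavior: append checksums only for chunks not yet on disk
        new_chunks = -(-end // BYTES_PER_CHECKSUM) - chunks_done
    else:
        # buggy behavior: append a checksum for every chunk in the packet
        new_chunks = pkt_chunks
    return META_HEADER + (chunks_done + new_chunks) * CHECKSUM_SIZE

# DN4: 134144 bytes on disk, packet offset=133632 len=1008
print(meta_after_packet(134144, 133632, 1008, skip_written=False))  # 1063 (bug)
print(meta_after_packet(134144, 133632, 1008, skip_written=True))   # 1059 (correct)
```

With `skip_written=False` the already-checksummed chunk gets a second checksum entry, landing the meta offset at 1063 exactly as seen in the DN4 log.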
Attachments
Issue Links
- is duplicated by
  - HDFS-10587 Incorrect offset/length calculation in pipeline recovery causes block corruption (Resolved)
- is related to
  - HDFS-16601 DataTransfer should throw IOException to Client (Open)
- relates to
  - HDFS-10652 Add a unit test for HDFS-4660 (Resolved)
  - HDFS-9220 Reading small file (< 512 bytes) that is open for append fails due to incorrect checksum (Closed)