Details
- Improvement
- Status: Patch Available
- Major
- Resolution: Unresolved
- 2.9.2
- None
- None
Description
Copying blocks in parallel (enabled when blocks per chunk > 0) is a DistCp improvement that can greatly speed up copying large files.
However, checksum validation is skipped for files copied this way, e.g. in `RetriableFileCopyCommand.java`:
    if (!source.isSplit()) {
      compareCheckSums(sourceFS, source.getPath(), sourceChecksum,
          targetFS, targetPath);
    }
This can result in a checksum or data mismatch that developers and users are never notified about (e.g. HADOOP-16049).
I'd like to provide a patch that adds the missing checksum validation.
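To illustrate the gap, here is a minimal, self-contained sketch in plain Java. It is not DistCp or Hadoop code: the class `SplitCopyChecksumSketch` and its methods are hypothetical, CRC32 stands in for a file checksum, and the chunks are copied sequentially for brevity. It only shows the idea of comparing whole-file checksums once all chunks of a split copy have been reassembled, so that a corrupted chunk does not go unnoticed.

```java
import java.util.zip.CRC32;

// Illustrative only: plain Java, not DistCp or Hadoop code. All names are hypothetical.
class SplitCopyChecksumSketch {

    // CRC32 over a whole byte array, standing in for a file checksum.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Copy the source chunk by chunk into a target buffer, as a split copy would.
    // Assumes chunkSize > 0.
    static byte[] copyInChunks(byte[] source, int chunkSize) {
        byte[] target = new byte[source.length];
        for (int offset = 0; offset < source.length; offset += chunkSize) {
            int len = Math.min(chunkSize, source.length - offset);
            System.arraycopy(source, offset, target, offset, len);
        }
        return target;
    }

    // Validation after all chunks are reassembled: compare whole-file checksums.
    static boolean matches(byte[] source, byte[] target) {
        return checksum(source) == checksum(target);
    }

    public static void main(String[] args) {
        byte[] source = new byte[1000];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) (i * 31);
        }

        byte[] target = copyInChunks(source, 128);
        System.out.println("clean copy matches: " + matches(source, target));

        target[500] ^= 0x01; // simulate a single flipped bit in one chunk
        System.out.println("corrupted copy matches: " + matches(source, target));
    }
}
```

Running `main` should report a match for the clean copy and a mismatch for the corrupted one. When the final comparison is skipped, as it is for split sources in the snippet above, the second case goes undetected.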
Issue Links
- is a clone of
  - HADOOP-16158 DistCp to support checksum validation when copy blocks in parallel (Resolved)
- relates to
  - HADOOP-15273 distcp can't handle remote stores with different checksum algorithms (Resolved)
  - HADOOP-16158 DistCp to support checksum validation when copy blocks in parallel (Resolved)
- links to