Description
If S3A exported the etags of blobs via getFileChecksum(), it would be possible to probe whether a blob is in sync with a local file. distcp could use this to decide whether or not to skip copying a file.
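A minimal sketch of the skip check described above, assuming the remote etag has already been fetched (for example from a HEAD request, not shown) and that the object was a single-part upload without SSE-KMS/SSE-C, where the etag is the hex MD5 of the object data. The function names `local_md5_hex` and `can_skip_copy` are hypothetical, not part of Hadoop or distcp.

```python
import hashlib


def local_md5_hex(path, chunk_size=1 << 20):
    """Return the MD5 of a local file as lowercase hex, reading in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def can_skip_copy(local_path, remote_etag):
    """Hypothetical distcp-style check: skip the copy if etag == local MD5.

    Assumption: single-part, non-KMS upload, so the etag is a plain MD5.
    S3 returns etags wrapped in double quotes, so strip them first.
    """
    etag = remote_etag.strip('"').lower()
    if "-" in etag:
        # Multipart etags ("<hex>-<parts>") are not a plain MD5 of the data,
        # so a mismatch proves nothing; be conservative and copy.
        return False
    return etag == local_md5_hex(local_path)
```

Returning False on anything it cannot interpret keeps the check safe: the worst case is an unnecessary copy, never a skipped one.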
Now, there's a problem there: distcp needs the source and destination filesystems to implement the same checksum algorithm, so this would only work out of the box when copying between S3 instances. There are also quirks with encryption and multipart uploads (see the s3 docs): in those cases the etag is not necessarily the MD5 of the object data. At the very least, it's something which could be recorded when indexing the FS and used to check for changes later.
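To illustrate the multipart quirk, here is a sketch of the commonly observed (but not officially documented by AWS) multipart etag format: the MD5 of the concatenated binary MD5 digests of each part, followed by `-` and the part count. This is an assumption about S3 behaviour, and `multipart_etag` is a hypothetical helper. It shows why a reader cannot reproduce the etag from the data alone: the result depends on the part size the uploader chose.

```python
import hashlib


def multipart_etag(data: bytes, part_size: int) -> str:
    """Reproduce the commonly observed S3 multipart etag format.

    Assumption: etag = hex(MD5(md5(part_1) || ... || md5(part_n))) + "-n".
    Not a documented AWS contract; it also depends on the uploader's part size.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # Binary (not hex) digest of each part, concatenated in order.
    digests = b"".join(
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    )
    n_parts = (len(data) + part_size - 1) // part_size
    return f"{hashlib.md5(digests).hexdigest()}-{n_parts}"
```

The same bytes uploaded with two different part sizes give two different etags, and neither matches the plain MD5 of the file, so a checksum comparison against a local file or another store would report a spurious mismatch.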
Issue Links
- causes: HADOOP-15297 Make S3A etag => checksum feature optional (Resolved)
- is depended upon by: IMPALA-6057 Cache Remote Reads (Resolved)
- is related to: HADOOP-15273 distcp can't handle remote stores with different checksum algorithms (Resolved)