Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- 3.0.0
- None
Description
Currently DistCp uploads a file using one of two strategies:
- append parts
- copy to temp, then rename
The second strategy (copy to temp, then rename) executes the following sequence in promoteTmpToTarget:
    if ((fs.exists(target) && !fs.delete(target, false))
        || (!fs.exists(target.getParent()) && !fs.mkdirs(target.getParent()))
        || !fs.rename(tmpTarget, target)) {
      throw new IOException("Failed to promote tmp-file:" + tmpTarget
          + " to: " + target);
    }
For any object store, that's a lot of HTTP requests; for S3A you are looking at 12+ requests and an O(data) copy call.
This is not a good upload strategy for any store which manifests its output atomically at the end of the write().
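To make the cost difference concrete, here is a minimal sketch that replays the promote sequence against an in-memory stand-in and counts the calls. `CountingStore` and `PromoteCallCount` are hypothetical names invented for this illustration, not Hadoop classes, and the model counts client-level calls only. On a real object store each call can translate into several HTTP requests, and the rename into a copy of the data.

```java
import java.util.HashSet;
import java.util.Set;

final class PromoteCallCount {

    /** Hypothetical in-memory stand-in for a filesystem client; counts every call. */
    static final class CountingStore {
        private final Set<String> paths = new HashSet<>();
        int calls = 0;

        boolean exists(String p) { calls++; return paths.contains(p); }
        boolean delete(String p) { calls++; return paths.remove(p); }
        boolean mkdirs(String p) { calls++; paths.add(p); return true; }
        void create(String p)    { calls++; paths.add(p); }

        boolean rename(String src, String dst) {
            calls++;
            if (!paths.remove(src)) {
                return false;          // source missing: rename fails
            }
            paths.add(dst);
            return true;
        }
    }

    static String parent(String p) {
        return p.substring(0, p.lastIndexOf('/'));
    }

    /** Write to a temp path, then run the same checks as the promote sequence above. */
    static int tmpThenRename(CountingStore fs, String tmp, String target) {
        int before = fs.calls;
        fs.create(tmp);
        if ((fs.exists(target) && !fs.delete(target))
                || (!fs.exists(parent(target)) && !fs.mkdirs(parent(target)))
                || !fs.rename(tmp, target)) {
            throw new IllegalStateException(
                "Failed to promote tmp-file:" + tmp + " to: " + target);
        }
        return fs.calls - before;
    }

    /** Write straight to the destination path. */
    static int direct(CountingStore fs, String target) {
        int before = fs.calls;
        fs.create(target);
        return fs.calls - before;
    }

    public static void main(String[] args) {
        CountingStore fs = new CountingStore();
        fs.mkdirs("/dest");
        fs.create("/dest/file");       // destination already exists (overwrite case)

        System.out.println("tmp-then-rename calls: "
            + tmpThenRename(fs, "/dest/.tmp", "/dest/file"));
        System.out.println("direct calls: " + direct(fs, "/dest/file"));
    }
}
```

In this model the overwrite case costs five client calls for temp-then-rename against one for a direct write. The absolute numbers mean little; the point is that every extra call is at least one round trip to the store.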
Proposed: add a switch to write directly to the dest path, which can be supplied as either a conf option (distcp.direct.write = true) or a CLI option (-direct).
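A rough sketch of how such a switch might be resolved is shown below. Only the option name `distcp.direct.write` and the flag `-direct` come from the proposal; the class `DirectWriteSwitch`, its methods, and the use of a plain `Map` in place of a Hadoop `Configuration` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Map;

final class DirectWriteSwitch {
    static final String CONF_KEY = "distcp.direct.write"; // option name from the proposal
    static final String CLI_FLAG = "-direct";             // CLI flag from the proposal

    /** Direct write is on if either the conf option or the CLI flag enables it. */
    static boolean directWrite(Map<String, String> conf, String[] args) {
        // Boolean.parseBoolean(null) is false, so a missing key means "off".
        return Boolean.parseBoolean(conf.get(CONF_KEY))
            || Arrays.asList(args).contains(CLI_FLAG);
    }

    /** Pick the path the copy writes to: the destination itself, or the temp file. */
    static String writePath(String target, String tmpTarget, boolean direct) {
        return direct ? target : tmpTarget;
    }

    public static void main(String[] args) {
        boolean direct = directWrite(Map.of(CONF_KEY, "true"), new String[0]);
        // The temp path below is a made-up example, not DistCp's real naming scheme.
        System.out.println(writePath("/dest/file", "/dest/.tmp-file", direct));
    }
}
```

With the switch on, the copy writes to the destination path, so the promote step (and its delete, mkdirs, and rename calls) has nothing to do.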
Attachments
Issue Links
- breaks
  - HADOOP-16096 HADOOP-15281/distcp -Xdirect needs to use commons-logging on 3.1 (Resolved)
- is depended upon by
  - HADOOP-10007 distcp / mv is not working on ftp (Resolved)
  - HADOOP-15788 Improve Distcp for long-haul/cloud deployments (Open)
  - HADOOP-15620 Über-jira: S3A phase VI: Hadoop 3.3 features (Resolved)
- is duplicated by
  - HADOOP-15577 Update distcp to use zero-rename s3 committers (Resolved)
  - HADOOP-16047 Avoid expensive rename when DistCp is writing to S3 (Resolved)
- is related to
  - HADOOP-13622 `-atomic` should not be supported while using `distcp` command in object file system (Open)
- is required by
  - HADOOP-10007 distcp / mv is not working on ftp (Resolved)
- relates to
  - HADOOP-15209 DistCp to eliminate needless deletion of files under already-deleted directories (Resolved)
- supersedes
  - HADOOP-16260 Allow Distcp to create a new tempTarget file per File (Resolved)
  - HADOOP-12046 Avoid creating "._COPYING_" temporary file when copying file to Swift file system (Resolved)
- links to