Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Won't Fix
Affects Version/s: Impala 2.6.0
Fix Version/s: None
Description
If users do not want to skip the staging step on INSERTs to S3, we could allow the table sink to stage the temporary files in HDFS (if available) and have the coordinator move the files to S3 in FinalizeSuccessfulInsert().
This could improve the performance of INSERTs to S3, since writes to HDFS are currently faster than writes to S3. At present, when the staging step is not skipped, the sinks write to a temporary location in S3 and the coordinator then copies these files to their final location in S3 (S3 does not support the rename() operation, so the move is a copy). Each file is therefore written to S3 twice. Staging in HDFS would reduce the number of writes to S3 from 2 to 1 per file.
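The write-count argument can be illustrated with a small toy model. This is only a sketch and is not Impala code: `Store`, `insert_with_staging`, and the `/_staging` and `/table` paths are hypothetical names invented for the illustration, and the in-memory "filesystems" merely count writes. It assumes that finalization copies each staged file to its destination and then deletes the staged copy, as the description says happens on S3.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy in-memory filesystem (hypothetical) that counts file writes."""
    name: str
    files: dict = field(default_factory=dict)
    writes: int = 0

    def write(self, path, data):
        self.files[path] = data
        self.writes += 1

    def delete(self, path):
        del self.files[path]


def insert_with_staging(staging, final, data_by_file):
    """Model an INSERT that stages files, then finalizes them.

    staging: store the sinks write temporary files to.
    final:   store holding the table's final location.
    """
    # Sink phase: each sink writes its output to the staging location.
    for fname, data in data_by_file.items():
        staging.write(f"/_staging/{fname}", data)

    # Finalize phase: the coordinator copies each staged file to the
    # final location (no rename is assumed) and removes the staged copy.
    for fname in data_by_file:
        src = f"/_staging/{fname}"
        final.write(f"/table/{fname}", staging.files[src])
        staging.delete(src)


files = {"part-0": b"a", "part-1": b"b", "part-2": b"c"}

# Current behaviour: staging and final location are both in S3.
s3_only = Store("s3")
insert_with_staging(s3_only, s3_only, files)

# Proposed behaviour: stage in HDFS, write to S3 only at finalization.
hdfs = Store("hdfs")
s3 = Store("s3")
insert_with_staging(hdfs, s3, files)

print(s3_only.writes / len(files))  # S3 writes per file when staging in S3
print(s3.writes / len(files))       # S3 writes per file when staging in HDFS
```

In this model the S3-only path performs two S3 writes per file and the HDFS-staged path performs one, at the cost of one extra HDFS write per file.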