Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 3.4.0, 3.3.6
- Fix Version/s: None
- Component/s: None
Description
In the S3A filesystem implementation in the hadoop-aws module, when used with Spark, every file open issues two HeadObject requests: the caller first checks that the file exists, which executes one HeadObject, and then opening the file forces the implementation, under both SDK v1 and SDK v2, to issue a second HeadObject. This is not a fault of S3AFileSystem itself but of the abstract FileSystem class in Hadoop core, whose open() accepts only a Path and provides no way to pass in the FileStatus already obtained.
If the FileSystem API were changed to accept a FileStatus, the status from the existence check could be reused and the second HeadObject avoided; see the sketch below.
Issue Links
- depends upon: SPARK-48571 Reduce the number of accesses to S3 object storage (Open)
- duplicates: HADOOP-15229 Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API. (Resolved)
- is depended upon by: HADOOP-19199 Include FileStatus when opening a file from FileSystem (Resolved)
- is related to: HADOOP-19199 Include FileStatus when opening a file from FileSystem (Resolved)