Details
Description
Replicate HDFS-11170 and HADOOP-14365 with a builder-based API to open files.
The key requirement here is not HDFS. It is to set the fadvise policy for object stores, where the choice between one full GET that is aborted at the TCP level on a seek and a series of smaller GETs has fundamentally different costs: the wrong option can cost you minutes. S3A and Azure both have adaptive policies now (switching on the first backward seek), but they still don't do it that well.
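To make the adaptive behaviour concrete, here is a minimal sketch of a "switch on first backward seek" policy. It is not the S3A or Azure code: the class, the method names and the readahead parameter are all hypothetical, and no IO is performed. It only shows which byte range the next GET would ask for.

```java
/** Hypothetical sketch of an adaptive seek policy; not taken from S3A or Azure. */
public class AdaptiveSeekPolicySketch {

    enum Policy { SEQUENTIAL, RANDOM }

    // Start by assuming the whole object will be read front to back.
    private Policy policy = Policy.SEQUENTIAL;
    private long position = 0;

    /** Record a seek; the first backward seek switches the stream to RANDOM. */
    void seek(long target) {
        if (target < position) {
            policy = Policy.RANDOM;
        }
        position = target;
    }

    /**
     * Length of the byte range the next GET would request.
     * SEQUENTIAL: everything from the current position to the end of the object.
     * RANDOM: at most one readahead-sized block.
     */
    long nextRequestLength(long fileLength, long readahead) {
        long remaining = fileLength - position;
        return policy == Policy.SEQUENTIAL ? remaining : Math.min(remaining, readahead);
    }

    Policy policy() {
        return policy;
    }

    public static void main(String[] args) {
        AdaptiveSeekPolicySketch in = new AdaptiveSeekPolicySketch();
        long length = 1_000_000L;   // assumed object size
        long readahead = 65_536L;   // assumed readahead block size

        in.seek(900_000L);          // forward seek, e.g. to read a footer
        System.out.println(in.policy() + " " + in.nextRequestLength(length, readahead));

        in.seek(4_096L);            // backward seek: policy switches to RANDOM
        System.out.println(in.policy() + " " + in.nextRequestLength(length, readahead));
    }
}
```

The stream only learns that the access pattern is random after the first request has already been sized for sequential reading. A reader that could declare the policy when opening the file would skip that first wrong guess.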
Columnar formats (ORC, Parquet) should be able to pass an option such as "fs.input.fadvise" = "random" when they open files; I can imagine other options too.
The Builder model of eddyxu is the one to mimic, method for method, ideally with as much code reuse as possible.
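As a rough illustration of what a builder-based open could look like, the sketch below is self-contained and does no IO. Everything in it is hypothetical: the `Store` and `OpenBuilder` classes, the `opt`/`must` split between optional hints and mandatory options, and the decision to return the effective options from `build()`. Only the "fs.input.fadvise" key and the "random" value come from the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a builder-based open; not the Hadoop API. */
public class OpenFileBuilderSketch {

    /** Option key quoted in the issue description. */
    static final String FADVISE = "fs.input.fadvise";

    /** Stand-in for a filesystem that understands a fixed set of option keys. */
    static final class Store {
        private final Set<String> knownKeys;

        Store(String... keys) {
            knownKeys = new HashSet<>(Arrays.asList(keys));
        }

        OpenBuilder openFile(String path) {
            return new OpenBuilder(this, path);
        }
    }

    static final class OpenBuilder {
        private final Store store;
        private final String path;
        private final Map<String, String> options = new HashMap<>();
        private final Set<String> mandatory = new HashSet<>();

        OpenBuilder(Store store, String path) {
            this.store = store;
            this.path = path;
        }

        /** Optional hint: ignored by stores that do not know the key. */
        OpenBuilder opt(String key, String value) {
            options.put(key, value);
            mandatory.remove(key);
            return this;
        }

        /** Mandatory option: build() fails if the store does not know the key. */
        OpenBuilder must(String key, String value) {
            options.put(key, value);
            mandatory.add(key);
            return this;
        }

        /** Validate the options and return those the store will honour. */
        Map<String, String> build() {
            Map<String, String> effective = new HashMap<>();
            for (Map.Entry<String, String> e : options.entrySet()) {
                if (store.knownKeys.contains(e.getKey())) {
                    effective.put(e.getKey(), e.getValue());
                } else if (mandatory.contains(e.getKey())) {
                    throw new IllegalArgumentException(
                        "Unsupported option for " + path + ": " + e.getKey());
                }
            }
            return effective;
        }
    }

    public static void main(String[] args) {
        Store objectStore = new Store(FADVISE); // understands the fadvise key
        Store otherStore = new Store();         // understands no options

        // A columnar reader declares random IO up front.
        System.out.println(
            objectStore.openFile("data/part-0000.orc").opt(FADVISE, "random").build());

        // The same hint is harmless on a store that does not know it.
        System.out.println(
            otherStore.openFile("data/part-0000.orc").opt(FADVISE, "random").build());
    }
}
```

Treating the fadvise setting as an optional hint lets the same application code run against stores that have no notion of it, while a mandatory variant is there for callers that would rather fail than fall back.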
Attachments
Issue Links
- causes
  - HADOOP-16480 S3 Select Exceptions are not being converted to IOEs (Open)
  - MAPREDUCE-7184 TestJobCounters byte counters omitting crc file bytes read (Resolved)
  - HADOOP-16106 hadoop-aws project javadoc does not compile (Resolved)
- incorporates
  - HADOOP-15364 Add support for S3 Select to S3A (Resolved)
- is depended upon by
  - HADOOP-15963 Add ABFS support for Async Scatter/Gather IO (Open)
  - HADOOP-15364 Add support for S3 Select to S3A (Resolved)
  - HADOOP-15964 Add S3A support for Async Scatter/Gather IO (Resolved)
  - MAPREDUCE-7182 MapReduce input format/record readers to support S3 select queries (Resolved)
  - SPARK-48571 Reduce the number of accesses to S3 object storage (Open)
- is duplicated by
  - HADOOP-19199 Include FileStatus when opening a file from FileSystem (Resolved)
  - HADOOP-19200 Reduce the number of headObject when opening a file with the s3 file system (Resolved)
- is related to
  - HADOOP-11867 Add a high-performance vectored read API. (Resolved)
  - HADOOP-15949 open(PathHandle) spec and implementations ambiguous about what to raise when path is deleted (Open)
  - HDFS-14111 hdfsOpenFile on HDFS causes unnecessary IO from file offset 0 (Resolved)
  - HADOOP-18287 Provide a shim library for modern FS APIs (Open)
- is required by
  - HADOOP-15625 S3A input stream to use etags/version number to detect changed source files (Resolved)
- relates to
  - HADOOP-13327 Add OutputStream + Syncable to the Filesystem Specification (Resolved)
  - HADOOP-14365 Stabilise FileSystem builder-based create API (Resolved)
  - HADOOP-15691 Add PathCapabilities to FS and FC to complement StreamCapabilities (Resolved)
  - HDFS-2744 Extend FSDataInputStream to allow fadvise (Open)
  - HDFS-11170 Add builder-based create API to FileSystem (Resolved)
  - HDFS-14478 Add libhdfs APIs for openFile (Resolved)