Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Version: 1.14.0
None
None
Description
DESCRIPTION
Before reading a Parquet file, Spark sends a listStatus RPC to get the Hadoop file status and reads the Parquet file footer. ParquetRecordReader then sends the same listStatus RPC to get the same file status and reads the footer a second time. Both the file status and the footer can be reused instead of being fetched twice.
PLANS
Save the Hadoop file status in the ParquetMetadata, and save the ParquetMetadata in the input split, so that both can be reused when initializing a new ParquetRecordReader.
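The plan above can be illustrated with a minimal, self-contained sketch. It does not use the real Hadoop or Parquet classes: `FileStatusStub`, `FooterStub`, `SplitStub`, `CountingFs`, `planSplit` and `initReader` are all hypothetical stand-ins invented here, and the counters only model how many remote calls each approach would make.

```java
// Hypothetical sketch: none of these types are the real Hadoop/Parquet classes.
final class FooterReuseSketch {

    /** Stand-in for a Hadoop file status (assumed fields: path and length). */
    static final class FileStatusStub {
        final String path;
        final long length;
        FileStatusStub(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Stand-in for the parsed footer; it keeps the file status it was read with. */
    static final class FooterStub {
        final FileStatusStub status;
        final int rowGroups;
        FooterStub(FileStatusStub status, int rowGroups) {
            this.status = status;
            this.rowGroups = rowGroups;
        }
    }

    /** Stand-in for the file system; counts the calls we want to avoid repeating. */
    static final class CountingFs {
        int listStatusCalls = 0;
        int footerReads = 0;

        FileStatusStub listStatus(String path) {
            listStatusCalls++;
            return new FileStatusStub(path, 1024L); // dummy length
        }

        FooterStub readFooter(FileStatusStub status) {
            footerReads++;
            return new FooterStub(status, 3); // dummy row-group count
        }
    }

    /** Stand-in for an input split; cachedFooter is null when nothing was saved. */
    static final class SplitStub {
        final String path;
        final FooterStub cachedFooter;
        SplitStub(String path, FooterStub cachedFooter) {
            this.path = path;
            this.cachedFooter = cachedFooter;
        }
    }

    /** Planning step: always fetches status and footer once; optionally saves them in the split. */
    static SplitStub planSplit(CountingFs fs, String path, boolean cacheFooter) {
        FileStatusStub status = fs.listStatus(path);
        FooterStub footer = fs.readFooter(status);
        return new SplitStub(path, cacheFooter ? footer : null);
    }

    /** Reader initialization: reuses the footer from the split when it is present. */
    static FooterStub initReader(CountingFs fs, SplitStub split) {
        if (split.cachedFooter != null) {
            return split.cachedFooter; // no extra listStatus call, no extra footer read
        }
        FileStatusStub status = fs.listStatus(split.path);
        return fs.readFooter(status);
    }

    public static void main(String[] args) {
        CountingFs withoutCache = new CountingFs();
        initReader(withoutCache, planSplit(withoutCache, "/data/part-0.parquet", false));

        CountingFs withCache = new CountingFs();
        initReader(withCache, planSplit(withCache, "/data/part-0.parquet", true));

        System.out.println("listStatus calls without reuse: " + withoutCache.listStatusCalls);
        System.out.println("listStatus calls with reuse: " + withCache.listStatusCalls);
    }
}
```

In this model, planning followed by reader initialization issues two listStatus calls and two footer reads when nothing is saved, and one of each when the footer (with its file status) travels in the split. A real change would also have to consider the serialized size of the split, which this sketch ignores.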