Description
Although you can choose which committer to use when writing dataframes as parquet via spark.sql.parquet.output.committer.class, a ClassCastException is thrown if the configured class is not org.apache.parquet.hadoop.ParquetOutputCommitter or a subclass of it.
This is inconsistent with the documentation in SQLConf, which says:
The specified class needs to be a subclass of org.apache.hadoop.mapreduce.OutputCommitter. Typically, it's also a subclass of org.apache.parquet.hadoop.ParquetOutputCommitter.
It would be simple to relax ParquetFileFormat's requirement; however, if the user has set
parquet.enable.summary-metadata=true and configured a committer which is not a ParquetOutputCommitter, then the summary metadata will not be written.
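A minimal sketch of the configuration involved, assuming the requirement is relaxed as proposed. FileOutputCommitter is used purely as an example of a committer that subclasses org.apache.hadoop.mapreduce.OutputCommitter but not ParquetOutputCommitter; the spark.hadoop. prefix is the standard way to pass a Hadoop property through Spark configuration:

```
# spark-defaults.conf (illustrative sketch)

# An OutputCommitter that is not a ParquetOutputCommitter:
spark.sql.parquet.output.committer.class      org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

# Such a committer cannot write the parquet summary files,
# so summary metadata should be disabled:
spark.hadoop.parquet.enable.summary-metadata  false
```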
Issue Links
- is related to HADOOP-13786: Add S3A committers for zero-rename commits to S3 endpoints (Resolved)