Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
At the moment, ParquetSchemaConverter marks all fields as optional. This is incorrect in general and is especially problematic for maps, whose keys must be annotated as required. For example, parquet-tools fails on the Parquet file produced by ParquetRowDataWriterTest#complexTypeTest:
parquet-tools inspect /var/folders/sc/k3hr87fj4x169rdq9n107whw0000gp/T/junit14646865447948471989/3b328592-7315-48c6-8fa9-38da4048fb4e
Traceback (most recent call last):
  File "/Users/asorokoumov/.pyenv/versions/3.12.3/bin/parquet-tools", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/cli.py", line 26, in main
    args.handler(args)
  File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/commands/inspect.py", line 55, in _cli
    _execute_simple(
  File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/commands/inspect.py", line 63, in _execute_simple
    pq_file: pq.ParquetFile = pq.ParquetFile(filename)
                              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 317, in __init__
    self.reader.open(
  File "pyarrow/_parquet.pyx", line 1492, in pyarrow._parquet.ParquetReader.open
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Map keys must be annotated as required.
The correct behavior is to mark nullable fields as optional and non-nullable fields as required.
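The nullability rule can be illustrated with a minimal, self-contained sketch. It is not the code of ParquetSchemaConverter and does not use the Parquet library. The class `RepetitionSketch`, its `Repetition` enum, and the `renderField` and `renderMap` helpers are hypothetical names introduced only for illustration, and the schema is rendered as plain text. The map layout (a repeated `key_value` group with a required `key`) follows the Parquet format's MAP convention. Always emitting the key as required is an assumption of this sketch, not necessarily how the actual fix handles it.

```java
/** Hypothetical sketch of the nullable -> optional / non-nullable -> required rule. */
public final class RepetitionSketch {

    /** Stand-in for a Parquet field repetition (illustrative, not the library enum). */
    public enum Repetition {
        REQUIRED("required"),
        OPTIONAL("optional");

        private final String keyword;

        Repetition(String keyword) {
            this.keyword = keyword;
        }

        public String keyword() {
            return keyword;
        }
    }

    private RepetitionSketch() {}

    /** Nullable fields become optional; everything else becomes required. */
    public static Repetition repetitionFor(boolean nullable) {
        return nullable ? Repetition.OPTIONAL : Repetition.REQUIRED;
    }

    /** Renders one primitive field in Parquet schema text form. */
    public static String renderField(String name, String primitive, boolean nullable) {
        return repetitionFor(nullable).keyword() + " " + primitive + " " + name + ";";
    }

    /**
     * Renders a map field. The key is always emitted as required (assumption of
     * this sketch); only the map itself and its value follow the nullability flags.
     */
    public static String renderMap(
            String name,
            String keyPrimitive,
            String valuePrimitive,
            boolean mapNullable,
            boolean valueNullable) {
        return repetitionFor(mapNullable).keyword() + " group " + name + " (MAP) {\n"
                + "  repeated group key_value {\n"
                + "    " + renderField("key", keyPrimitive, false) + "\n"
                + "    " + renderField("value", valuePrimitive, valueNullable) + "\n"
                + "  }\n"
                + "}";
    }

    public static void main(String[] args) {
        // A non-nullable id, a nullable score, and a nullable map with nullable values.
        System.out.println(renderField("id", "int64", false));
        System.out.println(renderField("score", "double", true));
        System.out.println(renderMap("counts", "int32", "int64", true, true));
    }
}
```

With this rule, a reader that enforces the map-key constraint, as pyarrow does in the traceback above, would see a required key rather than an optional one.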