Flink / FLINK-35641

ParquetSchemaConverter should correctly handle field optionality

Details

    Description

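      At the moment, ParquetSchemaConverter marks every field as optional, regardless of the nullability of the corresponding Flink type. This is incorrect in general and is especially harmful for maps, because the Parquet format requires map keys to be annotated as required. For example, parquet-tools fails to open the Parquet file produced by ParquetRowDataWriterTest#complexTypeTest: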

      parquet-tools inspect /var/folders/sc/k3hr87fj4x169rdq9n107whw0000gp/T/junit14646865447948471989/3b328592-7315-48c6-8fa9-38da4048fb4e
      Traceback (most recent call last):
        File "/Users/asorokoumov/.pyenv/versions/3.12.3/bin/parquet-tools", line 8, in <module>
          sys.exit(main())
                   ^^^^^^
        File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/cli.py", line 26, in main
          args.handler(args)
        File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/commands/inspect.py", line 55, in _cli
          _execute_simple(
        File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/commands/inspect.py", line 63, in _execute_simple
          pq_file: pq.ParquetFile = pq.ParquetFile(filename)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^
        File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 317, in __init__
          self.reader.open(
        File "pyarrow/_parquet.pyx", line 1492, in pyarrow._parquet.ParquetReader.open
        File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: Map keys must be annotated as required.
      

      The correct behavior is to mark nullable fields as optional and non-nullable fields as required.
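
      The rule above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not the actual ParquetSchemaConverter change: the `OptionalitySketch` class, its `Field` type, and the `Repetition` enum are hypothetical stand-ins that render Parquet schema text, and they do not use the Flink or parquet-java APIs. Forcing the map key to required regardless of the key type's declared nullability is also an assumption, made here because of the reader error shown in the traceback.

```java
import java.util.Locale;

final class OptionalitySketch {

    /** Stand-in for Parquet's three repetition levels (not the parquet-java enum). */
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    /** Hypothetical field description: name, Parquet primitive type, and nullability. */
    static final class Field {
        final String name;
        final String parquetType;
        final boolean nullable;

        Field(String name, String parquetType, boolean nullable) {
            this.name = name;
            this.parquetType = parquetType;
            this.nullable = nullable;
        }
    }

    /** Nullable fields become optional; everything else becomes required. */
    static Repetition repetitionFor(boolean nullable) {
        return nullable ? Repetition.OPTIONAL : Repetition.REQUIRED;
    }

    /** Renders one primitive field line, e.g. "required int64 id;". */
    private static String line(Repetition repetition, String type, String name) {
        return repetition.name().toLowerCase(Locale.ROOT) + " " + type + " " + name + ";";
    }

    static String primitive(Field field) {
        return line(repetitionFor(field.nullable), field.parquetType, field.name);
    }

    /** Renders a map group; the key is always required (assumption, see lead-in). */
    static String map(String name, boolean nullable, Field key, Field value) {
        String repetition = repetitionFor(nullable).name().toLowerCase(Locale.ROOT);
        return repetition + " group " + name + " (MAP) {\n"
                + "  repeated group key_value {\n"
                + "    " + line(Repetition.REQUIRED, key.parquetType, "key") + "\n"
                + "    " + line(repetitionFor(value.nullable), value.parquetType, "value") + "\n"
                + "  }\n"
                + "}";
    }

    public static void main(String[] args) {
        // A non-nullable column is rendered as required.
        System.out.println(primitive(new Field("id", "int64", false)));
        // A nullable map with nullable values still gets a required key.
        System.out.println(map("counts", true,
                new Field("key", "int32", true),
                new Field("value", "int64", true)));
    }
}
```

      In this sketch only the map key is special-cased; every other field, including the map group itself and the map value, follows the nullable-to-optional rule.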

            People

              asorokoumov Alex Sorokoumov