  1. Spark
  2. SPARK-41279 Feature parity: DataFrame API in Spark Connect
  3. SPARK-41945

Python: connect client lost column data with pyarrow.Table.to_pylist

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Connect
    • Labels: None

    Description

      The Python Connect client should not use pyarrow.Table.to_pylist to convert fetched data, because it loses columns whose names are duplicated.
      For example, consider the following pyarrow.Table:

      pyarrow.Table
      key: string
      order: int64
      nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
      nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
      nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
      ----
      key: [["a","a","a","a","a","b","b"]]
      order: [[0,1,2,3,4,1,2]]
      nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
      nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
      nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,null,"y","y","y",null,null]]
      

      The table above has five columns, two of which share the same name.
      After calling pyarrow.Table.to_pylist(), however, the data looks like this:

      [{
      	'key': 'a',
      	'order': 0,
      	'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
      	'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
      }, {
      	'key': 'a',
      	'order': 1,
      	'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
      	'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
      }, {
      	'key': 'a',
      	'order': 2,
      	'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
      	'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
      }, {
      	'key': 'a',
      	'order': 3,
      	'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
      	'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
      }, {
      	'key': 'a',
      	'order': 4,
      	'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
      	'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
      }, {
      	'key': 'b',
      	'order': 1,
      	'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
      	'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
      }, {
      	'key': 'b',
      	'order': 2,
      	'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
      	'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
      }]
      

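      Only four columns are left: to_pylist() returns one dict per row keyed by column name, so the two columns with identical names collapse into a single key.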
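
      The collision can be reproduced without a Spark session. The following is a minimal sketch, not the client's actual code: plain Python lists stand in for the Arrow columns, the column names are shortened for readability, and the helper names rows_as_dicts and rows_as_tuples are hypothetical. It contrasts a dict-per-row conversion (the shape to_pylist produces) with a positional conversion that keeps one slot per column.

```python
# Column names and values taken from the table in this report.
# Assumption: the window clause is dropped from the names for readability,
# and plain lists stand in for the Arrow columns (no pyarrow call is made).
names = [
    "key",
    "order",
    "nth_value(value, 2)",
    "nth_value(value, 2)",  # same name as the previous column
    "nth_value(value, 2) ignore nulls",
]
columns = [
    ["a", "a", "a", "a", "a", "b", "b"],
    [0, 1, 2, 3, 4, 1, 2],
    [None, "x", "x", "x", "x", None, None],
    [None, "x", "x", "x", "x", None, None],
    [None, None, "y", "y", "y", None, None],
]


def rows_as_dicts(names, columns):
    """Dict-per-row conversion: duplicate names collapse into one key."""
    return [dict(zip(names, values)) for values in zip(*columns)]


def rows_as_tuples(columns):
    """Positional conversion: one slot per column, names play no role."""
    return list(zip(*columns))


dict_rows = rows_as_dicts(names, columns)
tuple_rows = rows_as_tuples(columns)

print(len(dict_rows[0]))   # 4: one of the two same-named columns is gone
print(len(tuple_rows[0]))  # 5: every column survives
print(tuple_rows[2])       # ('a', 2, 'x', 'x', 'y')
```

      With a real pyarrow.Table, the same positional idea would read each column by index, for example table.column(i).to_pylist(), instead of relying on names.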

          People

            Assignee: beliefer Jiaan Geng
            Reporter: beliefer Jiaan Geng