Beam / BEAM-12803

SqlTransform doesn't work on python 3.9

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.33.0
    • Fix Version/s: 2.34.0
    • Component/s: sdk-py-core
    • Labels: None

    Description

      Working example below. (Is there no way to paste pre-formatted code into Jira?!) (EDIT: I added the appropriate "code" block.)

      import itertools
      import csv
      import io
      
      import apache_beam as beam
      from apache_beam.dataframe.io import read_csv
      from apache_beam.transforms.sql import SqlTransform
      
      
      def parse_csv(val):
        def lower_headers(iterator):
          return itertools.chain([next(iterator).lower()], iterator)
        return csv.DictReader(lower_headers(io.TextIOWrapper(val.open())))
      
      
      class BeamTransformBuilder():
        def build(self, pipeline):
          practices = (
              pipeline
                | beam.io.fileio.MatchFiles("data.csv")
                | beam.io.fileio.ReadMatches()
                | beam.Reshuffle()
                | beam.FlatMap(parse_csv)
                | beam.Map(lambda x: beam.Row(id="test-id"))
                | SqlTransform("""
                      SELECT
                      id
                      FROM PCOLLECTION""")
              )
          practices | beam.Map(print)
      
      
      def main():
        builder = BeamTransformBuilder()
        with beam.Pipeline('DirectRunner') as p:
          builder.build(p)
      
      
      if __name__ == '__main__':
        main()
      

       
      Results in the error:

        File "/usr/local/lib/python3.9/site-packages/apache_beam/typehints/schemas.py", line 185, in typing_to_runner_api
          element_type = typing_to_runner_api(_get_args(type_)[0])
      IndexError: tuple index out of range

      Tested on Python 3.9.6. 
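      The last frame of the traceback indexes the type arguments of a type hint. As a rough illustration of how an expression of that shape can fail, the sketch below uses the standard-library typing.get_args as a stand-in for Beam's internal _get_args helper (an assumption; the two may differ). The helper name first_type_arg is hypothetical. Whether this is the path actually taken inside Beam is not established by this report.

```python
import typing
from collections.abc import Sequence


def first_type_arg(type_):
    """Return the first type argument of a hint, like `_get_args(type_)[0]`."""
    return typing.get_args(type_)[0]


# A parameterized hint carries its element type.
print(first_type_arg(typing.Sequence[int]))  # <class 'int'>

# `str` is a Sequence subclass, but it has no type arguments at all.
print(issubclass(str, Sequence))  # True
print(typing.get_args(str))       # ()

# Indexing the empty tuple raises the same error as in the traceback.
try:
    first_type_arg(str)
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: tuple index out of range
```

      Any type that passes a "is this a sequence?" check while carrying no type arguments would fail this way, so a guard on the length of the arguments tuple is one possible defence.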

       

      Annoyingly, it is difficult to test this on other Python versions. There is no documentation on how to set up a Docker container that runs a pipeline locally with DirectRunner, and barely any documentation on which Python versions are supported. Installing apache-beam through pyenv and pip also pulls in a lot of other downloads, which conflict with versions that are already installed.
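      One possible way to sidestep the dependency conflicts described above is to give each interpreter its own virtual environment. The sketch below only builds the command lines and does not run them. The function name isolated_env_commands, the interpreter names and the environment directory names are all hypothetical, and it assumes the listed interpreters are already installed.

```python
import sys
from pathlib import Path


def isolated_env_commands(python_exe, env_dir, package="apache-beam"):
    """Build (but do not run) the commands that create a fresh virtual
    environment with the given interpreter and install one package into it."""
    env_dir = Path(env_dir)
    # Virtual environments keep executables in "Scripts" on Windows, "bin" elsewhere.
    bin_dir = env_dir / ("Scripts" if sys.platform == "win32" else "bin")
    env_python = str(bin_dir / "python")
    return [
        [python_exe, "-m", "venv", str(env_dir)],
        [env_python, "-m", "pip", "install", package],
    ]


if __name__ == "__main__":
    # Hypothetical interpreter names; use whichever are installed locally.
    for exe in ("python3.8", "python3.9"):
        for cmd in isolated_env_commands(exe, f".venv-{exe}"):
            print(" ".join(cmd))
```

      Each returned command is an argument list that can be passed to subprocess.run(cmd, check=True). Because every environment starts empty, packages installed for one Python version cannot clash with those of another.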

            People

              Assignee: Jonathan Hourany
              Reporter: sean teeling (steeling)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 50m