Details
- Bug
- Status: Resolved
- P2
- Resolution: Fixed
- 2.33.0
- None
Description
Working example below. (Is there no way to paste pre-formatted code into Jira?!) (EDIT: I added the appropriate "code" block.)
import itertools
import csv
import io

import apache_beam as beam
from apache_beam.dataframe.io import read_csv
from apache_beam.transforms.sql import SqlTransform


def parse_csv(val):
    def lower_headers(iterator):
        return itertools.chain([next(iterator).lower()], iterator)
    return csv.DictReader(lower_headers(io.TextIOWrapper(val.open())))


class BeamTransformBuilder():
    def build(self, pipeline):
        practices = (
            pipeline
            | beam.io.fileio.MatchFiles("data.csv")
            | beam.io.fileio.ReadMatches()
            | beam.Reshuffle()
            | beam.FlatMap(parse_csv)
            | beam.Map(lambda x: beam.Row(id="test-id"))
            | SqlTransform("""
                SELECT id
                FROM PCOLLECTION""")
        )
        practices | beam.Map(print)


def main():
    builder = BeamTransformBuilder()
    with beam.Pipeline('DirectRunner') as p:
        builder.build(p)


if __name__ == '__main__':
    main()
Results in the error:
File "/usr/local/lib/python3.9/site-packages/apache_beam/typehints/schemas.py", line 185, in typing_to_runner_api
element_type = typing_to_runner_api(_get_args(type_)[0])
IndexError: tuple index out of range
Tested on Python 3.9.6.
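For readers unfamiliar with this kind of failure: the traceback shows the first type argument being taken by index. The sketch below is only an illustration of how that pattern can raise "tuple index out of range". It uses the standard library's public typing.get_args, not Beam's private _get_args helper, and the helper name first_type_arg is made up for this example. This report does not say which type actually reached typing_to_runner_api, so treat it as an assumption about the failure mode rather than a diagnosis.

```python
import typing


def first_type_arg(type_):
    # Same indexing pattern as in the traceback, but built on the public
    # typing.get_args (an assumption; Beam's _get_args is not shown here).
    return typing.get_args(type_)[0]


# A parameterized generic carries its arguments, so indexing works.
print(first_type_arg(typing.Sequence[int]))

# A bare generic (or a plain class) has no arguments: get_args returns an
# empty tuple and indexing it raises IndexError.
for bare in (typing.Sequence, list):
    try:
        first_type_arg(bare)
    except IndexError as exc:
        print(f"{bare!r}: IndexError: {exc}")
```

If this is indeed what happens inside Beam, a guard on the length of the argument tuple before indexing would turn the crash into a clearer error message.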
Annoyingly, it is difficult to test this on other Python versions. There's no documentation on how to set up a Docker container that uses DirectRunner and runs locally, and barely any documentation on which Python versions are supported. Using pyenv and pip install apache-beam pulls in a lot of other downloads, which conflict when other versions are already installed.
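For anyone trying to reproduce this on another interpreter: the example reads a data.csv that is not attached to this issue. The sketch below writes a stand-in file and checks the header-lowering logic from parse_csv using only the standard library, so it can be run without Beam installed. The column names and rows are hypothetical, and write_sample_csv and read_rows are helper names invented for this sketch.

```python
import csv
import itertools
import os
import tempfile


def lower_headers(iterator):
    # Lower-case only the first line (the header row); pass the rest through.
    return itertools.chain([next(iterator).lower()], iterator)


def write_sample_csv(path):
    # Hypothetical contents: the report does not show the real data.csv.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "Name"])
        writer.writerow(["1", "alpha"])
        writer.writerow(["2", "beta"])


def read_rows(path):
    # Same DictReader-over-lowered-headers idea as parse_csv, minus Beam.
    with open(path, newline="") as f:
        return [dict(row) for row in csv.DictReader(lower_headers(f))]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.csv")
    write_sample_csv(path)
    print(read_rows(path))
```

Calling write_sample_csv("data.csv") in the working directory before running the pipeline should give MatchFiles something to match. Since every element is mapped to beam.Row(id="test-id"), the actual CSV contents should not matter for triggering the error.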
Issue Links
- blocks
- BEAM-12000 Support Python 3.9 in Apache Beam
- Triage Needed