Details
-
Bug
-
Status: Resolved
-
P2
-
Resolution: Fixed
-
2.29.0, 2.30.0
Description
In the below example, note that we return an empty dataframe for ib.collect(deferred_df), but ib.collect(to_dataframe(rows)) works as expected.
In [1]: import numpy as np ...: import pandas as pd ...: ...: import apache_beam as beam ...: from apache_beam import Create, Map ...: from apache_beam.dataframe.convert import to_dataframe ...: from apache_beam.dataframe.convert import to_pcollection ...: from apache_beam.runners.interactive.interactive_runner import InteractiveRunner ...: import apache_beam.runners.interactive.interactive_beam as ib In [2]: birds = [ ...: { ...: "name": "American crow", ...: "scientific_name": "Corvus brachyrhynchos", ...: "order": "Passeriformes", ...: "family": "Corvidae" ...: }, ...: { ...: "name": "Canada goose", ...: "scientific_name": "Branta canadensis", ...: "order": "Anseriformes", ...: "family": "Anatidae" ...: }, ...: { ...: "name": "mallard", ...: "scientific_name": "Anas platyrhynchos", ...: "order": "Anseriformes", ...: "family": "Anatidae" ...: } ...: ] In [3]: # create an interactive pipeline ...: p = beam.Pipeline(InteractiveRunner()) ...: ...: ...: # create some pipeline data and map it to rows ...: rows = (p | "Create elements" >> Create(birds) ...: | "To rows" >> Map(lambda bird: beam.Row( ...: common_name=bird["name"], ...: scientific_name=bird["scientific_name"], ...: order=bird["order"], ...: family=bird["family"]))) WARNING:apache_beam.runners.interactive.interactive_environment:You have limited Interactive Beam features since your ipython kernel is not connected to any notebook frontend. In [4]: ib.collect(rows) 'Processing...' 'Done.' Out[4]: common_name scientific_name order family 0 American crow Corvus brachyrhynchos Passeriformes Corvidae 1 Canada goose Branta canadensis Anseriformes Anatidae 2 mallard Anas platyrhynchos Anseriformes Anatidae In [5]: ib.collect(to_dataframe(rows)) 'Processing...' 'Done.' Out[5]: common_name scientific_name order family 0 American crow Corvus brachyrhynchos Passeriformes Corvidae 1 Canada goose Branta canadensis Anseriformes Anatidae 2 mallard Anas platyrhynchos Anseriformes Anatidae In [6]: deferred_df = to_dataframe(rows) In [7]: ib.collect(deferred_df) 'Processing...' 'Done.' Out[7]: Empty DataFrame Columns: [] Index: []