Description
It seems we can check the schema ahead of time and fall back to the non-Arrow path in toPandas. Please see the case below:
df = spark.createDataFrame([[{'a': 1}]])
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
df.toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
df.toPandas()
...
py4j.protocol.Py4JJavaError: An error occurred while calling o42.collectAsArrowToPython.
...
java.lang.UnsupportedOperationException: Unsupported data type: map<string,bigint>
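The failure can be anticipated from the schema alone, since the offending column is a MapType (map<string,bigint>). A purely illustrative user-side check (the variable names here are not part of any proposed API):

from pyspark.sql.types import MapType

df = spark.createDataFrame([[{'a': 1}]])
# The inferred schema contains MapType(StringType, LongType), which the
# Arrow path in toPandas does not support here.
has_map = any(isinstance(f.dataType, MapType) for f in df.schema.fields)
print(has_map)  # True: toPandas with Arrow enabled will raise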
In the case of createDataFrame, we already fall back so that it at least works, even though the optimisation is effectively disabled:
df = spark.createDataFrame([[{'a': 1}]])
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
pdf = df.toPandas()
spark.createDataFrame(pdf).show()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.createDataFrame(pdf).show()
...
UserWarning: Arrow will not be used in createDataFrame: Error inferring Arrow type ...
+--------+
|      _1|
+--------+
|[a -> 1]|
+--------+
We should match the behaviours: toPandas should likewise fall back when the schema contains unsupported types, and a configuration should be added to control whether this fallback is allowed.
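A rough sketch of the intended behaviour (the names arrow_supports and to_pandas_with_fallback are hypothetical; the real fallback would live inside DataFrame.toPandas itself, guarded by the new configuration):

import warnings
from pyspark.sql.types import ArrayType, MapType, StructType

def arrow_supports(dt):
    # Walk the schema and reject types the Arrow path cannot serialize;
    # MapType is the case hit above.
    if isinstance(dt, MapType):
        return False
    if isinstance(dt, ArrayType):
        return arrow_supports(dt.elementType)
    if isinstance(dt, StructType):
        return all(arrow_supports(f.dataType) for f in dt.fields)
    return True

def to_pandas_with_fallback(spark, df):
    # Use Arrow only when every column is supported; otherwise warn and
    # take the plain path, matching what createDataFrame already does.
    use_arrow = spark.conf.get("spark.sql.execution.arrow.enabled", "false") == "true"
    if use_arrow and not all(arrow_supports(f.dataType) for f in df.schema.fields):
        warnings.warn("Arrow will not be used in toPandas: unsupported type in schema")
        spark.conf.set("spark.sql.execution.arrow.enabled", "false")
        try:
            return df.toPandas()
        finally:
            spark.conf.set("spark.sql.execution.arrow.enabled", "true")
    return df.toPandas()

The config toggle here is a session-wide side effect and only illustrates the fallback decision; the actual implementation would make the choice internally without mutating user configuration.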
Issue Links
- is related to SPARK-23446 "Explicitly check supported types in toPandas" (Resolved)
- links to