Description
Currently, UDF's type coercion is not cleanly defined. See also https://github.com/apache/spark/pull/20163 and https://github.com/apache/spark/pull/22610
This JIRA targets to describe the type conversion logic internally. For instance:
# +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+ # noqa # |SQL Type \ Pandas Type|True(bool)|1(int8)|1(int16)| 1(int32)| 1(int64)|1(uint8)|1(uint16)|1(uint32)|1(uint64)|a(object)|1970-01-01 00:00:00(datetime64[ns])|1970-01-01 00:00:00-05:00(datetime64[ns, US/Eastern])|1.0(float64)|[1 2 3](object(array))|A(category)|1 days 00:00:00(timedelta64[ns])| # noqa # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+ # noqa # | boolean| True| True| True| True| True| True| True| True| True| X| False| False| False| X| X| False| # noqa # | tinyint| 1| 1| 1| 1| 1| X| X| X| X| X| X| X| 1| X| 0| X| # noqa # | smallint| 1| 1| 1| 1| 1| 1| X| X| X| X| X| X| 1| X| X| X| # noqa # | int| 1| 1| 1| 1| 1| 1| 1| X| X| X| X| X| 1| X| X| X| # noqa # | bigint| 1| 1| 1| 1| 1| 1| 1| 1| X| X| 0| 18000000000000| 1| X| X| X| # noqa # | string| u''|u'\x01'| u'\x01'| u'\x01'| u'\x01'| u'\x01'| u'\x01'| u'\x01'| u'\x01'| u'a'| X| X| u''| X| X| X| # noqa # | date| X| X| X|datetime.date(197...| X| X| X| X| X| X| datetime.date(197...| X| X| X| X| X| # noqa # | timestamp| X| X| X| X|datetime.datetime...| X| X| X| X| X| datetime.datetime...| datetime.datetime...| X| X| X| X| # noqa # | float| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| X| X| X| 1.0| X| X| X| # noqa # | double| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| X| X| X| 1.0| X| X| X| # noqa # | array<int>| X| X| X| X| X| X| X| X| X| X| X| X| X| [1, 2, 3]| X| X| # noqa # | binary| X| X| X| X| X| X| X| X| X| X| X| X| X| X| X| X| # noqa # | decimal(10,0)| X| X| X| X| X| X| X| X| X| X| X| X| X| X| X| X| # noqa # | map<string,int>| X| X| X| X| X| X| X| X| X| X| X| X| X| X| X| X| # noqa # | struct<_1:int>| X| X| X| X| X| X| X| X| X| X| X| X| X| X| X| X| # noqa # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+ # noqa
Attachments
Issue Links
- is related to
-
SPARK-28132 Update document type conversion for Pandas UDFs (pyarrow 0.13.0, pandas 0.24.2, Python 3.7)
- Resolved
-
SPARK-32722 Update document type conversion for Pandas UDFs (pyarrow 1.0.1, pandas 1.1.1, Python 3.7)
- Resolved
- relates to
-
SPARK-25666 Internally document type conversion between Python data and SQL types in UDFs
- Resolved
- links to