[SPARK-22978] Register Scalar Vectorized UDFs for SQL Statement - ASF JIRA


Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.3.0
Component/s: PySpark
Labels: None

Description

Support registering scalar vectorized UDFs (created with pandas_udf) so that they can be called in SQL statements.

For example,

>>> import random
>>> from pyspark.sql.types import IntegerType
>>> from pyspark.sql.functions import pandas_udf
>>> random_pandas_udf = pandas_udf(
...     lambda x: random.randint(0, 100) + x, IntegerType()
... ).asNondeterministic()  # doctest: +SKIP
>>> _ = spark.catalog.registerFunction(
...     "random_pandas_udf", random_pandas_udf, IntegerType())  # doctest: +SKIP
>>> spark.sql("SELECT random_pandas_udf(2)").collect()  # doctest: +SKIP
[Row(random_pandas_udf(2)=84)]
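A scalar vectorized UDF is called once per batch of rows rather than once per row. Spark passes it a pandas.Series of input values and requires the returned Series to have the same length as the input. The block below is a toy, pure-Python model of that contract, not PySpark's implementation. register_vectorized, call_registered, add_ten and the batch size of 2 are hypothetical names and values chosen for illustration, and plain lists stand in for pandas.Series.

```python
from typing import Callable, Dict, List, Sequence

# Toy model only (hypothetical, NOT PySpark's API): plain lists stand in
# for the pandas.Series batches that Spark passes to a scalar pandas UDF.
Batch = List[int]
VectorizedFn = Callable[[Batch], Batch]

# Hypothetical name -> function registry, standing in for SQL registration.
_registry: Dict[str, VectorizedFn] = {}


def register_vectorized(name: str, fn: VectorizedFn) -> VectorizedFn:
    """Register fn under name so a "query" can look it up by name."""
    _registry[name] = fn
    return fn


def call_registered(name: str, column: Sequence[int], batch_size: int = 2) -> Batch:
    """Evaluate a registered function over a column, one batch at a time.

    A scalar vectorized function must return exactly one output value per
    input value, so the length of each batch's result is checked.
    """
    fn = _registry[name]
    out: Batch = []
    for start in range(0, len(column), batch_size):
        batch = list(column[start:start + batch_size])
        result = fn(batch)
        if len(result) != len(batch):
            raise ValueError(
                f"{name} returned {len(result)} values for a batch of {len(batch)}"
            )
        out.extend(result)
    return out


# Deterministic analogue of the example above: add a constant to each value.
register_vectorized("add_ten", lambda xs: [x + 10 for x in xs])
print(call_registered("add_ten", [1, 2, 3, 4, 5]))  # [11, 12, 13, 14, 15]
```

With a batch size of 2, add_ten is invoked three times for five values ([1, 2], [3, 4], [5]) instead of five times, which is the per-call overhead that vectorized UDFs aim to reduce.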


Issue Links

links to

[Github] Pull Request #20171 (gatorsmile)


People

Assignee: Xiao Li
Reporter: Xiao Li
Votes: 0
Watchers: 3

Dates

Created: 06/Jan/18 08:16
Updated: 12/Dec/22 18:10
Resolved: 16/Jan/18 11:22