Details
Description
Grouped aggregate pandas UDFs can be registered and then used in SQL statements.
For example,
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("integer", PandasUDFType.GROUPED_AGG)  # doctest: +SKIP
def sum_udf(v):
    return v.sum()

spark.udf.register("sum_udf", sum_udf)  # doctest: +SKIP
q = "SELECT sum_udf(v1) FROM VALUES (3, 0), (2, 0), (1, 1) tbl(v1, v2) GROUP BY v2"
spark.sql(q).show()
+-----------+
|sum_udf(v1)|
+-----------+
|          1|
|          5|
+-----------+
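To make the semantics of the example easier to check without a Spark session, here is a minimal pandas-only sketch of what the query computes. It assumes that a grouped aggregate UDF receives each group's column as a pandas.Series and returns one scalar per group. The names tbl and result are illustrative only, and this is not how Spark executes the UDF internally.

```python
import pandas as pd


def sum_udf(v: pd.Series) -> int:
    # Same body as the registered UDF: reduce one group's values to a scalar.
    return int(v.sum())


# Same rows as the VALUES clause in the SQL query (columns v1, v2).
tbl = pd.DataFrame({"v1": [3, 2, 1], "v2": [0, 0, 1]})

# Emulate GROUP BY v2: apply the aggregate to each group's v1 values.
result = tbl.groupby("v2")["v1"].apply(sum_udf)
print(result.to_dict())
```

The group with v2 = 0 sums to 5 and the group with v2 = 1 sums to 1, which matches the two rows in the Spark output. Without an ORDER BY clause, the row order of the Spark result should not be relied on.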