Description
import pandas as pd
from pyspark.sql import functions as f

@f.pandas_udf("double")
def AVG(x: pd.Series) -> float:
    return x.mean()

abc = spark.createDataFrame([(1.0, 5.0, 17.0)], schema=["a", "b", "c"])

abc.agg(AVG("a"), AVG("c")).show()
abc.select("c", "a").agg(AVG("a"), AVG("c")).show()
+------+------+
|AVG(a)|AVG(c)|
+------+------+
|   1.0|  17.0|
+------+------+

+------+------+
|AVG(a)|AVG(c)|
+------+------+
|  17.0|   1.0|
+------+------+
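Both calls should return the same result: AVG(a) = 1.0 and AVG(c) = 17.0. Instead, after select("c", "a") the two values are swapped.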
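For reference, the expected values can be checked without Spark. This is only a minimal sketch in plain pandas, assuming the same single row as above; the names `pdf`, `avg`, `original` and `reordered` are made up for illustration and are not part of the report. It applies the same `x.mean()` body to each column by name, before and after reordering the columns.

```python
import pandas as pd

# Same single row as in the Spark example, as a plain pandas DataFrame.
pdf = pd.DataFrame({"a": [1.0], "b": [5.0], "c": [17.0]})

def avg(x: pd.Series) -> float:
    # Same body as the pandas UDF, without the Spark decorator.
    return float(x.mean())

# Aggregate columns "a" and "c" in the original column order.
original = {name: avg(pdf[name]) for name in ("a", "c")}

# Reorder the columns the way select("c", "a") does, then aggregate again.
reordered_pdf = pdf[["c", "a"]]
reordered = {name: avg(reordered_pdf[name]) for name in ("a", "c")}

print(original)
print(reordered)
```

Because the columns are looked up by name, both dictionaries hold the same values regardless of column order, which is the behavior expected from the two `agg` calls.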