Description
We should use warnings properly per https://docs.python.org/3/library/warnings.html#warning-categories
In particular,
- we should use FutureWarning instead of DeprecationWarning in places where the warning should be shown to end-users by default (since Python 3.2, DeprecationWarning is only shown by default when triggered directly from `__main__`, so it is effectively hidden from most end-users).
- we should _maybe_ think about customizing stacklevel (https://docs.python.org/3/library/warnings.html#warnings.warn) like pandas does.
- ...
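To illustrate the two points above, here is a minimal, self-contained sketch; the function name `old_api` is hypothetical and not from PySpark. It shows a FutureWarning (visible to end-users by default) with `stacklevel=2` so the reported location is the caller's line, not the library internals:

```python
import warnings

def old_api():
    # FutureWarning is shown to end-users by default;
    # DeprecationWarning would typically be hidden outside __main__.
    # stacklevel=2 attributes the warning to the caller of old_api,
    # not to this warnings.warn line.
    warnings.warn(
        "old_api is deprecated; use new_api instead.",
        FutureWarning,
        stacklevel=2,
    )

# Capture the warning to demonstrate its category.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_api()

assert caught[0].category is FutureWarning
```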
The current warnings are messy and somewhat arbitrary.
Concretely, the following call sites need to be fixed:
pyspark/context.py: warnings.warn(
pyspark/context.py: warnings.warn(
pyspark/ml/classification.py: warnings.warn("weightCol is ignored, "
pyspark/ml/clustering.py: warnings.warn("Deprecated in 3.0.0. It will be removed in future versions. Use "
pyspark/mllib/classification.py: warnings.warn(
pyspark/mllib/feature.py: warnings.warn("Both withMean and withStd are false. The model does nothing.")
pyspark/mllib/regression.py: warnings.warn(
pyspark/mllib/regression.py: warnings.warn(
pyspark/mllib/regression.py: warnings.warn(
pyspark/rdd.py: warnings.warn("mapPartitionsWithSplit is deprecated; "
pyspark/rdd.py: warnings.warn(
pyspark/shell.py: warnings.warn("Failed to initialize Spark session.")
pyspark/shuffle.py: warnings.warn("Please install psutil to have better "
pyspark/sql/catalog.py: warnings.warn(
pyspark/sql/catalog.py: warnings.warn(
pyspark/sql/column.py: warnings.warn(
pyspark/sql/column.py: warnings.warn(
pyspark/sql/context.py: warnings.warn(
pyspark/sql/context.py: warnings.warn(
pyspark/sql/context.py: warnings.warn(
pyspark/sql/context.py: warnings.warn(
pyspark/sql/context.py: warnings.warn(
pyspark/sql/dataframe.py: warnings.warn(
pyspark/sql/dataframe.py: warnings.warn("to_replace is a dict and value is not None. value will be ignored.")
pyspark/sql/functions.py: warnings.warn("Deprecated in 2.1, use degrees instead.", DeprecationWarning)
pyspark/sql/functions.py: warnings.warn("Deprecated in 2.1, use radians instead.", DeprecationWarning)
pyspark/sql/functions.py: warnings.warn("Deprecated in 2.1, use approx_count_distinct instead.", DeprecationWarning)
pyspark/sql/pandas/conversion.py: warnings.warn(msg)
pyspark/sql/pandas/conversion.py: warnings.warn(msg)
pyspark/sql/pandas/conversion.py: warnings.warn(msg)
pyspark/sql/pandas/conversion.py: warnings.warn(msg)
pyspark/sql/pandas/conversion.py: warnings.warn(msg)
pyspark/sql/pandas/functions.py: warnings.warn(
pyspark/sql/pandas/group_ops.py: warnings.warn(
pyspark/sql/session.py: warnings.warn("Fall back to non-hive support because failing to access HiveConf, "
PySpark also emits warnings via print in some places. We should review those as well and consider replacing them with warnings.warn.
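One argument for the switch: unlike print, warnings.warn output can be filtered, silenced, or turned into errors by users via the warnings machinery. A hedged sketch of what such a replacement could look like; the `init_session` helper is hypothetical, loosely modeled on the pyspark/shell.py case above:

```python
import warnings

def init_session():
    # Hypothetical replacement for a print-based warning such as
    # print("Failed to initialize Spark session.").
    warnings.warn(
        "Failed to initialize Spark session.",
        UserWarning,
        stacklevel=2,
    )

# Unlike print, the message can now be captured or filtered by users:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    init_session()

assert caught[0].category is UserWarning
```

A user who does not want this message can suppress it with `warnings.filterwarnings("ignore", message="Failed to initialize Spark session")`, which is impossible with a bare print.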