Details
- Type: Umbrella
- Status: Resolved
- Priority: Major
- Resolution: Done
- Fix Version: 3.2.0
Description
Currently, the basic operations for all data types are defined in a single function, which makes it difficult to extend or change behavior per data type. For example, the binary operation Series + Series behaves differently depending on the operand data type: numerical operands are added, string operands are concatenated, and so on. These differences are handled by if-else branches inside the function, which is messy and makes the logic hard to maintain or reuse.
We should provide an infrastructure to manage these per-data-type differences.
Please refer to "pandas APIs on Spark: Separate basic operations into data type based structures" for details.
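The proposed direction can be sketched as follows: each data type owns its operations in a dedicated structure, and the binary operation dispatches to that structure instead of branching on the type inline. The names below (`DataTypeOps`, `NumericOps`, `StringOps`, `series_add`) are illustrative only, not the actual pandas-on-Spark API.

```python
class DataTypeOps:
    """Base class: one subclass per data type owns that type's operations."""

    def add(self, left, right):
        raise TypeError(f"addition is not supported by {type(self).__name__}")


class NumericOps(DataTypeOps):
    def add(self, left, right):
        # Numerical operands: element-wise addition.
        return [l + r for l, r in zip(left, right)]


class StringOps(DataTypeOps):
    def add(self, left, right):
        # String operands: element-wise concatenation.
        return [l + r for l, r in zip(left, right)]


# Map element types to their ops structure (hypothetical registry).
_OPS_BY_DTYPE = {int: NumericOps(), float: NumericOps(), str: StringOps()}


def series_add(left, right):
    # Look up the ops object for the operand type; no per-type
    # if-else chain inside the operation itself.
    ops = _OPS_BY_DTYPE[type(left[0])]
    return ops.add(left, right)
```

With this structure, supporting a new data type means adding a new `DataTypeOps` subclass and registering it, rather than editing a shared if-else chain.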
Issue Links
- is part of SPARK-34849 SPIP: Support pandas API layer on PySpark (Resolved)