Details
- Type: Umbrella
- Status: Resolved
- Priority: Major
- Resolution: Done
- Fix Version: 3.2.0
Description
Currently, the basic operations for all data types are defined in a single function, which makes it difficult to extend or change behavior per data type. For example, the binary operation Series + Series behaves differently depending on the operand data type: numerical operands are added, string operands are concatenated, and so on. These differences are handled by if-else branches inside the function, which is messy and makes the logic hard to maintain or reuse.
We should provide an infrastructure to manage these per-data-type differences.
Please refer to "pandas APIs on Spark: Separate basic operations into data type based structures" for details.
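The proposed direction can be sketched as follows: each data type owns its operations in a dedicated structure, and the binary operation dispatches to that structure instead of branching on the type inline. The names below (`DataTypeOps`, `NumericOps`, `StringOps`, `series_add`) are illustrative only, not the actual pandas-on-Spark API.

```python
class DataTypeOps:
    """Base class: one subclass per data type owns that type's operations."""

    def add(self, left, right):
        raise TypeError(f"addition is not supported by {type(self).__name__}")


class NumericOps(DataTypeOps):
    def add(self, left, right):
        # Numerical operands: element-wise addition.
        return [l + r for l, r in zip(left, right)]


class StringOps(DataTypeOps):
    def add(self, left, right):
        # String operands: element-wise concatenation.
        return [l + r for l, r in zip(left, right)]


# Map element types to their ops structure (hypothetical registry).
_OPS_BY_DTYPE = {int: NumericOps(), float: NumericOps(), str: StringOps()}


def series_add(left, right):
    # Look up the ops object for the operand type; no per-type
    # if-else chain inside the operation itself.
    ops = _OPS_BY_DTYPE[type(left[0])]
    return ops.add(left, right)
```

With this structure, supporting a new data type means adding a new `DataTypeOps` subclass and registering it, rather than editing a shared if-else chain.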
Issue Links
- is part of SPARK-34849 SPIP: Support pandas API layer on PySpark (Resolved)