| # | Summary | Status | Assignee |
|---|---|---|---|
| 1 | Implement DataFrame.withColumn(s) | Resolved | Rui Wang |
| 2 | Support Collect() in Python client | Resolved | Rui Wang |
| 3 | Support Alias for every Relation | Resolved | Rui Wang |
| 4 | `SELECT *` shouldn't be an empty project list in proto | Resolved | Rui Wang |
| 5 | Refactor server-side tests to only use DataFrame API | Resolved | Rui Wang |
| 6 | Initial DSL framework for protobuf testing | Resolved | Rui Wang |
| 7 | Implement `DataFrame.fillna` and `DataFrame.na.fill` | Resolved | Ruifeng Zheng |
| 8 | Python: rename LogicalPlan.collect to LogicalPlan.to_proto | Resolved | Rui Wang |
| 9 | Input relation can be optional for Project in Connect proto | Resolved | Rui Wang |
| 10 | [Python] Implement `DataFrame.sample` | Resolved | Ruifeng Zheng |
| 11 | Support Repartition in Connect DSL | Resolved | Rui Wang |
| 12 | Implement `DataFrame.approxQuantile` and `DataFrame.stat.approxQuantile` | Resolved | Ruifeng Zheng |
| 13 | Support CreateView in Connect DSL | Resolved | Rui Wang |
| 14 | Implement `DataFrame.SelectExpr` in Python client | Resolved | Rui Wang |
| 15 | Add Deduplicate to Connect proto | Resolved | Rui Wang |
| 16 | Implement `DataFrame.summary` | Resolved | Ruifeng Zheng |
| 17 | Show detailed differences in DataFrame comparison | Resolved | Ruifeng Zheng |
| 18 | Add Intersect to Connect proto and DSL | Resolved | Unassigned |
| 19 | Reimplement df.stat.{cov, corr} with built-in SQL functions | Resolved | Ruifeng Zheng |
| 20 | Reimplement `frequentItems` with DataFrame operations | Resolved | Ruifeng Zheng |
| 21 | DataFrame `withColumnsRenamed` can be implemented through `RenameColumns` proto | Resolved | Rui Wang |
| 22 | Implement `DataFrame.stat.cov` | Resolved | Ruifeng Zheng |
| 23 | Add .agg() to Connect DSL | Resolved | Rui Wang |
| 24 | Add groupby to Connect DSL and test more than one grouping expression | Resolved | Rui Wang |
| 25 | Support toDF(columnNames) in Connect DSL | Resolved | Rui Wang |
| 26 | Make `take`, `head` and `first` API compatible in Python client | Resolved | Rui Wang |
| 27 | Improve SET operation support in the proto and the server | Resolved | Rui Wang |
| 28 | Reimplement `crosstab` with DataFrame operations | Resolved | Ruifeng Zheng |
| 29 | Implement DataFrame.CreateGlobalView in Python client | Resolved | Rui Wang |
| 30 | Implement `DataFrame.sparkSession` in Python client | Resolved | Rui Wang |
| 31 | Update relations.proto to follow Connect Proto development guidance | Resolved | Rui Wang |
| 32 | Throw exception for Collect() and recommend using toPandas() | Resolved | Rui Wang |
| 33 | Complete support for Except and Intersect in Python client | Resolved | Rui Wang |
| 34 | Implement `DataFrame.dropna` and `DataFrame.na.drop` | Resolved | Ruifeng Zheng |
| 35 | Add WHERE to Connect proto and DSL | Resolved | Rui Wang |
| 36 | Add as(alias: String) to Connect DSL | Resolved | Rui Wang |
| 37 | Add a dedicated logical plan for `Summary` | Resolved | Ruifeng Zheng |
| 38 | `columns` API should use `schema` API to avoid data fetching | Resolved | Rui Wang |
| 39 | Support SelectExpr, which applies projection by string expressions, in Connect DSL | Resolved | Rui Wang |
| 40 | Implement `DataFrame.stat.corr` | Resolved | Ruifeng Zheng |
| 41 | Implement DataFrame cross join | Resolved | Xinrong Meng |
| 42 | Explain API can support different modes | Resolved | Rui Wang |
| 43 | Support Join UsingColumns in proto | Resolved | Rui Wang |
| 44 | Remove `str` from Aggregate expression type | Resolved | Rui Wang |
| 45 | Implement `DataFrame.sortWithinPartitions` | Resolved | Ruifeng Zheng |
| 46 | Implement `DataFrame.show` | Resolved | Ruifeng Zheng |
| 47 | Support List[Column] for Join's `on` argument | Resolved | Rui Wang |
| 48 | Add limit and offset to Connect DSL | Resolved | Rui Wang |
| 49 | Add Sample to proto and DSL | Resolved | Rui Wang |
| 50 | Implement `DataFrame.__repr__` and `DataFrame.dtypes` | Resolved | Ruifeng Zheng |
| 51 | Implement `DataFrame.isEmpty` | Resolved | Ruifeng Zheng |
| 52 | Connect Proto should carry unparsed identifiers | Resolved | Rui Wang |
| 53 | Reimplement `summary` with DataFrame operations | Resolved | Ruifeng Zheng |
| 54 | Implement `DataFrame.crosstab` and `DataFrame.stat.crosstab` | Resolved | Ruifeng Zheng |
| 55 | DataFrame.to_pandas should not return an optional pandas DataFrame | Resolved | Rui Wang |
| 56 | Improve `on` in Join in Python client | Resolved | Rui Wang |
| 57 | Add missing `limit(n)` in DataFrame.head | Resolved | Ruifeng Zheng |
| 58 | Complete support for Union in Python client | Resolved | Rui Wang |
| 59 | Implement `DataFrame.drop` | Resolved | Ruifeng Zheng |
| 60 | Extend support for Join Relation | Resolved | Rui Wang |
| 61 | DataFrame.transform support in Python client | Resolved | Martin Grund |
| 62 | StructType should contain a list of StructField and each field should have a name | Resolved | Rui Wang |
| 63 | AnalyzeResult should use struct for schema | Resolved | Rui Wang |
| 64 | Change default serialization from 'broken' CSV to Spark DF JSON | Resolved | Martin Grund |
| 65 | Import more from connect proto package to avoid calling `proto.` for Connect DSL | Resolved | Rui Wang |
| 66 | Support conversion of other data types in the DataTypeProtoConverter | Resolved | Unassigned |
| 67 | Adopt `optional` keyword from proto3, which offers `hasXXX` to differentiate whether a field is set or unset | Resolved | Rui Wang |
| 68 | Add ClientType to proto to indicate which client sends a request | Resolved | Rui Wang |
| 69 | Make AnalyzePlan support multiple analysis tasks | Resolved | Ruifeng Zheng |
| 70 | Remove unused code in Connect | Resolved | Deng Ziming |
| 71 | `DataFrame.explain` should print and return None | Resolved | Ruifeng Zheng |
| 72 | Support string SQL expressions in DF.where() | Resolved | Martin Grund |
| 73 | Add missing docs for DataFrame API | Resolved | Rui Wang |
| 74 | Improve `DataFrame.count()` | Resolved | Rui Wang |
| 75 | Implement DataFrame.toDF | Resolved | Rui Wang |
| 76 | Implement DataFrame.withColumnRenamed | Resolved | Rui Wang |
| 77 | Implement `DataFrame.replace` and `DataFrame.na.replace` | Resolved | Ruifeng Zheng |
| 78 | Add missing avg() to DF group | Resolved | Martin Grund |
| 79 | Bug in Deduplicate Python transformation | Resolved | Martin Grund |
| 80 | Improve documentation for Take, Tail, Limit and Offset | Resolved | Rui Wang |
| 81 | Add orderBy and drop_duplicates | Resolved | Ruifeng Zheng |
| 82 | Make `Groupby.{min, max, sum, avg, mean}` compatible with PySpark | Resolved | Ruifeng Zheng |
| 83 | Implement `DataFrame.hint` | Resolved | Deng Ziming |
| 84 | Implement `DataFrame.repartitionByRange` | Resolved | Deng Ziming |
| 85 | DF.groupby.agg() API should be compatible | Resolved | Martin Grund |
| 86 | Support DataFrame TempView | Resolved | Rui Wang |
| 87 | Implement `DataFrame.cube` | Resolved | Ruifeng Zheng |
| 88 | Should use SQLExpression for str arguments in Projection | Resolved | Unassigned |
| 89 | Implement DataFrame.describe | Resolved | Jiaan Geng |
| 90 | Implement DataFrame.colRegex | Resolved | Ruifeng Zheng |
| 91 | Implement `DataFrame.melt` and `DataFrame.unpivot` | Resolved | Ruifeng Zheng |
| 92 | Implement DataFrame.randomSplit | Resolved | Jiaan Geng |
| 93 | Implement DataFrame.subtract | Resolved | Jiaan Geng |
| 94 | Implement DataFrame.to | Resolved | Jiaan Geng |
| 95 | `pyspark_types_to_proto_types` should support StructType | Resolved | Jiaan Geng |
| 96 | Factor GroupedData out to group.py | Resolved | Hyukjin Kwon |
| 97 | Implement `DataFrame.rollup` | Resolved | Ruifeng Zheng |
| 98 | Implement `GroupedData.pivot` | Resolved | Ruifeng Zheng |
| 99 | `pyspark_types_to_proto_types` should support MapType | Resolved | Jiaan Geng |
| 100 | Implement the command logic for print and `_repr_html_` | Resolved | Hyukjin Kwon |
| 101 | `pyspark_types_to_proto_types` should support ArrayType | Resolved | Jiaan Geng |
| 102 | Implement `GroupedData.{min, max, avg, sum}` | Resolved | Ruifeng Zheng |
| 103 | Support multiple arguments in groupBy.max(...) | Resolved | Hyukjin Kwon |
| 104 | Support multiple arguments in groupBy.avg(...) | Resolved | Hyukjin Kwon |
| 105 | Support multiple arguments in groupBy.min(...) | Resolved | Hyukjin Kwon |
| 106 | Support multiple arguments in groupBy.sum(...) | Resolved | Apache Spark |
| 107 | Implement `DataFrame.freqItems` and `DataFrame.stat.freqItems` | Resolved | Unassigned |
| 108 | Implement `DataFrame.sampleBy` and `DataFrame.stat.sampleBy` | Resolved | Ruifeng Zheng |
| 109 | Support star in groupBy.agg() | Resolved | Ruifeng Zheng |
| 110 | groupBy(...).agg(...).sort does not actually sort the output | Resolved | Martin Grund |
| 111 | Make getitem support filter and select | Resolved | Ruifeng Zheng |
| 112 | Implement `GroupedData.mean` | Resolved | Ruifeng Zheng |
| 113 | DataFrame.join creating ambiguous column names | Resolved | Ruifeng Zheng |
| 114 | Implement DataFrame.rdd getNumPartitions | Resolved | Unassigned |
| 115 | Fix `isnan` function | Resolved | Ruifeng Zheng |
| 116 | DataFrame window spec functions: unresolved columns | Resolved | Ruifeng Zheng |
| 117 | DataFrame.show(): 'Column' object is not callable | Resolved | Ruifeng Zheng |
| 118 | Fix DataFrame.describe | Resolved | Jiaan Geng |
| 119 | DataFrame.collect() output parity with PySpark | Resolved | Ruifeng Zheng |
| 120 | DataFrame hint parameter can be str, float or int | Resolved | Sandeep Singh |
| 121 | `DataFrame.collect` should handle None/NaN properly | Resolved | Ruifeng Zheng |
| 122 | DataFrame.show formatting int as double | Resolved | Ruifeng Zheng |
| 123 | Implement DataFrame.sort, sortWithinPartitions ordering | Resolved | Ruifeng Zheng |
| 124 | Fix DataFrame.sample parameters | Resolved | Sandeep Singh |
| 125 | DataFrame.groupBy requires all cols to be Column or str | Resolved | Ruifeng Zheng |
| 126 | DataFrame.transform: Only Column or String can be used for projections | Resolved | Ruifeng Zheng |
| 127 | Implement DataFrame.explain format to be similar to PySpark | Resolved | Jiaan Geng |
| 128 | DataFrame dropDuplicates should throw error on non-list argument | Resolved | Hyukjin Kwon |
| 129 | Throw proper errors in Dataset.to() | Resolved | Jiaan Geng |
| 130 | Window.rowsBetween should handle `float("-inf")` and `float("+inf")` as arguments | Resolved | Sandeep Singh |
| 131 | Make StructType support metadata and implement `DataFrame.withMetadata` | Resolved | Ruifeng Zheng |
| 132 | Enable the doctest for `DataFrame.hint` | Resolved | Ruifeng Zheng |
| 133 | DataFrame.createDataFrame converting int to bigint | Resolved | Ruifeng Zheng |
| 134 | Handle function `rand()` | Resolved | Hyukjin Kwon |
| 135 | Python: Connect client lost column data with pyarrow.Table.to_pylist | Resolved | Jiaan Geng |
| 136 | Add `DataFrame.writeTo` to the unsupported list | Resolved | Ruifeng Zheng |
| 137 | Add the unsupported list for `GroupedData` | Resolved | Ruifeng Zheng |
| 138 | Make `withMetadata` reuse the `withColumns` proto | Resolved | Ruifeng Zheng |
| 139 | Function `slice` should handle string in params | Resolved | Hyukjin Kwon |
| 140 | Fix `nth_value` function output | Resolved | Unassigned |
| 141 | `DataFrame.collect` should support nested types | Resolved | Apache Spark |
| 142 | Function `sampleby` return parity | Resolved | Jiaan Geng |
| 143 | `DataFrame.intersect` doctest output has different order | Resolved | Jiaan Geng |
| 144 | Support DataFrame hint parameter being a list | Resolved | Ruifeng Zheng |
| 145 | DataFrame.unionByName output is wrong | Resolved | Sandeep Singh |
| 146 | Implement DataFrame `semanticHash` | Resolved | Unassigned |
| 147 | Better type errors when passing wrong parameters | In Progress | Unassigned |
| 148 | Implement DataFrame.observe | Resolved | Jiaan Geng |
| 149 | Parity in error types between PySpark and Connect functions | Resolved | Sandeep Singh |
| 150 | Implement DataFrame `sameSemantics` | Resolved | Unassigned |
| 151 | Implement DataFrame `toLocalIterator` | Resolved | Takuya Ueshin |
| 152 | createDataFrame supports columns with map type | Resolved | Unassigned |
| 153 | Decouple plan transformation and validation on server side | Open | Unassigned |
| 154 | DataFrame.join: ambiguous column | Resolved | Ruifeng Zheng |
| 155 | DataFrame.createOrReplaceGlobalTempView - SparkConnectException: requirement failed | Resolved | Takuya Ueshin |
| 156 | DataFrame.createDataFrame datatype conversion error | Resolved | Ruifeng Zheng |
| 157 | DataFrame.show(): fix map printing | Resolved | Ruifeng Zheng |
| 158 | DataFrame mapfield, structlist invalid type | Resolved | Ruifeng Zheng |
| 159 | Implement DataFrame `pandas_api` | Resolved | Sandeep Singh |
| 160 | DataFrame `toPandas` parity in return types | Resolved | Hyukjin Kwon |
| 161 | Support StreamingQueryListener for DataFrame.observe | Resolved | Jiaan Geng |
| 162 | Parity in string representation of Column | Resolved | Hyukjin Kwon |
| 163 | Parity in string representation of higher_order_function's output | Resolved | Ruifeng Zheng |
| 164 | Different exception message in DataFrame.unpivot | Resolved | Takuya Ueshin |
| 165 | Fix map_filter and map_zip_with output order | Resolved | Jiaan Geng |
| 166 | Factor data conversion `arrow -> rows` out to `conversion.py` | Resolved | Ruifeng Zheng |
| 167 | Make `from_arrow_schema` support nested types | Resolved | Ruifeng Zheng |
| 168 | Different result in nested lambda function | Resolved | Ruifeng Zheng |
| 169 | Failed to test ClientE2ETestSuite with Maven | Resolved | Yang Jie |
| 170 | DataFrame.createTempView - SparkConnectGrpcException: requirement failed | Resolved | Takuya Ueshin |
| 171 | Support left_outer join | Resolved | Ruifeng Zheng |
| 172 | Different exception in DataFrame.sample | Resolved | Takuya Ueshin |
| 173 | DataFrame.drop should handle duplicated columns properly | Resolved | Ruifeng Zheng |
| 174 | Make `DataFrame.select` support `a.*` | Resolved | Ruifeng Zheng |
| 175 | Union should avoid calling `output` before analysis | Resolved | Ruifeng Zheng |
| 176 | Refactor the AnalyzePlan RPC and add `session.version` | Resolved | Ruifeng Zheng |
| 177 | Implement DataFrame.registerTempTable | Resolved | Takuya Ueshin |
| 178 | Fix toPandas to handle timezone and map types properly | Resolved | Takuya Ueshin |
| 179 | Implement cache, persist, unpersist, and storageLevel | Resolved | Takuya Ueshin |
| 180 | Make mapInPandas / mapInArrow support "is_barrier" | Resolved | Weichen Xu |
| 181 | Fix the comparison of results with Arrow optimization enabled/disabled | Resolved | Takuya Ueshin |
| 182 | Fix createDataFrame from pandas with map type | Resolved | Takuya Ueshin |
| 183 | Fix the error message of createDataFrame from np.array(0) | Resolved | Takuya Ueshin |
| 184 | Fix test_createDataFrame_with_single_data_type | Resolved | Takuya Ueshin |
| 185 | Fix createDataFrame from pandas to respect session timezone | Resolved | Takuya Ueshin |
| 186 | Fix DataFrame.collect with null struct | Resolved | Takuya Ueshin |
| 187 | Implement eager evaluation | Resolved | Takuya Ueshin |
| 188 | Decouple command handling and response sending on server side | Open | Unassigned |
| 189 | Implement DataFrame.foreach | Resolved | Hyukjin Kwon |
| 190 | Implement DataFrame.foreachPartition | Resolved | Hyukjin Kwon |
| 191 | Investigate the behavior difference in self-join | Open | Unassigned |