[SPARK-23899] Built-in SQL Function Improvement - ASF JIRA

Details

Type: Umbrella
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 2.4.0
Component/s: SQL
Labels: None
Target Version/s: 2.4.0

Description

This umbrella JIRA aims to improve compatibility with other data processing systems, including Hive, Teradata, Presto, Postgres, MySQL, DB2, Oracle, and MS SQL Server.

Issue Links

incorporates: SPARK-24023 Built-in SQL Functions improvement in SparkR (Resolved)

is duplicated by: SPARK-19480 Higher order functions in SQL (Resolved)

relates to: CALCITE-3679 Allow lambda expressions in SQL queries (Closed)

Sub-Tasks

1.	Add support for date extract	Resolved	Yuming Wang
2.	format_number udf should take user specified format as argument	Resolved	Yuming Wang
3.	Data Masking Functions	Resolved	Marco Gaido
4.	Provide an option in months_between UDF to disable rounding-off	Resolved	Marco Gaido
5.	Add UDF trunc(numeric)	Resolved	Yuming Wang
6.	Add UDF weekday	Resolved	yucai
7.	Support regr_* functions	Resolved	Marco Gaido
8.	High-order function: transform(array<T>, function<T, U>) → array<U>	Resolved	Takuya Ueshin
9.	High-order function: filter(array<T>, function<T, boolean>) → array<T>	Resolved	Takuya Ueshin
10.	High-order function: aggregate(array<T>, initialState S, inputFunction<S, T, S>, outputFunction<S, R>) → R	Resolved	Takuya Ueshin
11.	High-order function: array_distinct(x) → array	Resolved	Huaxin Gao
12.	High-order function: array_intersect(x, y) → array	Resolved	Kazuaki Ishizaki
13.	High-order function: array_union(x, y) → array	Resolved	Kazuaki Ishizaki
14.	High-order function: array_except(x, y) → array	Resolved	Kazuaki Ishizaki
15.	High-order function: array_join(x, delimiter, null_replacement) → varchar	Resolved	Marco Gaido
16.	High-order function: array_max(x) → x	Resolved	Marco Gaido
17.	High-order function: array_min(x) → x	Resolved	Marco Gaido
18.	High-order function: array_position(x, element) → bigint	Resolved	Kazuaki Ishizaki
19.	High-order function: array_remove(x, element) → array	Resolved	Huaxin Gao
20.	High-order function: arrays_overlap(x, y) → boolean	Resolved	Marco Gaido
21.	High-order function: array_sort(x) → array	Resolved	Kazuaki Ishizaki
22.	High-order function: element_at	Resolved	Kazuaki Ishizaki
23.	High-order function: concat(array1, array2, ..., arrayN) → array	Resolved	Marek Novotny
24.	High-order function: flatten(x) → array	Resolved	Marek Novotny
25.	High-order function: repeat(element, count) → array	Resolved	Florent Pepin
26.	High-order function: reverse(x) → array	Resolved	Marek Novotny
27.	High-order function: sequence	Resolved	Alex Vayda
28.	High-order function: shuffle(x) → array	Resolved	Huizhi Lu
29.	High-order function: slice(x, start, length) → array	Resolved	Marco Gaido
30.	High-order function: cardinality(x) → bigint	Resolved	Kazuaki Ishizaki
31.	High-order function: array_zip(array1, array2[, ...]) → array<row>	Resolved	Dylan Guedes
32.	High-order function: zip_with(array<T>, array<U>, function<T, U, R>) → array<R>	Resolved	Sandeep Singh
33.	High-order function: map(array<K>, array<V>) → map<K,V>	Resolved	Kazuaki Ishizaki
34.	High-order function: map_from_entries(array<row<K, V>>) → map<K,V>	Resolved	Marek Novotny
35.	High-order function: map_entries(map<K, V>) → array<row<K,V>>	Resolved	Marek Novotny
36.	High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → map<K,V>	Resolved	Bruce Robbins
37.	High-order function: map_filter(map<K, V>, function<K, V, boolean>) → MAP<K,V>	Resolved	Marco Gaido
38.	High-order function: map_zip_with(map<K, V1>, map<K, V2>, function<K, V1, V2, V3>) → map<K, V3>	Resolved	Marek Novotny
39.	High-order function: transform_keys(map<K1, V>, function<K1, V, K2>) → map<K2,V>	Resolved	Neha Patil
40.	High-order function: transform_values(map<K, V1>, function<K, V1, V2>) → map<K, V2>	Resolved	Neha Patil
41.	High-order function: zip_with_index	Resolved	Unassigned
42.	High-order function: exists(array<T>, function<T, boolean>) → boolean	Resolved	Takuya Ueshin
43.	High-order function: filter(array<T>, function<T, Int, boolean>) → array<T>	Resolved	Henry Davidge
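
The signatures listed for the lambda-taking sub-tasks (transform, filter, aggregate, exists) can be read as ordinary functional operations over a list. The following is a minimal plain-Python sketch of those semantics, not Spark code: the function names `transform`, `array_filter`, `aggregate`, and `exists` are illustrative stand-ins, and SQL NULL handling is deliberately not modeled.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")
S = TypeVar("S")
R = TypeVar("R")


def transform(xs: List[T], f: Callable[[T], U]) -> List[U]:
    """transform(array<T>, function<T, U>) -> array<U>: apply f to every element."""
    return [f(x) for x in xs]


def array_filter(xs: List[T], pred: Callable[[T], bool]) -> List[T]:
    """filter(array<T>, function<T, boolean>) -> array<T>: keep elements where pred holds."""
    return [x for x in xs if pred(x)]


def aggregate(
    xs: List[T],
    initial_state: S,
    input_function: Callable[[S, T], S],
    output_function: Callable[[S], R],
) -> R:
    """aggregate(array<T>, S, function<S, T, S>, function<S, R>) -> R.

    Fold the elements into a state, then map the final state to the result.
    """
    return output_function(reduce(input_function, xs, initial_state))


def exists(xs: List[T], pred: Callable[[T], bool]) -> bool:
    """exists(array<T>, function<T, boolean>) -> boolean: true if any element matches."""
    return any(pred(x) for x in xs)
```

For instance, `aggregate([1, 2, 3], 0, lambda acc, x: acc + x, lambda acc: acc * 10)` first folds the list to 6 and then applies the output function to that state.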

Activity

Alex Vayda added a comment - 22/May/18 19:55 - edited

What do you guys think about adding another set of convenient functions for working with multi-dimensional arrays? E.g. matrix operations like transpose, multiply and others?
Something similar to ml.linalg.Matrix

Wenchen Fan added a comment - 10/Sep/18 14:04

I'm resolving it, since there is only one subtask unfinished, which is minor to this entire story.

Georg Heiler added a comment - 19/Sep/18 07:22

What about repartitioning by complex types, i.e. size of array? https://stackoverflow.com/questions/46240688/how-to-equally-partition-array-data-in-spark-dataframe

Assuming the number of records n in a data frame is almost constant, but the m observations per array define the real computational complexity, a regular repartition will only ensure roughly equal numbers of records per partition, without considering the size of the array.

Ideally, I would want to make sure that especially arrays with many elements do not end up in the same partition in order to prevent data skew.
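
One way to picture the size-aware assignment described above is a greedy balancing pass: sort records by array length, largest first, and always place the next record on the partition with the smallest total element count so far. The sketch below is plain Python, not a Spark API; `assign_partitions` is a hypothetical helper, and how such an assignment would be wired into a repartition call is left open.

```python
import heapq
from typing import List, Sequence


def assign_partitions(array_sizes: Sequence[int], num_partitions: int) -> List[int]:
    """Return one partition index per record, balancing total array elements.

    Greedy heuristic (largest array first, least-loaded partition next).
    It gives a reasonable balance but is not guaranteed to be optimal.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")

    # Min-heap of (current element count, partition index).
    heap = [(0, p) for p in range(num_partitions)]
    heapq.heapify(heap)

    assignment = [0] * len(array_sizes)
    # Visit records from the largest array to the smallest.
    order = sorted(range(len(array_sizes)), key=lambda i: array_sizes[i], reverse=True)
    for idx in order:
        load, partition = heapq.heappop(heap)
        assignment[idx] = partition
        heapq.heappush(heap, (load + array_sizes[idx], partition))
    return assignment
```

With sizes `[100, 1, 1, 100, 1, 1]` and two partitions, the two 100-element arrays land on different partitions, whereas an assignment that only counts records could place both on the same one.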

Arseniy Tashoyan added a comment - 02/Dec/18 09:44

What do you think about this one: SPARK-23693?

People

Assignee: Unassigned

Reporter: Xiao Li

Votes: 3

Watchers: 26

Dates

Created: 09/Apr/18 04:48

Updated: 30/Jan/20 13:24

Resolved: 10/Sep/18 14:04