Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.0.0
Fix Version/s: None
Description
Some databases allow you to specify a list of column names in the target of an INSERT INTO statement. For example, in SQLite:
sqlite> CREATE TABLE twocolumn (x INT, y INT);
sqlite> INSERT INTO twocolumn(x, y) VALUES (44,51), (NULL,52), (42,53), (45,45)
   ...> ;
sqlite> select * from twocolumn;
44|51
|52
42|53
45|45
I have a corpus of existing queries of this form which I would like to run on Spark SQL, so I think we should extend our dialect to support this syntax.
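For illustration, a minimal sketch of how the same statements might look once Spark SQL supports this; the column-list syntax here is the proposal itself, not existing behavior, and the USING clause is only an assumption to make the example self-contained:

-- Proposed (not yet supported): name the target columns explicitly.
CREATE TABLE twocolumn (x INT, y INT) USING parquet;
INSERT INTO twocolumn (x, y) VALUES (44, 51), (NULL, 52), (42, 53), (45, 45);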
When implementing this, we should make sure to test the following behaviors and corner cases; a SQL sketch of each follows below:
- Number of columns specified is greater than or less than the number of columns in the table.
- Specification of repeated columns.
- Specification of columns which do not exist in the target table.
- Specification of columns in a permuted order rather than the table's default order.
For each of these, we should check how SQLite behaves and should also compare against another database. It looks like T-SQL supports this; see https://technet.microsoft.com/en-us/library/dd776381(v=sql.105).aspx under the "Inserting data that is not in the same order as the table columns" header.
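A sketch of the corner cases above as SQLite-style statements; the expected outcomes in the comments are assumptions to be verified against SQLite and T-SQL, not confirmed behavior:

CREATE TABLE twocolumn (x INT, y INT);

-- Fewer columns specified than the table declares: presumably the
-- unnamed column gets its default value (NULL here).
INSERT INTO twocolumn (x) VALUES (1);

-- More values than named columns: expect an error.
INSERT INTO twocolumn (x) VALUES (1, 2);

-- Repeated column: expect an error.
INSERT INTO twocolumn (x, x) VALUES (1, 2);

-- Column that does not exist in the target table: expect an error.
INSERT INTO twocolumn (x, z) VALUES (1, 2);

-- Permuted column order: values should bind by name, so this stores x=44, y=51.
INSERT INTO twocolumn (y, x) VALUES (51, 44);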
Issue Links
- duplicates
  - SPARK-21548 Support insert into serial columns of table (Resolved)
- is duplicated by
  - SPARK-26234 Column list specification in INSERT statement (Closed)
  - SPARK-23193 Insert into Spark Table statement cannot specify column names (Closed)