Details
Sub-task
Status: Resolved
Blocker
Resolution: Fixed
Impala 2.8.0
None
Description
Generate random INSERT queries for Impala/Kudu tables. The syntax is roughly:
[with_statement] INSERT IGNORE INTO <KUDU_TBL> SELECT <Statement>
INSERT IGNORE INTO <KUDU_TBL> <column list> VALUES <values list>
- The WITH statement is optional
- IGNORE is required. It means that rows which duplicate a primary key are ignored.
- We can have IGNORE SELECT or IGNORE VALUES statements.
Because of IGNORE, results must be compared against PostgreSQL 9.5 or higher, the first release to support INSERT ... ON CONFLICT DO NOTHING (see IMPALA-4340).
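As a rough illustration of what a Pythonic representation of the two statement forms might look like, here is a minimal sketch. The `InsertIgnore` class and all of its field names are hypothetical and are not taken from the actual QueryGenerator. The PostgreSQL rendering assumes that `ON CONFLICT DO NOTHING` is the intended counterpart of IGNORE in the reference database.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class InsertIgnore:
    """Hypothetical in-memory form of one INSERT IGNORE statement."""
    table: str
    select_sql: Optional[str] = None             # body of the SELECT form
    columns: Optional[Sequence[str]] = None      # optional column list
    rows: Optional[List[Sequence[str]]] = None   # pre-rendered literals, VALUES form
    with_sql: Optional[str] = None               # optional WITH clause

    def _body(self) -> str:
        # Exactly one of the two forms (SELECT or VALUES) must be chosen.
        if (self.select_sql is None) == (self.rows is None):
            raise ValueError("set exactly one of select_sql or rows")
        if self.select_sql is not None:
            return self.select_sql
        rendered = ", ".join("(" + ", ".join(row) + ")" for row in self.rows)
        return "VALUES " + rendered

    def _target(self) -> str:
        if self.columns:
            return "{} ({})".format(self.table, ", ".join(self.columns))
        return self.table

    def _prefix(self) -> str:
        return self.with_sql + " " if self.with_sql else ""

    def to_impala(self) -> str:
        return "{}INSERT IGNORE INTO {} {}".format(
            self._prefix(), self._target(), self._body())

    def to_postgres(self) -> str:
        # Assumption: ON CONFLICT DO NOTHING stands in for IGNORE.
        return "{}INSERT INTO {} {} ON CONFLICT DO NOTHING".format(
            self._prefix(), self._target(), self._body())


values_form = InsertIgnore(
    table="kudu_tbl",
    columns=["id", "name"],
    rows=[["1", "'a'"], ["2", "'b'"]],
)
select_form = InsertIgnore(
    table="kudu_tbl",
    select_sql="SELECT id, name FROM src",
    with_sql="WITH src AS (SELECT 1 AS id, 'a' AS name)",
)
print(values_form.to_impala())
print(select_form.to_postgres())
```

Keeping one object per statement and rendering it per dialect means the same generated query can be sent to both databases, which is what the comparison requires.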
The scope of this Jira is to take advantage of dependent work (IMPALA-4340, IMPALA-4338, IMPALA-4343, IMPALA-4351, IMPALA-4352) and add methods to the QueryGenerator to generate Pythonic representations of queries.
The primary key considerations are important:
- Primary keys can't be NULL
- Primary keys must be unique
- With IGNORE, rows in the same statement that share a PK race with each other and only one of them wins. Which one wins is not deterministic, so the results will be difficult to compare.
- With IGNORE, if a row with a given PK already exists, any new rows inserted with the same PK are also ignored.
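The four considerations above can be sketched as a small classifier that sorts candidate rows by how an INSERT IGNORE would be expected to treat them. This is only a model for reasoning about generated data; `classify_rows` and its bucket names are hypothetical, and no database is involved.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple

Row = Tuple[Optional[Hashable], ...]


def classify_rows(rows: Sequence[Row],
                  existing_pks: Set[Hashable],
                  pk_index: int = 0) -> Dict[str, List[Row]]:
    """Group candidate rows by their expected fate under INSERT IGNORE."""
    buckets: Dict[str, List[Row]] = {
        "invalid_null_pk": [],    # PKs can't be NULL
        "ignored_existing": [],   # PK already in the table, row is ignored
        "racy_duplicate": [],     # PK repeated in this batch, winner is not deterministic
        "inserted": [],           # unique, non-NULL, new PK
    }
    # Count how often each non-NULL PK occurs within the batch itself.
    counts = Counter(r[pk_index] for r in rows if r[pk_index] is not None)
    for row in rows:
        pk = row[pk_index]
        if pk is None:
            buckets["invalid_null_pk"].append(row)
        elif pk in existing_pks:
            buckets["ignored_existing"].append(row)
        elif counts[pk] > 1:
            buckets["racy_duplicate"].append(row)
        else:
            buckets["inserted"].append(row)
    return buckets


candidates = [(1, "a"), (2, "b"), (2, "c"), (None, "d"), (3, "e")]
result = classify_rows(candidates, existing_pks={1})
for name, members in result.items():
    print(name, members)
```

A generator could use a check like this to reject batches with a non-empty `racy_duplicate` bucket, since those are the ones whose outcome cannot be compared reliably.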
This means the query generator needs to be smarter than before about the queries it generates. For example, it shouldn't generate a query in which the expression for the inserted rows' PK column evaluates to a constant: at most one of those rows would actually be inserted. One option (for example, in the case of a numerical PK) would be to employ a special expression that applies an offset from the MAX() value in the column.
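One possible shape for the MAX()-offset idea is sketched below. It is an assumption, not the approach the issue settled on: the PK expression adds a per-row ROW_NUMBER() to the current maximum, with COALESCE covering an empty table (where MAX() is NULL). The helper names are hypothetical, the generated SQL has not been run against Impala, and a cast may be needed if the PK column is narrower than BIGINT.

```python
from typing import Iterable, List, Sequence


def max_offset_insert(table: str, pk_col: str, source: str,
                      order_by: str, other_cols: Sequence[str]) -> str:
    """Build an INSERT IGNORE ... SELECT whose PK values start above MAX(pk)."""
    # Each source row gets a distinct offset, so the PK is never a constant.
    pk_expr = "COALESCE(m.max_pk, 0) + ROW_NUMBER() OVER (ORDER BY {})".format(order_by)
    select_list = ", ".join([pk_expr] + list(other_cols))
    return (
        "INSERT IGNORE INTO {table} "
        "SELECT {select_list} "
        "FROM {source} "
        "CROSS JOIN (SELECT MAX({pk_col}) AS max_pk FROM {table}) m"
    ).format(table=table, select_list=select_list, source=source, pk_col=pk_col)


def simulate_offsets(existing_pks: Iterable[int], n_new: int) -> List[int]:
    """Pure-Python model of the PK values the expression above would yield."""
    base = max(existing_pks, default=0)
    return [base + i for i in range(1, n_new + 1)]


sql = max_offset_insert("kudu_tbl", "id", "src", "src.id", ["src.name"])
print(sql)
print(simulate_offsets({1, 5, 9}, 3))
```

In this model the new values are unique within the batch and sit above every existing PK, so no row is ignored. Inserts that run concurrently against the same table could still collide, which the sketch does not address.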