Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
At the end of the section "Impala DML Support for Kudu Tables (INSERT, UPDATE, DELETE, UPSERT)" on the page http://impala.apache.org/docs/build3x/html/topics/impala_kudu.html, we should add text like the following:
Starting from Impala 2.9, Impala automatically adds a partition and sort step to INSERT statements before sending the rows to Kudu. Because Kudu itself partitions and sorts rows on write, doing this work in Impala first takes some of the load off Kudu and helps ensure that large INSERTs complete without timing out. However, the extra step may slow down the end-to-end performance of the INSERT. Starting from Impala 2.10, the hints "/* +noshuffle,noclustered */" can be used together to turn this pre-partitioning and sorting off. Additionally, because sorting may consume a lot of memory, users should consider setting the MEM_LIMIT query option for these queries.
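A minimal impala-shell sketch of the behavior described above. It assumes a Kudu table named sales_kudu and a source table named staging_sales (both hypothetical names), and the 2g memory limit is an arbitrary illustrative value, not a recommendation. The hint is placed immediately before the SELECT, using the same hint text given above.

```sql
-- Hypothetical tables: sales_kudu (Kudu target) and staging_sales (source).
-- Cap per-query memory, since the pre-insert sort may use a lot of it.
-- The 2g value is only an example.
SET MEM_LIMIT=2g;

-- Impala 2.9+ default: rows are partitioned and sorted before being sent to Kudu.
INSERT INTO sales_kudu SELECT * FROM staging_sales;

-- Impala 2.10+: skip the pre-partitioning and sorting step.
INSERT INTO sales_kudu /* +noshuffle,noclustered */ SELECT * FROM staging_sales;
```

Whether the hinted form is faster depends on the workload; without the pre-sort, Kudu does all of the partitioning and sorting itself, which is what makes very large INSERTs more prone to timing out.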