Details
Description
We are using Cloudera JDBC 2.5.34 to connector to Impala 2.2.
We have had reports that if a user cancels a long-running query, it continues to execute and does not cancel appropriately. This means resources are tied up for a query no longer required.
Although queries seem to be cancelled, they often run for the same length of time as completed queries.
Why might this be happening? How can we use the JDBC to immediately cancel the query and free resources?
The query status for cancelled queries is "Session closed". I noticed that if I do this in Hue, the query status is "canceled".
I have investigated and our code is calling Statement.cancel() and close() as expected.
Is this related to row batches? Could it be related to IMPALA-1869 Or is this just typical of cancellation through JDBC (https://docs.oracle.com/cd/E11882_01/java.112/e16548/apxtblsh.htm#JJDBC28983 )? Is the reason for the Query Status just because the status is not specified via JDBC?
How is Statement.cancel() implemented in the JDBC driver and how does Impala handle it? Is there a specification for expected behaviour.
Below are some of details of a query that does not appear to have been cancelled properly
Query (id=f9412bbfb5615592:91bde18d434babb5) Summary Session ID: f2444cec28c22d64:5a891aadd50a30aa Session Type: HIVESERVER2 HiveServer2 Protocol Version: V6 Start Time: 2017-04-20 19:33:38.586908000 End Time: 2017-04-20 19:34:40.673970000 Query Type: QUERY Query State: EXCEPTION Query Status: Session closed Impala Version: impalad version 2.2.0-cdh5.4.5 RELEASE (build 4a81c1d04c39961ef14ff6121d543dd96ef60e6e) User: test-user Connected User: impala@CLOUDERA Delegated User: test-user Network Address: 10.25.21.26:47390 Default Db: test_db Sql Statement: Select <LONG LIST OF FIELDS> from test_db.test_table ORDER BY ts DESC LIMIT 50 Plan: ---------------- Estimated Per-Host Requirements: Memory=5.30GB VCores=1 WARNING: The following tables are missing relevant table and/or column statistics. test_db.test_table F01:PLAN FRAGMENT [UNPARTITIONED] 02:MERGING-EXCHANGE [UNPARTITIONED] order by: concat_ws('.', from_unixtime(unix_timestamp(ts), 'dd MMM yyyy HH:mm:ss'), CAST(extract(ts, 'millisecond') AS STRING)) DESC limit: 50 hosts=4 per-host-mem=unavailable tuple-ids=1 row-size=1.46KB cardinality=0 F00:PLAN FRAGMENT [RANDOM] DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED] 01:TOP-N [LIMIT=50] | order by: concat_ws('.', from_unixtime(unix_timestamp(ts), 'dd MMM yyyy HH:mm:ss'), CAST(extract(ts, 'millisecond') AS STRING)) DESC | hosts=4 per-host-mem=0B | tuple-ids=1 row-size=1.46KB cardinality=0 | 00:SCAN HDFS [test_db.test_table, RANDOM] partitions=6/45 files=50 size=1.92GB table stats: 0 rows total (6 partition(s) missing stats) column stats: all hosts=4 per-host-mem=5.30GB tuple-ids=0 row-size=1.46KB cardinality=0 ---------------- Estimated Per-Host Mem: 5687476224 Estimated Per-Host VCores: 1 Tables Missing Stats: test_db.test_table Request Pool: default-pool ExecSummary: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ------------------------------------------------------------------------------------------------------------------------------- 02:MERGING-EXCHANGE 1 226.179us 226.179us 0 0 8.00 KB -1.00 B UNPARTITIONED 01:TOP-N 4 58s272ms 1m1s 200 0 1.07 GB 0 00:SCAN HDFS 4 95.567ms 107.109ms 24.38M 0 805.44 MB 5.30 GB test_db.test_table Planner Timeline Analysis finished: 9543737 Equivalence classes computed: 15108936 Single node plan created: 18757579 Distributed plan created: 22242733 Lineage info computed: 31840781 Planning finished: 35370463 Query Timeline Start execution: 60574 Planning finished: 39487764 Ready to start remote fragments: 43883334 Remote fragments started: 83548681 Rows available: 62070099362 Cancelled: 62085704664 Unregister query: 62087032514
There are a few fields that look like this:
from_unixtime(unix_timestamp(start_ts), 'dd MMM yyyy HH:mm:ss'), cast(extract(start_ts, 'millisecond') as string)) AS "start_ts"
This is what we expect to see (a query from hue):
Query Type: QUERY Query State: EXCEPTION Query Status: Cancelled