IMPALA-5249

JDBC for Impala cannot cancel queries successfully - shown as "session closed"


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.2
    • Fix Version/s: None
    • Component/s: Frontend
    • Environment: Cloudera JDBC 2.5.34, Impala 2.2
    • Labels: ghx-label-4

    Description

      We are using Cloudera JDBC 2.5.34 to connect to Impala 2.2.

      We have had reports that when a user cancels a long-running query, it continues to execute rather than being cancelled, tying up resources for a query that is no longer required.

      Although queries seem to be cancelled, they often run for the same length of time as completed queries.

      Why might this be happening? How can we use the JDBC driver to cancel the query immediately and free its resources?

      The query status for cancelled queries is "Session closed". I noticed that if I do this in Hue, the query status is "canceled".

      I have investigated and our code is calling Statement.cancel() and close() as expected.
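Roughly, our cancellation logic follows the sketch below (simplified; the helper class name and timeout are placeholders, not our production code). cancel() is invoked from a separate watchdog thread while the query thread is still blocked in executeQuery(), and close() is only called afterwards:

```java
import java.sql.SQLException;
import java.sql.Statement;

// Simplified sketch of the client-side cancellation pattern (class and method
// names are placeholders). Statement.cancel() must be called from a different
// thread than the one blocked in execute()/executeQuery().
public class QueryCanceller {

    /** Start a daemon watchdog that calls stmt.cancel() after timeoutMs. */
    public static Thread cancelAfter(final Statement stmt, final long timeoutMs) {
        Thread watchdog = new Thread(() -> {
            try {
                Thread.sleep(timeoutMs);
                stmt.cancel();   // ask the driver to cancel the running query
            } catch (InterruptedException e) {
                // Query finished first and the watchdog was interrupted: nothing to do.
            } catch (SQLException e) {
                e.printStackTrace();
            }
        });
        watchdog.setDaemon(true);
        watchdog.start();
        return watchdog;
    }
}
```

On the query thread, when executeQuery() returns normally we interrupt the watchdog, and stmt.close() is called in a finally block; the key point is that cancel() fires on a thread other than the one blocked in execute().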

      Is this related to row batches? Could it be related to IMPALA-1869? Or is this simply typical of cancellation through JDBC (https://docs.oracle.com/cd/E11882_01/java.112/e16548/apxtblsh.htm#JJDBC28983)? Is the Query Status shown this way just because no cancellation status is specified via JDBC?

      How is Statement.cancel() implemented in the JDBC driver, and how does Impala handle it? Is there a specification for the expected behaviour?

      Below are some details of a query that does not appear to have been cancelled properly:

      Query (id=f9412bbfb5615592:91bde18d434babb5)
        Summary
          Session ID: f2444cec28c22d64:5a891aadd50a30aa
          Session Type: HIVESERVER2
          HiveServer2 Protocol Version: V6
          Start Time: 2017-04-20 19:33:38.586908000
          End Time: 2017-04-20 19:34:40.673970000
          Query Type: QUERY
          Query State: EXCEPTION
          Query Status: Session closed
      	
      	
         Impala Version: impalad version 2.2.0-cdh5.4.5 RELEASE (build 4a81c1d04c39961ef14ff6121d543dd96ef60e6e)
          User: test-user
          Connected User: impala@CLOUDERA
          Delegated User: test-user
          Network Address: 10.25.21.26:47390
          Default Db: test_db
          Sql Statement: Select <LONG LIST OF FIELDS> from test_db.test_table ORDER BY ts  DESC  LIMIT 50	
      	
          Plan: 
      ----------------
      Estimated Per-Host Requirements: Memory=5.30GB VCores=1
      WARNING: The following tables are missing relevant table and/or column statistics.
      test_db.test_table
      
      F01:PLAN FRAGMENT [UNPARTITIONED]
        02:MERGING-EXCHANGE [UNPARTITIONED]
           order by: concat_ws('.', from_unixtime(unix_timestamp(ts), 'dd MMM yyyy HH:mm:ss'), CAST(extract(ts, 'millisecond') AS STRING)) DESC
           limit: 50
           hosts=4 per-host-mem=unavailable
           tuple-ids=1 row-size=1.46KB cardinality=0
      
      F00:PLAN FRAGMENT [RANDOM]
        DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]
        01:TOP-N [LIMIT=50]
        |  order by: concat_ws('.', from_unixtime(unix_timestamp(ts), 'dd MMM yyyy HH:mm:ss'), CAST(extract(ts, 'millisecond') AS STRING)) DESC
        |  hosts=4 per-host-mem=0B
        |  tuple-ids=1 row-size=1.46KB cardinality=0
        |
        00:SCAN HDFS [test_db.test_table, RANDOM]
           partitions=6/45 files=50 size=1.92GB
           table stats: 0 rows total (6 partition(s) missing stats)
           column stats: all
           hosts=4 per-host-mem=5.30GB
           tuple-ids=0 row-size=1.46KB cardinality=0
      ----------------
          Estimated Per-Host Mem: 5687476224
          Estimated Per-Host VCores: 1
          Tables Missing Stats: test_db.test_table
          Request Pool: default-pool
          ExecSummary: 
      Operator              #Hosts   Avg Time   Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                       
      -------------------------------------------------------------------------------------------------------------------------------
      02:MERGING-EXCHANGE        1  226.179us  226.179us       0           0    8.00 KB        -1.00 B  UNPARTITIONED                
      01:TOP-N                   4   58s272ms       1m1s     200           0    1.07 GB              0                               
      00:SCAN HDFS               4   95.567ms  107.109ms  24.38M           0  805.44 MB        5.30 GB  test_db.test_table 
          Planner Timeline
            Analysis finished: 9543737
            Equivalence classes computed: 15108936
            Single node plan created: 18757579
            Distributed plan created: 22242733
            Lineage info computed: 31840781
            Planning finished: 35370463
          Query Timeline
            Start execution: 60574
            Planning finished: 39487764
            Ready to start remote fragments: 43883334
            Remote fragments started: 83548681
            Rows available: 62070099362
            Cancelled: 62085704664
            Unregister query: 62087032514
      

      There are a few fields that look like this:

      concat_ws('.', from_unixtime(unix_timestamp(start_ts), 'dd MMM yyyy HH:mm:ss'), cast(extract(start_ts, 'millisecond') as string)) AS "start_ts"
      

      This is what we expect to see (a query cancelled from Hue):

          Query Type: QUERY
          Query State: EXCEPTION
          Query Status: Cancelled
      

      People

        Assignee: Unassigned
        Reporter: jdbcworries Jr
        Votes: 0
        Watchers: 3