Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 2.10.0, Impala 2.11.0
-
None
-
ghx-label-8
Description
During a stress test on a secure cluster one of the clients crashed in HiveServer2Cursor.cancel_operation().
The stress test debug log shows
2017-12-13 16:50:52,624 21607 Query Consumer DEBUG:concurrent_select[579]:Requesting memory reservation 2017-12-13 16:50:52,624 21607 Query Consumer DEBUG:concurrent_select[245]:Reserved 102 MB; 1455 MB available; 95180 MB overcommitted 2017-12-13 16:50:52,625 21607 Query Consumer DEBUG:concurrent_select[581]:Received memory reservation 2017-12-13 16:50:52,658 21607 Query Consumer DEBUG:concurrent_select[865]:Using tpcds_300_decimal_parquet database 2017-12-13 16:50:52,658 21607 Query Consumer DEBUG:db_connection[203]:IMPALA: USE tpcds_300_decimal_parquet 2017-12-13 16:50:52,825 21607 Query Consumer DEBUG:db_connection[203]:IMPALA: SET ABORT_ON_ERROR=1 2017-12-13 16:50:53,060 21607 Query Consumer DEBUG:concurrent_select[877]:Setting mem limit to 102 MB 2017-12-13 16:50:53,060 21607 Query Consumer DEBUG:db_connection[203]:IMPALA: SET MEM_LIMIT=102M 2017-12-13 16:50:53,370 21607 Query Consumer DEBUG:concurrent_select[881]:Running query with 102 MB mem limit at vc0704.test with timeout secs 52: select dt.d_year, item.i_category_id, item.i_category, sum(ss_ext_sales_price) from date_dim dt, store_sales, item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manager_id = 1 and dt.d_moy = 11 and dt.d_year = 2000 group by dt.d_year, item.i_category_id, item.i_category order by sum(ss_ext_sales_price) desc, dt.d_year, item.i_category_id, item.i_category limit 100; 2017-12-13 16:51:08,491 21607 Query Consumer DEBUG:concurrent_select[889]:Query id is b6425b84aa45f633:9ce7cad900000000 2017-12-13 16:51:15,337 21607 Query Consumer DEBUG:concurrent_select[900]:Waiting for query to execute 2017-12-13 16:51:22,316 21607 Query Consumer DEBUG:concurrent_select[900]:Waiting for query to execute 2017-12-13 16:51:27,266 21607 Fetch Results b6425b84aa45f633:9ce7cad900000000 DEBUG:concurrent_select[1009]:Fetching result for query with id b6425b84aa45f633:9ce7cad900000000 2017-12-13 16:51:44,625 21607 Query Consumer DEBUG:concurrent_select[940]:Attempting cancellation of query with id b6425b84aa45f633:9ce7cad900000000 2017-12-13 16:51:44,627 21607 Query Consumer INFO:hiveserver2[259]:Canceling active operation
The impalad log shows
I1213 16:50:54.287511 136399 admission-controller.cc:510] Schedule for id=b6425b84aa45f633:9ce7cad900000000 in pool_name=root.systest cluster_mem_needed=816.00 MB PoolConfig: max_requests=-1 max_queued=200 max_mem=-1.00 B I1213 16:50:54.289767 136399 admission-controller.cc:515] Stats: agg_num_running=184, agg_num_queued=0, agg_mem_reserved=1529.63 GB, local_host(local_mem_admitted=132.02 GB, num_admitted_running=21, num_queued=0, backend_mem_reserved=194.58 GB) I1213 16:50:54.291550 136399 admission-controller.cc:531] Admitted query id=b6425b84aa45f633:9ce7cad900000000 I1213 16:50:54.296922 136399 coordinator.cc:99] Exec() query_id=b6425b84aa45f633:9ce7cad900000000 stmt=/* Mem: 102 MB. Coordinator: vc0704.test. */ select dt.d_year, item.i_category_id, item.i_category, sum(ss_ext_sales_price) from date_dim dt, store_sales, item where dt.d_date_sk = store_sales.ss_sold_date_sk and store_sales.ss_item_sk = item.i_item_sk and item.i_manager_id = 1 and dt.d_moy = 11 and dt.d_year = 2000 group by dt.d_year, item.i_category_id, item.i_category order by sum(ss_ext_sales_price) desc, dt.d_year, item.i_category_id, item.i_category limit 100; I1213 16:50:59.263310 136399 query-state.cc:151] Using query memory limit from query options: 102.00 MB I1213 16:50:59.267033 136399 mem-tracker.cc:189] Using query memory limit: 102.00 MB I1213 16:50:59.272271 136399 coordinator.cc:357] starting execution on 8 backends for query b6425b84aa45f633:9ce7cad900000000 I1213 16:51:07.525143 136399 coordinator.cc:370] started execution on 8 backends for query b6425b84aa45f633:9ce7cad900000000 I1213 16:51:08.358772 136399 impala-hs2-server.cc:490] ExecuteStatement(): return_val=TExecuteStatementResp { 01: status (struct) = TStatus { 01: statusCode (i32) = 0, }, 02: operationHandle (struct) = TOperationHandle { 01: operationId (struct) = THandleIdentifier { 01: guid (string) = "3\xf6E\xaa\x84[B\xb6\x00\x00\x00\x00\xd9\xca\xe7\x9c", 02: secret (string) = "3\xf6E\xaa\x84[B\xb6\x00\x00\x00\x00\xd9\xca\xe7\x9c", }, 02: operationType (i32) = 0, 03: hasResultSet (bool) = true, }, } I1213 16:51:27.310685 136399 impala-hs2-server.cc:725] GetResultSetMetadata(): query_id=b6425b84aa45f633:9ce7cad900000000 I1213 16:51:27.316963 136399 impala-hs2-server.cc:759] GetResultSetMetadata(): return_val=TGetResultSetMetadataResp { 01: status (struct) = TStatus { 01: statusCode (i32) = 0, }, 02: schema (struct) = TTableSchema { 01: columns (list) = list<struct>[4] { [0] = TColumnDesc { 01: columnName (string) = "d_year", 02: typeDesc (struct) = TTypeDesc { 01: types (list) = list<struct>[1] { [0] = TTypeEntry { 01: primitiveEntry (struct) = TPrimitiveTypeEntry { 01: type (i32) = 3, }, }, }, }, 03: position (i32) = 0, }, [1] = TColumnDesc { 01: columnName (string) = "i_category_id", 02: typeDesc (struct) = TTypeDesc { 01: types (list) = list<struct>[1] { [0] = TTypeEntry { 01: primitiveEntry (struct) = TPrimitiveTypeEntry { 01: type (i32) = 3, }, }, }, }, 03: position (i32) = 1, }, [2] = TColumnDesc { 01: columnName (string) = "i_category", 02: typeDesc (struct) = TTypeDesc { 01: types (list) = list<struct>[1] { [0] = TTypeEntry { 01: primitiveEntry (struct) = TPrimitiveTypeEntry { 01: type (i32) = 7, }, }, }, }, 03: position (i32) = 2, }, [3] = TColumnDesc { 01: columnName (string) = "sum(ss_ext_sales_price)", 02: typeDesc (struct) = TTypeDesc { 01: types (list) = list<struct>[1] { [0] = TTypeEntry { 01: primitiveEntry (struct) = TPrimitiveTypeEntry { 01: type (i32) = 15, 02: typeQualifiers (struct) = TTypeQualifiers { 01: qualifiers (map) = map<string,struct>[2] { "precision" -> TTypeQualifierValue { 01: i32Value (i32) = 38, }, "scale" -> TTypeQualifierValue { 01: i32Value (i32) = 2, }, }, }, }, }, }, }, 03: position (i32) = 3, }, }, }, } I1213 16:51:27.426277 136399 coordinator.cc:789] Coordinator waiting for backends to finish, 2 remaining I1213 16:52:42.215745 136399 coordinator.cc:794] All backends finished successfully. I1213 16:52:42.223140 136399 coordinator.cc:1090] Release admssion control resources for query b6425b84aa45f633:9ce7cad900000000 I1213 16:52:42.230653 136399 impala-hs2-server.cc:677] CancelOperation(): query_id=b6425b84aa45f633:9ce7cad900000000 I1213 16:52:42.230882 136399 impala-server.cc:1075] Cancel(): query_id=b6425b84aa45f633:9ce7cad900000000 I1213 16:52:42.233353 136399 coordinator.cc:895] Cancel() query_id=b6425b84aa45f633:9ce7cad900000000 I1213 16:52:42.234686 136399 coordinator.cc:905] CancelBackends() query_id=b6425b84aa45f633:9ce7cad900000000, tried to cancel 0 backends I1213 16:52:42.276818 136399 impala-server.cc:1796] Connection from client 172.28.194.153:60218 closed, closing 2 associated session(s) I1213 16:52:42.277026 136399 impala-server.cc:992] UnregisterQuery(): query_id=b6425b84aa45f633:9ce7cad900000000 I1213 16:52:42.279270 136399 impala-server.cc:1075] Cancel(): query_id=b6425b84aa45f633:9ce7cad900000000 I1213 16:52:42.924029 136399 query-exec-mgr.cc:149] ReleaseQueryState(): query_id=b6425b84aa45f633:9ce7cad900000000 refcnt=1 I1213 16:52:43.094966 136399 thrift-util.cc:123] SSL_shutdown: tlsv1 alert record overflow
The test host kernel log shows
Dec 14 00:52:42 ip-172-28-194-153 kernel: [ 6295.010585] show_signal_msg: 36 callbacks suppressed Dec 14 00:52:42 ip-172-28-194-153 kernel: [ 6295.010592] python[22107]: segfault at 7f35001d4000 ip 00007f3557110794 sp 00007f350dfe4308 error 4 in libc-2.19.so[7f3557076000+1be000]
Although the stress test debug log shows that it started fetching results, none were saved to the output file for that query.
Attachments
Attachments
Issue Links
- causes
-
IMPALA-6662 Make stress test resilient to hangs due to client crashes
- Resolved