Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
Impala 2.11.0
-
ghx-label-3
Description
jbapple reported a test failure here: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/607/
03:16:32 TestTopNReclaimQuery.test_top_n_reclaim[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] 03:16:32 [gw10] linux2 -- Python 2.7.12 /home/ubuntu/Impala/bin/../infra/python/env/bin/python 03:16:32 query_test/test_queries.py:246: in test_top_n_reclaim 03:16:32 result = self.execute_query(self.QUERY, exec_options) 03:16:32 common/impala_test_suite.py:512: in wrapper 03:16:32 return function(*args, **kwargs) 03:16:32 common/impala_test_suite.py:537: in execute_query 03:16:32 return self.__execute_query(self.client, query, query_options) 03:16:32 common/impala_test_suite.py:604: in __execute_query 03:16:32 return impalad_client.execute(query, user=user) 03:16:32 common/impala_connection.py:160: in execute 03:16:32 return self.__beeswax_client.execute(sql_stmt, user=user) 03:16:32 beeswax/impala_beeswax.py:173: in execute 03:16:32 handle = self.__execute_query(query_string.strip(), user=user) 03:16:32 beeswax/impala_beeswax.py:341: in __execute_query 03:16:32 self.wait_for_completion(handle) 03:16:32 beeswax/impala_beeswax.py:361: in wait_for_completion 03:16:32 raise ImpalaBeeswaxException("Query aborted:" + error_log, None) 03:16:32 E ImpalaBeeswaxException: ImpalaBeeswaxException: 03:16:32 E Query aborted:Memory limit exceeded 03:16:32 ---------------------------- Captured stderr setup ----------------------------- 03:16:32 -- connecting to: localhost:21000 03:16:32 ----------------------------- Captured stderr call ----------------------------- 03:16:32 SET batch_size=0; 03:16:32 SET num_nodes=0; 03:16:32 SET disable_codegen_rows_threshold=0; 03:16:32 SET disable_codegen=False; 03:16:32 SET abort_on_error=1; 03:16:32 SET mem_limit=50m; 03:16:32 SET exec_single_node_rows_threshold=0; 03:16:32 -- executing against localhost:21000 03:16:32 select * from tpch.lineitem order by l_orderkey desc limit 10;;
I was able to reproduce something similar locally by running in a loop:
E ImpalaBeeswaxException: ImpalaBeeswaxException: E Query aborted:Memory limit exceeded: Failed to allocate memory in TopNNode::ReclaimTuplePool. E SORT_NODE (id=1) could not allocate 190.00 B without exceeding limit. E Error occurred on backend tarmstrong-box:22000 by fragment c641289b5b0652a4:9ce6652000000001 E Memory left in process limit: 8.15 GB E Memory left in query limit: -2.95 MB E Query(c641289b5b0652a4:9ce6652000000000): memory limit exceeded. Limit=50.00 MB Reservation=0 ReservationLimit=0 OtherMemory=52.95 MB Total=52.95 MB Peak=52.95 MB E Fragment c641289b5b0652a4:9ce6652000000000: Reservation=0 OtherMemory=8.30 KB Total=8.30 KB Peak=232.50 KB E EXCHANGE_NODE (id=2): Total=0 Peak=0 E DataStreamRecvr: Total=0 Peak=0 E PLAN_ROOT_SINK: Total=0 Peak=0 E CodeGen: Total=305.00 B Peak=224.50 KB E Fragment c641289b5b0652a4:9ce6652000000001: Reservation=0 OtherMemory=52.94 MB Total=52.94 MB Peak=52.94 MB E SORT_NODE (id=1): Total=702.00 KB Peak=706.00 KB E HDFS_SCAN_NODE (id=0): Total=52.23 MB Peak=52.23 MB E DataStreamSender (dst_id=2): Total=688.00 B Peak=688.00 B E CodeGen: Total=23.94 KB Peak=1.64 MB E E Memory limit exceeded: Failed to allocate memory in TopNNode::ReclaimTuplePool. E SORT_NODE (id=1) could not allocate 190.00 B without exceeding limit. E Error occurred on backend tarmstrong-box:22000 by fragment c641289b5b0652a4:9ce6652000000001 E Memory left in process limit: 8.15 GB E Memory left in query limit: -2.95 MB E Query(c641289b5b0652a4:9ce6652000000000): memory limit exceeded. Limit=50.00 MB Reservation=0 ReservationLimit=0 OtherMemory=52.95 MB Total=52.95 MB Peak=52.95 MB E Fragment c641289b5b0652a4:9ce6652000000000: Reservation=0 OtherMemory=8.30 KB Total=8.30 KB Peak=232.50 KB E EXCHANGE_NODE (id=2): Total=0 Peak=0 E DataStreamRecvr: Total=0 Peak=0 E PLAN_ROOT_SINK: Total=0 Peak=0 E CodeGen: Total=305.00 B Peak=224.50 KB E Fragment c641289b5b0652a4:9ce6652000000001: Reservation=0 OtherMemory=52.94 MB Total=52.94 MB Peak=52.94 MB E SORT_NODE (id=1): Total=702.00 KB Peak=706.00 KB E HDFS_SCAN_NODE (id=0): Total=52.23 MB Peak=52.23 MB E DataStreamSender (dst_id=2): Total=688.00 B Peak=688.00 B E CodeGen: Total=23.94 KB Peak=1.64 MB
My current hypothesis is that it's spinning up an extra scanner thread in the failure case.