IMPALA-6188: test_top_n_reclaim is flaky


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.11.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Infrastructure
    • Labels: ghx-label-3

    Description

      jbapple reported a test failure here: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/607/

      03:16:32  TestTopNReclaimQuery.test_top_n_reclaim[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
      03:16:32 [gw10] linux2 -- Python 2.7.12 /home/ubuntu/Impala/bin/../infra/python/env/bin/python
      03:16:32 query_test/test_queries.py:246: in test_top_n_reclaim
      03:16:32     result = self.execute_query(self.QUERY, exec_options)
      03:16:32 common/impala_test_suite.py:512: in wrapper
      03:16:32     return function(*args, **kwargs)
      03:16:32 common/impala_test_suite.py:537: in execute_query
      03:16:32     return self.__execute_query(self.client, query, query_options)
      03:16:32 common/impala_test_suite.py:604: in __execute_query
      03:16:32     return impalad_client.execute(query, user=user)
      03:16:32 common/impala_connection.py:160: in execute
      03:16:32     return self.__beeswax_client.execute(sql_stmt, user=user)
      03:16:32 beeswax/impala_beeswax.py:173: in execute
      03:16:32     handle = self.__execute_query(query_string.strip(), user=user)
      03:16:32 beeswax/impala_beeswax.py:341: in __execute_query
      03:16:32     self.wait_for_completion(handle)
      03:16:32 beeswax/impala_beeswax.py:361: in wait_for_completion
      03:16:32     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
      03:16:32 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
      03:16:32 E    Query aborted:Memory limit exceeded
      03:16:32 ---------------------------- Captured stderr setup -----------------------------
      03:16:32 -- connecting to: localhost:21000
      03:16:32 ----------------------------- Captured stderr call -----------------------------
      03:16:32 SET batch_size=0;
      03:16:32 SET num_nodes=0;
      03:16:32 SET disable_codegen_rows_threshold=0;
      03:16:32 SET disable_codegen=False;
      03:16:32 SET abort_on_error=1;
      03:16:32 SET mem_limit=50m;
      03:16:32 SET exec_single_node_rows_threshold=0;
      03:16:32 -- executing against localhost:21000
      03:16:32 select * from tpch.lineitem order by l_orderkey desc limit 10;;
      

      I was able to reproduce something similar locally by running the test in a loop (a sketch of such a loop follows the output below):

      E   ImpalaBeeswaxException: ImpalaBeeswaxException:
      E    Query aborted:Memory limit exceeded: Failed to allocate memory in TopNNode::ReclaimTuplePool.
      E   SORT_NODE (id=1) could not allocate 190.00 B without exceeding limit.
      E   Error occurred on backend tarmstrong-box:22000 by fragment c641289b5b0652a4:9ce6652000000001
      E   Memory left in process limit: 8.15 GB
      E   Memory left in query limit: -2.95 MB
      E   Query(c641289b5b0652a4:9ce6652000000000): memory limit exceeded. Limit=50.00 MB Reservation=0 ReservationLimit=0 OtherMemory=52.95 MB Total=52.95 MB Peak=52.95 MB
      E     Fragment c641289b5b0652a4:9ce6652000000000: Reservation=0 OtherMemory=8.30 KB Total=8.30 KB Peak=232.50 KB
      E       EXCHANGE_NODE (id=2): Total=0 Peak=0
      E       DataStreamRecvr: Total=0 Peak=0
      E       PLAN_ROOT_SINK: Total=0 Peak=0
      E       CodeGen: Total=305.00 B Peak=224.50 KB
      E     Fragment c641289b5b0652a4:9ce6652000000001: Reservation=0 OtherMemory=52.94 MB Total=52.94 MB Peak=52.94 MB
      E       SORT_NODE (id=1): Total=702.00 KB Peak=706.00 KB
      E       HDFS_SCAN_NODE (id=0): Total=52.23 MB Peak=52.23 MB
      E       DataStreamSender (dst_id=2): Total=688.00 B Peak=688.00 B
      E       CodeGen: Total=23.94 KB Peak=1.64 MB
      
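      For reference, a minimal sketch of the kind of loop I mean (hypothetical code, not the actual repro: it drives the same query and the key options through the standalone impyla client rather than the py.test harness, and assumes a local impalad on the default HS2 port 21050):

        # Hypothetical repro loop using impyla (pip install impyla). The real repro
        # looped the py.test case; this just reruns the same query with the key options.
        from impala.dbapi import connect

        OPTIONS = {'MEM_LIMIT': '50m',
                   'NUM_NODES': '0',
                   'EXEC_SINGLE_NODE_ROWS_THRESHOLD': '0'}
        QUERY = 'select * from tpch.lineitem order by l_orderkey desc limit 10'

        cur = connect(host='localhost', port=21050).cursor()
        attempt = 0
        while True:
            attempt += 1
            # execute() raises on "Memory limit exceeded", which ends the loop.
            cur.execute(QUERY, configuration=OPTIONS)
            cur.fetchall()
            print('attempt %d ok' % attempt)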

      My current hypothesis is that the scan node spins up an extra scanner thread in the failure case: the breakdown above shows HDFS_SCAN_NODE holding 52.23 MB against the 50 MB query limit, so one extra thread's worth of scan buffers is enough to leave TopNNode::ReclaimTuplePool unable to allocate even the 190 bytes it needs to re-materialize the surviving rows into a fresh tuple pool.
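      One way to test that hypothesis (a sketch building on the loop above; I haven't verified this is the trigger) would be to pin the scan to a single scanner thread with the num_scanner_threads query option and see whether the failure still reproduces:

        # Hypothetical check: rerun with the scan pinned to one scanner thread.
        # If the OOM stops reproducing, the extra-scanner-thread theory gains weight.
        cur.execute(QUERY, configuration=dict(OPTIONS, NUM_SCANNER_THREADS='1'))
        cur.fetchall()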

    People

      Assignee: Tim Armstrong (tarmstrong)
      Reporter: Tim Armstrong (tarmstrong)
