IMPALA-3624

Stress test - query gets stuck doing nothing for an hour



    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 2.6.0
    • Fix Version/s: None
    • Component/s: Infrastructure


      While running the stress test with low memory overcommit on the release build, I noticed that its memory leak check sometimes gets stuck with a single query running for an hour. I caught it live in one case: the query was in the "FINISHED" state and I was able to cancel it, but the stress test still reported it as running. I'm unsure whether this is a bug in the stress test or a product bug.
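
One way to narrow down which side is at fault would be for the test runner to bound how long it waits on any single blocking call and to report the call that overran. The following is only a minimal sketch of that idea, not the stress test's actual code: `run_with_deadline` and `fake_query` are hypothetical names, and a real runner would pass its own client fetch function in place of the stand-in.

```python
import threading
import time


def run_with_deadline(func, deadline_secs, *args, **kwargs):
    """Run func in a daemon thread and wait at most deadline_secs for it.

    Returns (finished, result). If the call is still blocked when the
    deadline expires, finished is False, so the caller can log the query
    and cancel it instead of waiting indefinitely.
    """
    outcome = {}

    def target():
        try:
            outcome["result"] = func(*args, **kwargs)
        except Exception as exc:  # re-raised in the caller's thread below
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.daemon = True  # do not keep the process alive for a stuck call
    worker.start()
    worker.join(deadline_secs)
    if worker.is_alive():
        return False, None
    if "error" in outcome:
        raise outcome["error"]
    return True, outcome["result"]


def fake_query(delay_secs):
    # Hypothetical stand-in for a blocking fetch against the server.
    time.sleep(delay_secs)
    return "rows"
```

A call such as `run_with_deadline(fake_query, 0.05, 0.5)` returns `(False, None)` once the deadline passes. The worker thread is abandoned rather than killed, so this only makes the hang visible; it does not release whatever the call is blocked on.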

      00:38:30  2733 |      35 |        150 |        0 |    276 |   3 |             5279 |          112585 |       43860 |   40419
      00:38:35  2737 |      44 |        150 |        0 |    277 |   3 |             4438 |          113716 |       40930 |   37443
      00:38:40  2744 |      45 |        150 |        0 |    279 |   3 |              613 |          114426 |       36850 |   33761
      00:38:45  2748 |      48 |        150 |        0 |    280 |   3 |             6794 |          111180 |       37060 |   33372
      00:38:51  2750 |      55 |        150 |        0 |    280 |   3 |             5639 |          115421 |       34880 |   30424
      00:38:56  2750 |      55 |        150 |        0 |    280 |   3 |             5639 |          115421 |       36560 |   32061
      00:39:01  2757 |      50 |        150 |        0 |    281 |   3 |             5148 |          110906 |       36150 |   30977
      00:39:06  2760 |      53 |        150 |        0 |    281 |   3 |             3542 |          114613 |       35900 |   30807
      00:39:11  2761 |      52 |        150 |        0 |    281 |   3 |             3542 |          114475 |       36160 |   31848
      00:39:16  2763 |      53 |        150 |        0 |    282 |   3 |             5821 |          113963 |       38700 |   34009
      00:39:21  2763 |      53 |        150 |        0 |    282 |   3 |             5821 |          113963 |       40810 |   36439
      00:39:26  2773 |      49 |        150 |        0 |    282 |   3 |              613 |          115722 |       40580 |   36047
      00:39:28 00:39:28 15708 140570227885824 ERROR:hiveserver2[560]:Failed to open transport (tries_left=3)
      00:39:28 Traceback (most recent call last):
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/hiveserver2.py", line 557, in wrapper
      00:39:28     return func(*args, **kwargs)
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/hiveserver2.py", line 695, in fetch_results
      00:39:28     resp = service.FetchResults(req)
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 625, in FetchResults
      00:39:28     return self.recv_FetchResults()
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 643, in recv_FetchResults
      00:39:28     result.read(self._iprot)
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 2934, in read
      00:39:28     self.success.read(iprot)
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/ttypes.py", line 5883, in read
      00:39:28     self.results.read(iprot)
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/ttypes.py", line 2822, in read
      00:39:28     _elem115.read(iprot)
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/ttypes.py", line 2702, in read
      00:39:28     self.stringVal.read(iprot)
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/ttypes.py", line 2482, in read
      00:39:28     _elem95 = iprot.readString();
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/protocol/TBinaryProtocol.py", line 218, in readString
      00:39:28     len = self.readI32()
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
      00:39:28     buff = self.trans.readAll(4)
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TTransport.py", line 58, in readAll
      00:39:28     chunk = self.read(sz-have)
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TTransport.py", line 155, in read
      00:39:28     self.__rbuf = StringIO(self.__trans.read(max(sz, self.DEFAULT_BUFFER)))
      00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TSocket.py", line 92, in read
      00:39:28     buff = self.handle.recv(sz)
      00:39:28 error: [Errno 11] Resource temporarily unavailable
      00:39:32  2775 |      48 |        151 |        0 |    282 |   3 |              613 |          115948 |       39940 |   35808
      00:39:37  2779 |      64 |        151 |        0 |    283 |   3 |              280 |          115793 |       39930 |   36154
      00:39:42  2790 |      54 |        152 |        0 |    287 |   3 |             3162 |          113313 |       40270 |   36361
      00:39:47  2795 |      49 |        152 |        0 |    287 |   3 |             3162 |           97069 |       39720 |   34053
      00:39:52  2803 |      41 |        153 |        0 |    288 |   3 |             3162 |           93141 |       32810 |   28945
      00:39:57  2805 |      39 |        153 |        0 |    288 |   3 |             3162 |           92870 |       33850 |   30323
      00:40:02  2807 |      37 |        153 |        0 |    288 |   3 |             3162 |           92296 |       34080 |   30737
      00:40:08  2808 |      36 |        153 |        0 |    288 |   3 |             3162 |           91389 |       33280 |   29883
      00:40:13  2811 |      33 |        153 |        0 |    288 |   3 |             3162 |           90058 |       32790 |   29636
      00:40:18  2815 |      29 |        153 |        0 |    288 |   3 |             3162 |           83001 |       31950 |   28959
      00:40:23  2818 |      26 |        153 |        0 |    288 |   3 |             3162 |           80474 |       30060 |   27552
      00:40:28  2821 |      23 |        153 |        0 |    288 |   3 |             3162 |           78239 |       29470 |   27381
      00:40:33  2826 |      18 |        153 |        0 |    288 |   3 |             3162 |           69863 |       26430 |   24509
      00:40:38  2831 |      13 |        153 |        0 |    288 |   3 |             3162 |           43640 |       22440 |   21086
      00:40:43  2832 |      12 |        153 |        0 |    288 |   3 |             3162 |           39202 |       19090 |   17968
      00:40:49  2833 |      11 |        153 |        0 |    288 |   3 |             3162 |           36751 |       16670 |   15759
      00:40:54  2835 |       9 |        153 |        0 |    288 |   3 |             3162 |           35976 |       13560 |   12786
      00:40:59  2836 |       8 |        153 |        0 |    288 |   3 |             3162 |           27070 |       12910 |   12377
      00:41:04  2836 |       8 |        153 |        0 |    288 |   3 |             3162 |           27070 |        8350 |    8012
      00:41:09  2837 |       7 |        153 |        0 |    288 |   3 |             3162 |           22238 |        8240 |    7319
      00:41:14  2839 |       5 |        153 |        0 |    288 |   3 |             3162 |           16755 |        7280 |    7071
      00:41:19  2839 |       5 |        153 |        0 |    288 |   3 |             3162 |           16755 |        6010 |    6088
      00:41:24  2839 |       5 |        153 |        0 |    288 |   3 |             3162 |           16755 |        5830 |    6089
      00:41:30  2841 |       3 |        153 |        0 |    288 |   3 |             3162 |           10885 |        5780 |    6073
      00:41:35  2842 |       2 |        153 |        0 |    288 |   3 |             3162 |            5737 |        4510 |    5062
      00:41:40  2842 |       2 |        153 |        0 |    288 |   3 |             3162 |            5737 |        3540 |    4285
      00:41:45  2842 |       2 |        153 |        0 |    288 |   3 |             3162 |            5737 |        3540 |    4284
      00:41:50  2842 |       2 |        153 |        0 |    288 |   3 |             3162 |            5737 |        3530 |    4255
      00:41:55  2842 |       2 |        153 |        0 |    288 |   3 |             3162 |            5737 |        3430 |    4254
      00:42:00  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3940
      00:42:06  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3940
      00:42:11  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3940
      00:42:16  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3940
      00:42:21  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3940
      00:42:26  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3940
      00:42:31  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3940
      00:42:36  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3944
      00:42:41  Done | Running | Mem Lmt Ex | Time Out | Cancel | Err | Next Qry Mem Lmt | Tot Qry Mem Lmt | Tracked Mem | RSS Mem
      00:42:41  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3944
      00:42:47  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3944
      00:42:52  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3944
      00:42:57  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3944
      00:43:02  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3944
      00:43:07  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3944
      00:43:12  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3943
      00:43:17  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3943
      00:43:22  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3040 |    3943
      00:43:28  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    3943
      00:43:33  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    3943
      00:43:38  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    3943
      00:43:43  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    3943
      00:43:48  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    3943
      00:43:53  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    3943
      00:43:58  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    3945
      00:44:04  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    3945
      00:44:09  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    3951
      00:44:14  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    3951
      01:39:14  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    4889
      01:39:19  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    4894
      01:39:24  2843 |       1 |        153 |        0 |    288 |   3 |             3162 |            5015 |        3030 |    4894
      01:39:28 01:39:28 15708 140570236278528 ERROR:hiveserver2[560]:Failed to open transport (tries_left=3)
      01:39:28 Traceback (most recent call last):
      01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/hiveserver2.py", line 557, in wrapper
      01:39:28     return func(*args, **kwargs)
      01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/hiveserver2.py", line 870, in cancel_operation
      01:39:28     resp = service.CancelOperation(req)
      01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 535, in CancelOperation
      01:39:28     return self.recv_CancelOperation()
      01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 546, in recv_CancelOperation
      01:39:28     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
      01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/protocol/TBinaryProtocol.py", line 137, in readMessageBegin
      01:39:28     name = self.trans.readAll(sz)
      01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TTransport.py", line 58, in readAll
      01:39:28     chunk = self.read(sz-have)
      01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TTransport.py", line 155, in read
      01:39:28     self.__rbuf = StringIO(self.__trans.read(max(sz, self.DEFAULT_BUFFER)))
      01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TSocket.py", line 92, in read
      01:39:28     buff = self.handle.recv(sz)
      01:39:28 timeout: timed out
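
To measure the hang from a log like the one above, the status rows can be scanned for the longest stretch in which the Done and Running counters stay fixed while at least one query is still running. The sketch below assumes the ten-column row layout shown in the header line of the log (Done, Running, ..., RSS Mem) and that the log does not cross midnight; `parse_status_rows` and `longest_stall` are hypothetical helper names, not part of the stress test.

```python
import re

# A status row is a wall-clock timestamp followed by ten integer columns
# separated by "|". Header rows and traceback lines do not match.
STATUS_ROW = re.compile(r"^\s*(\d{2}:\d{2}:\d{2})\s+(\d+(?:\s*\|\s*\d+){9})\s*$")


def parse_status_rows(lines):
    """Yield (seconds_since_midnight, done, running) for each status row."""
    for line in lines:
        match = STATUS_ROW.match(line)
        if not match:
            continue  # skip headers, tracebacks, and blank lines
        hours, minutes, seconds = (int(p) for p in match.group(1).split(":"))
        columns = [int(c) for c in match.group(2).split("|")]
        yield hours * 3600 + minutes * 60 + seconds, columns[0], columns[1]


def longest_stall(lines):
    """Return (duration_secs, (done, running)) for the longest stretch in
    which Done and Running do not change while Running is above zero.

    Assumes timestamps increase monotonically (no midnight wrap).
    """
    best = (0, None)
    start = None
    key = None
    for ts, done, running in parse_status_rows(lines):
        if running > 0 and (done, running) == key:
            duration = ts - start
            if duration > best[0]:
                best = (duration, key)
        else:
            # Counters changed (or first row): start a new candidate stretch.
            start, key = ts, (done, running)
    return best
```

Under these assumptions, the rows above hold at 2843 done and 1 running from 00:42:00 through 01:39:24, which is a stall of roughly 57 minutes.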




            Assignee: Tim Armstrong (tarmstrong)
            Reporter: Tim Armstrong (tarmstrong)

