Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Cannot Reproduce
Affects Version/s: Impala 2.6.0
Fix Version/s: None
Description
Running the stress test with low memory overcommit on the release build, I noticed that when the stress test runs its memory leak check, it sometimes gets stuck for an hour with a single query still reported as running. I was able to catch it live in one case: the query was in the "FINISHED" state and I was able to cancel it, but the stress test still considered it running. I'm unsure whether this is a bug in the stress test or a product bug.
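One way to make this kind of hang easier to diagnose is to bound the leak check's wait and have it report the query IDs it still considers running, so they can be compared against the query state Impala shows (FINISHED in the case above). Below is a minimal Python sketch. It assumes the stress test can supply a callback that lists its running query IDs. `wait_for_queries_to_finish`, `LeakCheckTimeout` and `running_query_ids` are hypothetical names, not the stress test's actual code.

```python
import time
from typing import Callable, Iterable


class LeakCheckTimeout(Exception):
    """Raised when queries are still reported as running after the deadline (hypothetical)."""


def wait_for_queries_to_finish(
    running_query_ids: Callable[[], Iterable[str]],  # hypothetical callback into the test harness
    timeout_s: float,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until no queries are running, or raise with the IDs still considered running."""
    deadline = clock() + timeout_s
    while True:
        still_running = sorted(running_query_ids())
        if not still_running:
            return  # Nothing left running: safe to start the leak check.
        if clock() >= deadline:
            # Name the stuck queries instead of waiting indefinitely.
            raise LeakCheckTimeout(
                f"{len(still_running)} query(ies) still running after {timeout_s}s: {still_running}"
            )
        sleep(poll_interval_s)
```

Injecting `clock` and `sleep` lets the function be tested without real waiting. The default 5-second poll interval only mirrors the roughly 5-second spacing of the status rows in the log below. It is an observation, not a known stress-test setting.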
00:38:30 2733 | 35 | 150 | 0 | 276 | 3 | 5279 | 112585 | 43860 | 40419
00:38:35 2737 | 44 | 150 | 0 | 277 | 3 | 4438 | 113716 | 40930 | 37443
00:38:40 2744 | 45 | 150 | 0 | 279 | 3 | 613 | 114426 | 36850 | 33761
00:38:45 2748 | 48 | 150 | 0 | 280 | 3 | 6794 | 111180 | 37060 | 33372
00:38:51 2750 | 55 | 150 | 0 | 280 | 3 | 5639 | 115421 | 34880 | 30424
00:38:56 2750 | 55 | 150 | 0 | 280 | 3 | 5639 | 115421 | 36560 | 32061
00:39:01 2757 | 50 | 150 | 0 | 281 | 3 | 5148 | 110906 | 36150 | 30977
00:39:06 2760 | 53 | 150 | 0 | 281 | 3 | 3542 | 114613 | 35900 | 30807
00:39:11 2761 | 52 | 150 | 0 | 281 | 3 | 3542 | 114475 | 36160 | 31848
00:39:16 2763 | 53 | 150 | 0 | 282 | 3 | 5821 | 113963 | 38700 | 34009
00:39:21 2763 | 53 | 150 | 0 | 282 | 3 | 5821 | 113963 | 40810 | 36439
00:39:26 2773 | 49 | 150 | 0 | 282 | 3 | 613 | 115722 | 40580 | 36047
00:39:28
00:39:28 15708 140570227885824 ERROR:hiveserver2[560]:Failed to open transport (tries_left=3)
00:39:28 Traceback (most recent call last):
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/hiveserver2.py", line 557, in wrapper
00:39:28     return func(*args, **kwargs)
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/hiveserver2.py", line 695, in fetch_results
00:39:28     resp = service.FetchResults(req)
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 625, in FetchResults
00:39:28     return self.recv_FetchResults()
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 643, in recv_FetchResults
00:39:28     result.read(self._iprot)
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 2934, in read
00:39:28     self.success.read(iprot)
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/ttypes.py", line 5883, in read
00:39:28     self.results.read(iprot)
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/ttypes.py", line 2822, in read
00:39:28     _elem115.read(iprot)
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/ttypes.py", line 2702, in read
00:39:28     self.stringVal.read(iprot)
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/ttypes.py", line 2482, in read
00:39:28     _elem95 = iprot.readString();
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/protocol/TBinaryProtocol.py", line 218, in readString
00:39:28     len = self.readI32()
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
00:39:28     buff = self.trans.readAll(4)
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TTransport.py", line 58, in readAll
00:39:28     chunk = self.read(sz-have)
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TTransport.py", line 155, in read
00:39:28     self.__rbuf = StringIO(self.__trans.read(max(sz, self.DEFAULT_BUFFER)))
00:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TSocket.py", line 92, in read
00:39:28     buff = self.handle.recv(sz)
00:39:28 error: [Errno 11] Resource temporarily unavailable
00:39:32 2775 | 48 | 151 | 0 | 282 | 3 | 613 | 115948 | 39940 | 35808
00:39:37 2779 | 64 | 151 | 0 | 283 | 3 | 280 | 115793 | 39930 | 36154
00:39:42 2790 | 54 | 152 | 0 | 287 | 3 | 3162 | 113313 | 40270 | 36361
00:39:47 2795 | 49 | 152 | 0 | 287 | 3 | 3162 | 97069 | 39720 | 34053
00:39:52 2803 | 41 | 153 | 0 | 288 | 3 | 3162 | 93141 | 32810 | 28945
00:39:57 2805 | 39 | 153 | 0 | 288 | 3 | 3162 | 92870 | 33850 | 30323
00:40:02 2807 | 37 | 153 | 0 | 288 | 3 | 3162 | 92296 | 34080 | 30737
00:40:08 2808 | 36 | 153 | 0 | 288 | 3 | 3162 | 91389 | 33280 | 29883
00:40:13 2811 | 33 | 153 | 0 | 288 | 3 | 3162 | 90058 | 32790 | 29636
00:40:18 2815 | 29 | 153 | 0 | 288 | 3 | 3162 | 83001 | 31950 | 28959
00:40:23 2818 | 26 | 153 | 0 | 288 | 3 | 3162 | 80474 | 30060 | 27552
00:40:28 2821 | 23 | 153 | 0 | 288 | 3 | 3162 | 78239 | 29470 | 27381
00:40:33 2826 | 18 | 153 | 0 | 288 | 3 | 3162 | 69863 | 26430 | 24509
00:40:38 2831 | 13 | 153 | 0 | 288 | 3 | 3162 | 43640 | 22440 | 21086
00:40:43 2832 | 12 | 153 | 0 | 288 | 3 | 3162 | 39202 | 19090 | 17968
00:40:49 2833 | 11 | 153 | 0 | 288 | 3 | 3162 | 36751 | 16670 | 15759
00:40:54 2835 | 9 | 153 | 0 | 288 | 3 | 3162 | 35976 | 13560 | 12786
00:40:59 2836 | 8 | 153 | 0 | 288 | 3 | 3162 | 27070 | 12910 | 12377
00:41:04 2836 | 8 | 153 | 0 | 288 | 3 | 3162 | 27070 | 8350 | 8012
00:41:09 2837 | 7 | 153 | 0 | 288 | 3 | 3162 | 22238 | 8240 | 7319
00:41:14 2839 | 5 | 153 | 0 | 288 | 3 | 3162 | 16755 | 7280 | 7071
00:41:19 2839 | 5 | 153 | 0 | 288 | 3 | 3162 | 16755 | 6010 | 6088
00:41:24 2839 | 5 | 153 | 0 | 288 | 3 | 3162 | 16755 | 5830 | 6089
00:41:30 2841 | 3 | 153 | 0 | 288 | 3 | 3162 | 10885 | 5780 | 6073
00:41:35 2842 | 2 | 153 | 0 | 288 | 3 | 3162 | 5737 | 4510 | 5062
00:41:40 2842 | 2 | 153 | 0 | 288 | 3 | 3162 | 5737 | 3540 | 4285
00:41:45 2842 | 2 | 153 | 0 | 288 | 3 | 3162 | 5737 | 3540 | 4284
00:41:50 2842 | 2 | 153 | 0 | 288 | 3 | 3162 | 5737 | 3530 | 4255
00:41:55 2842 | 2 | 153 | 0 | 288 | 3 | 3162 | 5737 | 3430 | 4254
00:42:00 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3940
00:42:06 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3940
00:42:11 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3940
00:42:16 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3940
00:42:21 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3940
00:42:26 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3940
00:42:31 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3940
00:42:36 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3944
00:42:41 Done | Running | Mem Lmt Ex | Time Out | Cancel | Err | Next Qry Mem Lmt | Tot Qry Mem Lmt | Tracked Mem | RSS Mem
00:42:41 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3944
00:42:47 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3944
00:42:52 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3944
00:42:57 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3944
00:43:02 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3944
00:43:07 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3944
00:43:12 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3943
00:43:17 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3943
00:43:22 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3040 | 3943
00:43:28 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 3943
00:43:33 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 3943
00:43:38 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 3943
00:43:43 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 3943
00:43:48 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 3943
00:43:53 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 3943
00:43:58 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 3945
00:44:04 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 3945
00:44:09 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 3951
00:44:14 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 3951
...
01:39:14 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 4889
01:39:19 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 4894
01:39:24 2843 | 1 | 153 | 0 | 288 | 3 | 3162 | 5015 | 3030 | 4894
01:39:28
01:39:28 15708 140570236278528 ERROR:hiveserver2[560]:Failed to open transport (tries_left=3)
01:39:28 Traceback (most recent call last):
01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/hiveserver2.py", line 557, in wrapper
01:39:28     return func(*args, **kwargs)
01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/hiveserver2.py", line 870, in cancel_operation
01:39:28     resp = service.CancelOperation(req)
01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 535, in CancelOperation
01:39:28     return self.recv_CancelOperation()
01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/infra/python/env/local/lib/python2.7/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 546, in recv_CancelOperation
01:39:28     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/protocol/TBinaryProtocol.py", line 137, in readMessageBegin
01:39:28     name = self.trans.readAll(sz)
01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TTransport.py", line 58, in readAll
01:39:28     chunk = self.read(sz-have)
01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TTransport.py", line 155, in read
01:39:28     self.__rbuf = StringIO(self.__trans.read(max(sz, self.DEFAULT_BUFFER)))
01:39:28   File "/var/lib/jenkins/workspace/Impala-Stress-Test-Physical/Impala/thirdparty/hive-1.1.0-cdh5.8.0/lib/py/thrift/transport/TSocket.py", line 92, in read
01:39:28     buff = self.handle.recv(sz)
01:39:28 timeout: timed out
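The two tracebacks end in different socket errors:

* `fetch_results` fails at 00:39:28 with `[Errno 11] Resource temporarily unavailable`. On Linux, errno 11 is `EAGAIN`.
* `cancel_operation` fails an hour later, at 01:39:28, with `timed out`.

The standalone Python 3 sketch below uses only the standard library, not impyla or Thrift. It shows how each error arises from `recv()` on a socket with nothing to read. A non-blocking socket raises `BlockingIOError` carrying `EAGAIN`, and a socket with a timeout raises `socket.timeout`. `demo_client_socket_errors` is a hypothetical helper. It only illustrates the two error types and does not establish why the stress test's connection hit them.

```python
import errno
import socket


def demo_client_socket_errors():
    """Return (errno from a non-blocking recv, message from a timed-out recv) on an idle socket."""
    nonblocking_errno = None
    timeout_msg = None
    a, b = socket.socketpair()  # Connected pair; b never sends anything.
    try:
        # Non-blocking recv with no data available -> BlockingIOError (EAGAIN).
        a.setblocking(False)
        try:
            a.recv(4)
        except BlockingIOError as exc:
            nonblocking_errno = exc.errno
        # recv on a socket with a 0.1 s timeout and no data -> socket.timeout.
        a.settimeout(0.1)
        try:
            a.recv(4)
        except socket.timeout as exc:
            timeout_msg = str(exc)
    finally:
        a.close()
        b.close()
    return nonblocking_errno, timeout_msg


if __name__ == "__main__":
    err_no, msg = demo_client_socket_errors()
    print(err_no == errno.EAGAIN, msg)
```

The stress test ran under Python 2.7, where the timeout exception class is named `timeout`. That is why the second traceback ends in `timeout: timed out`. In Python 3.10 and later, `socket.timeout` is an alias of `TimeoutError`.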