Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version: Impala 3.4.0
Description
Hi!
In our cluster we periodically hit the following problem:
1. A query fails with an error like this: "Exec() rpc failed: Timed out: ExecQueryFInstances RPC to <node_ip>:27000 timed out after 300.000s". The affected node can be different each time the problem occurs.
2. We have analyzed minidumps of the Impala daemon from two different cases (the resolved minidumps are attached). It seems that the Impala daemon gets stuck while cancelling a query fragment:
Thread 244
0 libpthread-2.17.so + 0xba35
rax = 0xfffffffffffffe00 rdx = 0x0000000000000002
rcx = 0xffffffffffffffff rbx = 0x000000007cd81b10
rsi = 0x0000000000000080 rdi = 0x000000007cd81b14
rbp = 0x00007f7ba5ae8580 rsp = 0x00007f7ba5ae8520
r8 = 0x000000007cd81b00 r9 = 0x0000000000000000
r10 = 0x0000000000000000 r11 = 0x0000000000000246
r12 = 0x00000000eafe6400 r13 = 0x00007f7ba5ae85c0
r14 = 0x00007f845b7287d0 r15 = 0x00007f7ba5ae8660
rip = 0x00007f845b727a35
Found by: given as instruction pointer in context
1 impalad!impala::QueryState::Cancel() + 0xdb
rbp = 0x00007f7ba5ae8600 rsp = 0x00007f7ba5ae8590
rip = 0x00000000011791bb
Found by: previous frame's frame pointer
2 impalad!impala::ControlService::CancelQueryFInstances(impala::CancelQueryFInstancesRequestPB const*, impala::CancelQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) + 0x177
rbx = 0x00007f8458e136a0 rbp = 0x00007f7ba5ae8780
rsp = 0x00007f7ba5ae8610 r12 = 0x00007f7ba5ae8720
r13 = 0x00007f7ba5ae86a0 rip = 0x0000000001218f77
Found by: call frame info
3 impalad!kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) + 0x17c
rbx = 0x0000000015e4e460 rbp = 0x00007f7ba5ae87e0
rsp = 0x00007f7ba5ae8790 r12 = 0x00000007a6bf8ee0
r13 = 0x0000000014f86740 r14 = 0x0000000014f86f00
r15 = 0x0000000014f87480 rip = 0x0000000001788ffc
Found by: call frame info
4 impalad!impala::ImpalaServicePool::RunThread() + 0x1be
rbx = 0x00007f840000000d rbp = 0x00007f7ba5ae88a0
rsp = 0x00007f7ba5ae87f0 r12 = 0x0000000018b30f80
r13 = 0x0000000000000000 r14 = 0x0000000000000051
r15 = 0x00007f840000000d rip = 0x00000000010dbdee
Found by: call frame info
5 impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) + 0x30b
rbx = 0x00007f7ba5ae8970 rbp = 0x00007f7ba5ae8be0
rsp = 0x00007f7ba5ae88b0 r12 = 0x00007ffed2cdb298
r13 = 0x000000000592ee20 r14 = 0x00007f7ba5ae8910
r15 = 0x00007f8458e136a0 rip = 0x0000000001435f8b
Found by: call frame info
6 impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() + 0x7a
rbx = 0x0000000015e34e00 rbp = 0x00007f7ba5ae8c40
rsp = 0x00007f7ba5ae8bf0 r12 = 0x00007f7ba5ae8c00
r13 = 0x0000000001435c80 r14 = 0x0000000000000000
r15 = 0x00007f7ba5ae9700 rip = 0x0000000001436e5a
Found by: call frame info
7 impalad!thread_proxy + 0xea
rbx = 0x0000000015e34e00 rbp = 0x0000000000000000
rsp = 0x00007f7ba5ae8c50 r12 = 0x00007f7ba5ae8c50
r13 = 0x0000000000801000 r14 = 0x0000000000000000
r15 = 0x00007f7ba5ae9700 rip = 0x0000000001c18e1a
Found by: call frame info
8 libpthread-2.17.so + 0x7ea5
rbx = 0x0000000000000000 rbp = 0x0000000000000000
rsp = 0x00007f7ba5ae8ca0 r12 = 0x0000000000000000
r13 = 0x0000000000801000 r14 = 0x0000000000000000
r15 = 0x00007f7ba5ae9700 rip = 0x00007f845b723ea5
Found by: call frame info
9 libc-2.17.so + 0xfeb0d
rsp = 0x00007f7ba5ae8d40 rip = 0x00007f8458321b0d
Found by: stack scanning
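
To check whether other threads in the same dump are parked in the same place, the resolved minidump text can be scanned for threads whose stack contains a given symbol. The following is only a rough sketch: it assumes the resolved output keeps the layout shown above (a "Thread N" header followed by frame lines of the form "<n> <module>!<symbol> + 0x<offset>"), and the helper names parse_threads and threads_in are made up for this illustration, not part of Impala or Breakpad.

```python
import re

# Frame lines look like "1 impalad!impala::QueryState::Cancel() + 0xdb"
# or, without symbols, "0 libpthread-2.17.so + 0xba35".
# Register lines ("rax = ...") and "Found by:" lines do not start with a
# frame number, so they are skipped.
FRAME_RE = re.compile(r"^\s*(\d+)\s+([^\s!]+)(?:!(.+?))?\s+\+\s+0x([0-9a-f]+)\s*$")
THREAD_RE = re.compile(r"^\s*Thread (\d+)")


def parse_threads(text):
    """Return {thread_id: [(frame_no, module, symbol_or_None), ...]}."""
    threads = {}
    current = None
    for line in text.splitlines():
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            threads[current] = []
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            threads[current].append((int(m.group(1)), m.group(2), m.group(3)))
    return threads


def threads_in(threads, needle):
    """Return sorted ids of threads with a frame whose symbol contains needle."""
    return sorted(
        tid
        for tid, frames in threads.items()
        if any(sym is not None and needle in sym for _, _, sym in frames)
    )


# Abridged copy of the trace above, used as sample input.
SAMPLE = """Thread 244
0 libpthread-2.17.so + 0xba35
rax = 0xfffffffffffffe00 rdx = 0x0000000000000002
Found by: given as instruction pointer in context
1 impalad!impala::QueryState::Cancel() + 0xdb
rbp = 0x00007f7ba5ae8600 rsp = 0x00007f7ba5ae8590
Found by: previous frame's frame pointer
2 impalad!impala::ControlService::CancelQueryFInstances(impala::CancelQueryFInstancesRequestPB const*, impala::CancelQueryFInstancesResponsePB*, kudu::rpc::RpcContext*) + 0x177
Found by: call frame info
"""

if __name__ == "__main__":
    print(threads_in(parse_threads(SAMPLE), "QueryState::Cancel"))  # [244]
```

Running this over the full resolved dumps from both cases would show how many threads sit in QueryState::Cancel() at the same time, which may help tell a single stuck cancellation from several blocked ones.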