Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- None
- None
Description
Another potential extension of IMPALA-5444 is to distribute the codegen work of different fragments across different backends. Today, each fragment generates the same code on every backend server it is assigned to run on. This is mostly redundant work (except for scan nodes, if different scan ranges correspond to different file formats). It would be great to consolidate the code generation work items among the backend servers and avoid the redundant work. The codegen for a fragment (or for an exec node, if we allow multiple LLVM modules per fragment so that different exec nodes in a fragment can be codegen'd in parallel) could be assigned to individual backend servers, and the compiled code could be shipped to the backend Impalad servers when it is ready. Of course, this may raise security issues, as we have to trust the binary being shipped over. We may also need to take into account the latency of shipping the code. However, this is potentially a huge saving in CPU time for queries with many fragments running on a large cluster.
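To make the proposal a bit more concrete, here is a minimal sketch of how codegen ownership might be assigned. It is not Impala code: the function name `AssignCodegenOwners`, the use of plain integers for fragment IDs, and the use of strings for backend addresses are all assumptions made for illustration. For each fragment it picks the least-loaded backend among those scheduled to run that fragment, so that each fragment is compiled once instead of once per backend.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of Impala.
// Input:  fragment id -> backends scheduled to run that fragment.
// Output: fragment id -> the single backend chosen to do its codegen.
// Picks the backend with the fewest codegen assignments so far; ties go to
// the backend listed first. Fragments with no backends are skipped.
std::map<int, std::string> AssignCodegenOwners(
    const std::map<int, std::vector<std::string>>& fragment_backends) {
  std::map<std::string, int> load;    // codegen items assigned per backend
  std::map<int, std::string> owner;   // result
  for (const auto& entry : fragment_backends) {
    const std::vector<std::string>& backends = entry.second;
    if (backends.empty()) continue;
    const std::string* best = &backends.front();
    for (const std::string& b : backends) {
      if (load[b] < load[*best]) best = &b;
    }
    ++load[*best];
    owner[entry.first] = *best;
  }
  return owner;
}
```

In this sketch the remaining backends for a fragment would wait for the owner's compiled code rather than compiling it themselves. Shipping, verifying, and loading that code, which is where the security and latency concerns above come in, is deliberately left out.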
Issue Links
- depends upon
  - IMPALA-7656 Remove all uses of GetCodegendComputeFnWrapper() (Resolved)
  - IMPALA-10196 Remove LlvmCodeGen::CastPtrToLlvmPtr (Resolved)
  - IMPALA-10332 Add file formats to HdfsScanNode's thrift representation and codegen for those (Resolved)
  - IMPALA-5444 Asynchronous code generation (Resolved)