[IMPALA-9660] Distributed codegen - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Distributed Exec
Labels:
- codegen

Target Version: Product Backlog

Description

Another potential extension of IMPALA-5444 is to distribute the codegen work of different fragments across different backends. Today, each fragment generates the same code on every backend server it is assigned to run on. This is mostly redundant work (except for scan nodes, where different scan ranges may correspond to different file formats). It would be great to consolidate the code generation work items among the backend servers and avoid the redundant work. The codegen for a fragment (or for an exec node, if we allow ourselves multiple LLVM modules per fragment so that different exec nodes in a fragment can be codegen'd in parallel) could be assigned to backend servers, and the compiled code could be shipped to the backend Impalad servers when it's ready. Of course, this raises security concerns, as we have to trust the binary being shipped over. We may also need to take into account the latency of shipping the code. However, this is potentially a huge saving in CPU for queries with many fragments running on a huge cluster.
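
To make the assignment idea concrete, here is a minimal sketch in Python (not Impala code) of choosing one backend per fragment to do the codegen, so that the other backends running that fragment would receive the compiled code instead of compiling it themselves. The function names `assign_codegen_owners` and `compilations_saved`, the input shape, and the least-loaded heuristic are all assumptions made for illustration. The sketch ignores the scan-node file-format caveat, shipping latency, and the question of trusting shipped binaries.

```python
from collections import defaultdict


def assign_codegen_owners(fragment_backends):
    """Pick one backend per fragment to compile that fragment (hypothetical helper).

    fragment_backends maps a fragment id to the non-empty list of backends
    the fragment is scheduled on. Returns a dict: fragment id -> owner backend.
    """
    load = defaultdict(int)  # fragments each backend has been asked to compile
    owners = {}
    # Place the most constrained fragments (fewest candidate backends) first;
    # ties are broken by fragment id so the result is deterministic.
    ordered = sorted(fragment_backends, key=lambda f: (len(fragment_backends[f]), f))
    for frag in ordered:
        # Choose the least-loaded candidate, breaking ties by backend name.
        owner = min(fragment_backends[frag], key=lambda b: (load[b], b))
        owners[frag] = owner
        load[owner] += 1
    return owners


def compilations_saved(fragment_backends):
    """Compilations avoided if each fragment is compiled once, not once per backend."""
    per_backend = sum(len(backends) for backends in fragment_backends.values())
    return per_backend - len(fragment_backends)


if __name__ == "__main__":
    plan = {
        "F0": ["b1", "b2", "b3"],
        "F1": ["b1", "b2", "b3"],
        "F2": ["b1"],
    }
    print(assign_codegen_owners(plan))
    print(compilations_saved(plan))
```

In this toy plan, seven per-backend compilations collapse to three, one per fragment, and the owners are spread over the three backends. A real scheduler would also have to weigh the cost of shipping the code against the cost of compiling it locally.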

Issue Links

depends upon

- IMPALA-7656 Remove all uses of GetCodegendComputeFnWrapper() (Resolved)
- IMPALA-10196 Remove LlvmCodeGen::CastPtrToLlvmPtr (Resolved)
- IMPALA-10332 Add file formats to HdfsScanNode's thrift representation and codegen for those (Resolved)
- IMPALA-5444 Asynchronous code generation (Resolved)

People

Assignee: Daniel Becker
Reporter: Tim Armstrong
Votes: 0
Watchers: 3

Dates

Created: 15/Apr/20 16:14
Updated: 17/Nov/20 11:58