Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
While analyzing the performance of the partition exchange operator, I noticed that the hot path has a dependency and a function call per row:
    // hash-partition batch's rows across channels
    // TODO: encapsulate this in an Expr as we've done for Kudu above and remove this case
    // once we have codegen here.
    int num_channels = channels_.size();
    for (int i = 0; i < batch->num_rows(); ++i) {
      TupleRow* row = batch->GetRow(i);
      uint64_t hash_val = EXCHANGE_HASH_SEED;
      for (int j = 0; j < partition_exprs_.size(); ++j) {
        ScalarExprEvaluator* eval = partition_expr_evals_[j];
        void* partition_val = eval->GetValue(row);
        // We can't use the crc hash function here because it does not result in
        // uncorrelated hashes with different seeds. Instead we use FastHash.
        // TODO: fix crc hash/GetHashValue()
        DCHECK(&(eval->root()) == partition_exprs_[j]);
        hash_val = RawValue::GetHashValueFastHash(
            partition_val, partition_exprs_[j]->type(), hash_val);
      }
      RETURN_IF_ERROR(channels_[hash_val % num_channels]->AddRow(row));
    }
Force-inlining DataStreamSender::Channel::AddRow and breaking up the loop improves partition exchange performance by 5%.
Code generation of the hash computation (IMPALA-5168) should give another 10% speedup.
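For illustration, the loop split and forced inlining described above could look roughly like the sketch below. It is not the actual patch: `Row`, `Channel`, `MixHash`, `PartitionBatch` and the `FORCE_INLINE` macro are hypothetical stand-ins for `TupleRow`, `DataStreamSender::Channel`, `RawValue::GetHashValueFastHash` and the sender's partitioning loop, and the sketch assumes that "breaking up the loop" means computing all channel indices for a batch first and routing rows in a second pass. Error handling (`RETURN_IF_ERROR`) is left out to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: force inlining on GCC/Clang, plain inline elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// Stand-in for TupleRow: a single partition key per row.
struct Row {
  std::uint64_t key;
};

// Stand-in for DataStreamSender::Channel.
struct Channel {
  std::vector<const Row*> rows;

  // Force-inlined so that routing a row does not cost a function call.
  FORCE_INLINE void AddRow(const Row* row) { rows.push_back(row); }
};

// Placeholder 64-bit mixer. This is NOT Impala's FastHash; it only gives the
// sketch a seeded hash to work with.
FORCE_INLINE std::uint64_t MixHash(std::uint64_t value, std::uint64_t seed) {
  std::uint64_t h = value ^ seed;
  h ^= h >> 33;
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  return h;
}

// Hash-partitions a batch across channels in two passes instead of one.
void PartitionBatch(const std::vector<Row>& batch,
                    std::vector<Channel>& channels, std::uint64_t seed) {
  const std::size_t num_channels = channels.size();
  std::vector<std::size_t> channel_ids(batch.size());

  // Pass 1: compute the target channel for every row. This loop only hashes,
  // so each iteration is independent of the channel bookkeeping.
  for (std::size_t i = 0; i < batch.size(); ++i) {
    channel_ids[i] = MixHash(batch[i].key, seed) % num_channels;
  }

  // Pass 2: route each row to its precomputed channel.
  for (std::size_t i = 0; i < batch.size(); ++i) {
    channels[channel_ids[i]].AddRow(&batch[i]);
  }
}
```

Each row ends up in the same channel as in the single-loop version, since the channel index is still the hash modulo the channel count. The only changes are the order of the work and the removal of the per-row call.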