Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Version: Impala 2.11.0
Labels: ghx-label-3
Description
mmokhtar profiled COMPUTE STATS TABLESAMPLE and found that a large share of the time is spent finalizing HLL (HyperLogLog) intermediates, most of it inside powf().
Relevant snippet from AggregateFunctions::HllFinalEstimate() in aggregate-functions-ir.cc:
    for (int i = 0; i < num_buckets; ++i) {
      harmonic_mean += powf(2.0f, -buckets[i]);
      if (buckets[i] == 0) ++num_zero_registers;
    }
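For context, this loop computes the denominator of the HyperLogLog raw estimate, evaluating powf() once per register. The following is a minimal, self-contained sketch of a complete final estimate built around the same loop. It assumes the textbook HyperLogLog formulas (the alpha_m bias constant and a linear-counting fallback for small cardinalities) rather than reproducing Impala's exact code, and `HllEstimateSketch` is a hypothetical name.

```cpp
#include <math.h>
#include <stdint.h>

// Hypothetical standalone HLL final estimate following the textbook
// HyperLogLog algorithm; not Impala's actual implementation.
// buckets[i] holds the register value for bucket i.
double HllEstimateSketch(const uint8_t* buckets, int num_buckets) {
  // Named as in the snippet; strictly it is the sum of 2^-register.
  double harmonic_mean = 0.0;
  int num_zero_registers = 0;
  for (int i = 0; i < num_buckets; ++i) {
    // Same per-register term as the snippet: 2^-buckets[i] via powf().
    harmonic_mean += powf(2.0f, -buckets[i]);
    if (buckets[i] == 0) ++num_zero_registers;
  }
  const double m = num_buckets;
  // Bias-correction constant alpha_m (this formula applies for m >= 128).
  const double alpha = 0.7213 / (1.0 + 1.079 / m);
  double estimate = alpha * m * m / harmonic_mean;
  // Small-range correction: use linear counting when the raw estimate is
  // small and some registers are still empty.
  if (estimate <= 2.5 * m && num_zero_registers != 0) {
    estimate = m * log(m / num_zero_registers);
  }
  return estimate;
}
```

With 128 registers, the loop runs 128 powf() calls for every group being finalized, which is where the cost shows up when many HLL intermediates are finalized.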
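Since each term is a power of 2 (2^-buckets[i]), computing it with ldexp() should be much more efficient: ldexp(x, exp) returns x * 2^exp, so ldexp(1.0f, -buckets[i]) yields the same value without evaluating a general power function.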
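A minimal sketch of the ldexp()-based loop, assuming 8-bit register values as in the snippet. It uses ldexpf(), the float variant of ldexp(). `InversePow2`, `InversePow2Reference` and `SumInversePow2` are hypothetical helper names, not Impala functions; the reference helper builds 2^-k by repeated halving so the ldexpf() result can be compared against an exact value.

```cpp
#include <math.h>
#include <stdint.h>

// 2^-k via ldexpf(): ldexpf(x, e) returns x * 2^e, so scaling 1.0f by -k
// gives 2^-k without calling a general power function.
inline float InversePow2(uint8_t k) {
  return ldexpf(1.0f, -static_cast<int>(k));
}

// Reference value by repeated halving. Each step is exact because every
// 2^-k down to 2^-149 is representable as a float.
float InversePow2Reference(uint8_t k) {
  float x = 1.0f;
  for (int i = 0; i < k; ++i) x *= 0.5f;
  return x;
}

// Hypothetical drop-in version of the snippet's loop using ldexpf().
float SumInversePow2(const uint8_t* buckets, int num_buckets,
                     int* num_zero_registers) {
  float harmonic_mean = 0.0f;
  *num_zero_registers = 0;
  for (int i = 0; i < num_buckets; ++i) {
    harmonic_mean += InversePow2(buckets[i]);
    if (buckets[i] == 0) ++*num_zero_registers;
  }
  return harmonic_mean;
}
```

For example, registers {0, 1, 2, 3} sum to 1 + 0.5 + 0.25 + 0.125 = 1.875 with one zero register. Because powers of two are exactly representable, the substitution changes only the cost of each term, not the value being summed.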
In a microbenchmark, I found ldexp() to be more than 10x faster than powf() in this scenario.
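One way to run a comparison of this kind is a small timing harness like the sketch below. `TimeLoop`, `BenchResult`, `PowfTerm` and `LdexpTerm` are hypothetical names, and the sketch makes no claim about the exact speedup, which depends on the compiler, optimization flags, libm implementation and CPU.

```cpp
#include <chrono>
#include <math.h>
#include <stdint.h>
#include <vector>

// Hypothetical timing harness; numbers vary with compiler, flags, libm, CPU.
struct BenchResult {
  double seconds;
  double checksum;  // returned so the summing work cannot be discarded
};

// Sums term(b) over all registers, `iterations` times, and times the loop.
template <typename Fn>
BenchResult TimeLoop(const std::vector<uint8_t>& buckets, int iterations,
                     Fn term) {
  auto start = std::chrono::steady_clock::now();
  double checksum = 0.0;
  for (int it = 0; it < iterations; ++it) {
    float sum = 0.0f;
    for (uint8_t b : buckets) sum += term(b);
    checksum += sum;
  }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return {elapsed.count(), checksum};
}

// The two per-register terms being compared.
float PowfTerm(uint8_t b) { return powf(2.0f, -static_cast<float>(b)); }
float LdexpTerm(uint8_t b) { return ldexpf(1.0f, -static_cast<int>(b)); }
```

Calling `TimeLoop(buckets, n, PowfTerm)` and `TimeLoop(buckets, n, LdexpTerm)` on the same register array and comparing `seconds` gives the relative cost; the checksums should agree, confirming both loops compute the same sum. Compile with optimizations enabled, since that is how the finalize code runs in practice.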