Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Version: Impala 2.8.0
Description
After IMPALA-4467 and IMPALA-4882, the stress test "training" phase tries to find the minimum memory limit needed to run COMPUTE STATS statements, each under a variety of MT_DOP settings.
The stress test "training" is failing during this phase with the following error:
Spilling has been disabled for plans that do not have stats and are not hinted to prevent potentially bad plans from using too many cluster resources. Please run COMPUTE STATS on these tables, hint the plan or disable this behavior via the DISABLE_UNSAFE_SPILLS query option.
In this case, the failure was from this sequence:
USE tpcds_300_decimal_parquet;
SET ABORT_ON_ERROR=1;
SET MT_DOP=1;
SET MEM_LIMIT=93M;
COMPUTE STATS catalog_returns;
This was near the end of the MT_DOP=1 COMPUTE STATS catalog_returns training, in which concurrent_select.py performs a MEM_LIMIT-wise binary search to find the minimum memory limit needed to run COMPUTE STATS (for the given MT_DOP). Logs show the following memory limits applied first:
SET MEM_LIMIT=77308M
SET MEM_LIMIT=38654M
SET MEM_LIMIT=19327M
SET MEM_LIMIT=9663M
SET MEM_LIMIT=4831M
SET MEM_LIMIT=2415M
SET MEM_LIMIT=1207M
SET MEM_LIMIT=603M
SET MEM_LIMIT=301M
SET MEM_LIMIT=150M   <------ all successful completions through here
SET MEM_LIMIT=75M    <------ memory limit exceeded, which is fine
SET MEM_LIMIT=112M   <------ successful completion
SET MEM_LIMIT=93M    <------ error for this bug as described above
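The sequence of limits in the log is consistent with a two-phase search: halve the limit until a run fails, then bisect between the last failing and the last passing limit. The following is a minimal sketch of that idea, not the actual concurrent_select.py code. The function name `search_min_mem_limit`, the `succeeds` callback, and the `tolerance_mb` stopping rule are all hypothetical names introduced for illustration.

```python
def search_min_mem_limit(succeeds, start_mb, tolerance_mb=1):
    """Sketch of a MEM_LIMIT search (hypothetical, not concurrent_select.py).

    succeeds(limit_mb) -> bool is an assumed callback that runs the
    statement under SET MEM_LIMIT=<limit_mb>M and reports success.
    Returns (smallest passing limit found, list of limits tried).
    """
    tried = [start_mb]
    if not succeeds(start_mb):
        raise ValueError("statement fails even at the starting limit")

    upper = start_mb   # lowest limit known to pass
    lower = 0          # highest limit known to fail (0 = none seen yet)

    # Phase 1: keep halving until a run fails.
    limit = start_mb // 2
    while limit > 0:
        tried.append(limit)
        if succeeds(limit):
            upper = limit
            limit //= 2
        else:
            lower = limit
            break

    # Phase 2: bisect between the failing and the passing limit.
    while upper - lower > tolerance_mb:
        mid = (lower + upper) // 2
        tried.append(mid)
        if succeeds(mid):
            upper = mid
        else:
            lower = mid

    return upper, tried


if __name__ == "__main__":
    # Pretend the statement needs at least 100 MB (an invented threshold).
    best, tried = search_min_mem_limit(lambda mb: mb >= 100, 77308)
    print(best, tried)
```

With an invented 100 MB threshold and a starting limit of 77308 MB, the first thirteen limits this sketch tries match the logged sequence, ending in 75, 112, 93. The search only works if every failure below the threshold is a plain "memory limit exceeded"; any other error at a given limit, like the one in this bug, has to be handled separately by the caller.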
Without MT_DOP but with the same limit in place, I get the error I'd expect. Once I apply MT_DOP, I hit the error in this bug.
USE tpcds_300_decimal_parquet;
SET MEM_LIMIT=93M;
COMPUTE STATS catalog_returns;
WARNINGS: Memory limit exceeded
Cannot perform aggregation at node with id 1. Failed to initialize hash table in preaggregation. The memory limit is too low to execute the query.

SET MT_DOP=1;
COMPUTE STATS catalog_returns;
WARNINGS: Spilling has been disabled for plans that do not have stats and are not hinted to prevent potentially bad plans from using too many cluster resources. Please run COMPUTE STATS on these tables, hint the plan or disable this behavior via the DISABLE_UNSAFE_SPILLS query option.
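The two outcomes differ only in the warning text, so a harness that treats "memory limit exceeded" as an expected result needs to tell them apart. The sketch below shows one way to do that by matching on the messages quoted in this report. The function `classify_error` and its return labels are hypothetical and are not taken from the stress test.

```python
def classify_error(message):
    """Map a query's error text to a coarse outcome label.

    Hypothetical helper: the labels and the substring matching are
    assumptions for illustration, based only on the messages quoted
    in this report.
    """
    if not message:
        return "ok"
    lowered = message.lower()
    if "memory limit exceeded" in lowered:
        # Expected while probing for the minimum memory limit.
        return "mem_limit_exceeded"
    if "spilling has been disabled" in lowered:
        # The unexpected error described in this bug.
        return "spilling_disabled"
    return "unexpected"


if __name__ == "__main__":
    print(classify_error("Memory limit exceeded Cannot perform aggregation at node with id 1."))
    print(classify_error("Spilling has been disabled for plans that do not have stats"))
```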
This doesn't happen unconditionally with MT_DOP or even MT_DOP=1. This happened after all the training completed for:
tables: (call_center, catalog_page) X mt_dop: (1,2,4,8,16)
Unfortunately, which table this hits seems somewhat non-deterministic: an earlier training attempt for MT_DOP=1 COMPUTE STATS catalog_returns succeeded. I checked the logs, and the exact same memory limits were applied. In the 93M attempt of that earlier run, the error returned was the typical "memory limit exceeded".
However, COMPUTE STATS on a different table failed. In that case, it was:
USE tpcds_300_decimal_parquet;
SET MT_DOP=1;
SET ABORT_ON_ERROR=1;
SET MEM_LIMIT=75M;
COMPUTE STATS store_returns;
Issue Links
- relates to IMPALA-3200 Replace BufferedBlockMgr with new buffer pool (Resolved)