Description
HiveSplitGenerator calculates the number of available slots from available memory like this:
if (getContext() != null) {
  totalResource = getContext().getTotalAvailableResource().getMemory();
  taskResource = getContext().getVertexTaskResource().getMemory();
  availableSlots = totalResource / taskResource;
}
I had a scenario where the total memory was calculated correctly, but the task memory returned -1. This led to errors like these:
tez.HiveSplitGenerator: Number of input splits: 1. -3641 available slots, 1.7 waves. Input format is: org.apache.hadoop.hive.ql.io.HiveInputFormat
Estimated number of tasks: -6189 for bucket 1
java.lang.IllegalArgumentException: Illegal Capacity: -6189
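The numbers in the log are consistent with a task memory of -1 flowing through the division. The following is a minimal sketch, not the actual Hive code: the class NegativeSlotsDemo, its method names, the total memory value of 3641, and the assumption that the task estimate is slots multiplied by waves and truncated to an int are all hypothetical and only chosen to match the log above. The last step relies on java.util.ArrayList rejecting a negative initial capacity, which produces a message of the same form as the one in the log.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Hive.
final class NegativeSlotsDemo {

    // Same division as in the snippet from the description.
    static int availableSlots(int totalResource, int taskResource) {
        return totalResource / taskResource;
    }

    // Assumption: the task estimate is slots * waves, truncated toward zero.
    static int estimatedTasks(int availableSlots, double waves) {
        return (int) (availableSlots * waves);
    }

    // ArrayList throws IllegalArgumentException for a negative capacity.
    static List<String> allocate(int capacity) {
        return new ArrayList<>(capacity);
    }
}
```

With a total of 3641 and a task memory of -1, availableSlots returns -3641, estimatedTasks with 1.7 waves returns -6189, and allocate fails with an IllegalArgumentException.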
Admittedly, this happened during development, and hopefully will not occur on a properly configured cluster. (I'm not sure what the issue was on my setup, though; possibly Xmx was set higher than physical memory.)
In any case, it feels like a value of availableSlots < 1 will never lead to the desired behavior, so in such cases we could emit a warning and correct the value to 1.
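The proposed correction could look roughly like the sketch below. It is an illustration under assumptions, not a patch: SlotEstimator and its method are hypothetical names, the memory values are passed in as plain ints instead of being read from the Tez context, and java.util.logging stands in for whatever logger HiveSplitGenerator actually uses. The sketch also treats a task memory of 0 as invalid to avoid a division by zero, which goes slightly beyond the case described above.

```java
import java.util.logging.Logger;

// Hypothetical helper; not part of Hive.
final class SlotEstimator {
    private static final Logger LOG =
        Logger.getLogger(SlotEstimator.class.getName());

    // Returns totalResource / taskResource, but never less than 1.
    static int availableSlots(int totalResource, int taskResource) {
        if (taskResource > 0) {
            int slots = totalResource / taskResource;
            if (slots >= 1) {
                return slots;
            }
        }
        // Covers non-positive task memory and totals smaller than one task.
        LOG.warning("Computed fewer than 1 available slot (total="
            + totalResource + ", task=" + taskResource + "); using 1");
        return 1;
    }
}
```

For example, availableSlots(8192, 1024) returns 8, while availableSlots(3641, -1) logs a warning and returns 1.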