Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.1.3
- Fix Version/s: None
- Component/s: None
Description
When inserting data into Hive, the insert occasionally fails with messages like
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1605060173780_0039_2_00, diagnostics=[Task failed, taskId=task_1605060173780_0039_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Container container_1605060173780_0039_01_000002 finished with diagnostics set to [Container failed, exitCode=-104. [2020-11-11 02:35:11.768]Container [pid=16810,containerID=container_1605060173780_0039_01_000002] is running 7729152B beyond the 'PHYSICAL' memory limit. Current usage: 1.0 GB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
Specifically, the TezChild container used slightly more physical memory than its limit (here, roughly 7 MB over the 1 GB cap), so the container was killed.
Identifying how to resolve this is somewhat fraught:
- Our docs offer no clear troubleshooting advice for this error. Googling led to several forums with a mix of good and awful advice; https://community.cloudera.com/t5/Community-Articles/Demystify-Apache-Tez-Memory-Tuning-Step-by-Step/ta-p/245279 is probably the best one.
- The issue itself comes down to Tez allocating 80% of the container memory limit to the Java heap (-Xmx); depending on other memory usage (thread stacks, JIT, other JVM overhead), the remaining 20% can be too little headroom. By comparison, when running in a cgroup, the JVM defaults -Xmx to 25% of the memory limit.
- Identifying the right parameters to tune, and verifying that they had been set correctly, was a bit challenging. We ended up experimenting with tez.container.max.java.heap.fraction, hive.tez.container.size, and yarn.scheduler.minimum-allocation-mb, and verified that each change took effect by watching the process arguments (with htop) for a change in -Xmx. We definitely had some missteps figuring out when a property is hive.tez.container.* versus tez.container.*.
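To make the heap-fraction arithmetic above concrete, here is a small sketch. The 1 GB container and the 80% default come from the log and discussion above; the exact defaults on any given cluster are an assumption and can vary by Tez/Hive version.

```python
# Rough heap-vs-headroom arithmetic for a Tez container.
# Assumes the 80% default heap fraction discussed above (an assumption;
# actual defaults depend on the Tez/Hive version and cluster config).

def heap_and_headroom(container_mb, heap_fraction):
    """Return (-Xmx in MB, non-heap headroom in MB) for a container."""
    xmx = int(container_mb * heap_fraction)
    return xmx, container_mb - xmx

# The failing case: 1 GB container, 0.8 heap fraction.
print(heap_and_headroom(1024, 0.8))   # (819, 205) -> ~205 MB for everything non-heap

# Either fix widens the non-heap headroom for stacks, JIT, and JVM overhead:
print(heap_and_headroom(1024, 0.75))  # lower tez.container.max.java.heap.fraction
print(heap_and_headroom(2048, 0.8))   # raise hive.tez.container.size
```

This shows why the container dies by only a few MB: ~205 MB of non-heap headroom is easily exhausted by JVM overhead, while either tuning knob roughly doubles it.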
In the end, any one of the following seems to have worked for us:
- SET yarn.scheduler.minimum-allocation-mb=2048
- SET tez.container.max.java.heap.fraction=0.75
- SET hive.tez.container.size=2048
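The "watch the process arguments for a change in -Xmx" check mentioned above can be sketched as follows. The sample command line below is illustrative, not captured from a real cluster; on a live worker node you would pull the real arguments from ps or htop (e.g. `ps -ef | grep TezChild`).

```python
import re

# Extract the -Xmx flag from a TezChild command line to confirm a
# config change took effect. The string below is a hypothetical example.
cmdline = "java -Xmx1536m -server org.apache.tez.runtime.task.TezChild"

match = re.search(r"-Xmx(\d+)([mg])", cmdline)
if match:
    print(f"heap limit: {match.group(1)}{match.group(2)}")  # heap limit: 1536m
```

If -Xmx does not change after you alter tez.container.max.java.heap.fraction or hive.tez.container.size, the property likely was not picked up (or you used the wrong prefix, per the hive.tez.container.* vs tez.container.* confusion above).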
Issue Links
- is related to
  - HIVE-18308 Error inserting data into many partitions (Open)
  - IMPALA-10316 load_nested.py failed due to out of memory during Jenkins GVO (Resolved)
  - HIVE-22172 I have an external table and I am trying to insert data into to it, i have checked the mappings and even compared the script to a similar one and everything looks ok, but I keep having the error message below (Resolved)
  - HIVE-22171 Issues while trying to insert into a table (Open)