Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
test_libhdfs_threaded_hdfspp_test_shim_static has started failing more often, even though little has changed recently that should directly affect it.
The failure mode appears to be error injection in the read pipeline. Until HDFS-10951 is completed, reads have no retry mechanism, so any injected error causes that read to fail. However, that doesn't explain why the failures have become more frequent lately. They seem to be hardware- or timing-specific: they almost never happen in some environments and happen fairly frequently in others. It might help to set up a test harness that steps through revisions to see whether a specific commit raises the probability of failure.
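A rough sketch of what such a harness could look like, in Python. Everything here is hypothetical: the function names, the idea of passing in `checkout` and `run_once` callables, and the trial count are illustrative assumptions, not part of the existing build. The real harness would also need to rebuild after each checkout, which this sketch leaves to the caller.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Tuple


def failure_rate(run_once: Callable[[], bool], trials: int) -> float:
    """Call run_once `trials` times; return the fraction of calls that failed.

    run_once must return True when the test passes and False when it fails.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    failures = sum(1 for _ in range(trials) if not run_once())
    return failures / trials


def scan_revisions(
    revisions: Iterable[str],
    checkout: Callable[[str], None],
    run_once: Callable[[], bool],
    trials: int,
) -> List[Tuple[str, float]]:
    """Check out each revision in order (oldest first) and measure its failure rate."""
    results = []
    for rev in revisions:
        checkout(rev)  # assumed to also rebuild the test binary
        results.append((rev, failure_rate(run_once, trials)))
    return results


def largest_jump(results: List[Tuple[str, float]]) -> Optional[Tuple[str, float]]:
    """Return (revision, increase) for the biggest rise over the previous revision."""
    if len(results) < 2:
        return None
    deltas = [
        (cur_rev, cur_rate - prev_rate)
        for (_, prev_rate), (cur_rev, cur_rate) in zip(results, results[1:])
    ]
    return max(deltas, key=lambda item: item[1])


def run_command(cmd: List[str], cwd: str) -> bool:
    """Run a command quietly; treat exit code 0 as a pass."""
    proc = subprocess.run(
        cmd, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode == 0
```

To use it against a real tree, `checkout` could wrap `git checkout <rev>` plus a rebuild, and `run_once` could wrap `run_command` with whatever command runs this one test in the local build directory. Because the failures are intermittent, the trial count needs to be large enough that a difference between revisions is not just noise, and the scan should be repeated in an environment where the failure is known to occur.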