Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
0.23.3, 2.0.1-alpha
None
None
Description
We ran into a case where many nodes in a cluster ran out of disk space because the NodeManager logs had grown huge. Examining the logs showed a very large number of shuffle errors, each caused by either a ClosedChannelException or an IOException with the message "Connection reset by peer" or "Broken pipe".
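The report does not say how these errors should be handled. As a minimal sketch, assuming the goal is to recognize errors caused by the remote side closing the connection so they can be logged briefly (for example at debug level, without a stack trace) instead of in full, a Java check might look like this. The class `ShuffleErrorFilter`, the method `isPeerDisconnect`, and the message pattern are hypothetical and are not taken from Hadoop's shuffle code.

```java
import java.io.IOException;
import java.nio.channels.ClosedChannelException;
import java.util.regex.Pattern;

// Hypothetical helper: classifies shuffle errors that only mean the peer went away.
final class ShuffleErrorFilter {
    // Hypothetical pattern built from the two messages quoted in the report.
    private static final Pattern PEER_DISCONNECT_MESSAGE = Pattern.compile(
        ".*(connection reset by peer|broken pipe).*", Pattern.CASE_INSENSITIVE);

    /** Returns true if the error looks like a client disconnect rather than a server fault. */
    static boolean isPeerDisconnect(Throwable cause) {
        // ClosedChannelException is itself an IOException, so check it first.
        if (cause instanceof ClosedChannelException) {
            return true;
        }
        if (cause instanceof IOException) {
            String msg = cause.getMessage();
            return msg != null && PEER_DISCONNECT_MESSAGE.matcher(msg).matches();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isPeerDisconnect(new ClosedChannelException()));           // true
        System.out.println(isPeerDisconnect(new IOException("Connection reset by peer"))); // true
        System.out.println(isPeerDisconnect(new IOException("Broken pipe")));         // true
        System.out.println(isPeerDisconnect(new IOException("No space left on device"))); // false
    }
}
```

Matching on exception messages is fragile because the text comes from the JDK and the operating system and can vary between platforms; the sketch matches case-insensitively for that reason, and any other error still falls through to normal, full logging.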