Description
We observed that some of our applications (non-MapReduce applications that use filesystems) accumulate a very large memory footprint from FileSystem$Statistics$StatisticsData instances, which are held in the allData list of Statistics.
Although each StatisticsData holds only a weak reference to its thread, so the reference can be cleared once the thread goes away, the StatisticsData instances themselves are not removed from the list until one of the following methods is called on Statistics:
- getBytesRead()
- getBytesWritten()
- getReadOps()
- getLargeReadOps()
- getWriteOps()
- toString()
It is quite possible to have an application that interacts with a filesystem but does not call any of these methods on the Statistics. If such an application runs for a long time and has a large amount of thread churn, the memory footprint will grow significantly.
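The retention pattern described above can be modeled with a minimal, self-contained sketch. It is not the Hadoop implementation: the class StatsLeakSketch, its Data entries, and the rootBytesRead accumulator are hypothetical stand-ins, and the sketch calls WeakReference.clear() explicitly to simulate a thread being garbage collected so that the behavior is deterministic.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of per-thread statistics whose dead entries are
// pruned only as a side effect of an aggregating read.
class StatsLeakSketch {
    // Stand-in for a per-thread statistics entry.
    static final class Data {
        final WeakReference<Thread> owner;
        long bytesRead;

        Data(Thread t) {
            owner = new WeakReference<>(t);
        }
    }

    private final List<Data> allData = new ArrayList<>();
    // Totals folded in from entries whose owning thread is gone (assumption of this sketch).
    private long rootBytesRead;

    synchronized Data register(Thread t) {
        Data d = new Data(t);
        allData.add(d);
        return d;
    }

    // Aggregates all entries and, as a side effect, drops entries whose
    // owner reference has been cleared. Nothing else prunes the list.
    synchronized long getBytesRead() {
        long total = rootBytesRead;
        for (Iterator<Data> it = allData.iterator(); it.hasNext();) {
            Data d = it.next();
            total += d.bytesRead;
            if (d.owner.get() == null) {
                rootBytesRead += d.bytesRead;
                it.remove();
            }
        }
        return total;
    }

    synchronized int size() {
        return allData.size();
    }

    public static void main(String[] args) {
        StatsLeakSketch stats = new StatsLeakSketch();
        for (int i = 0; i < 1000; i++) {
            Data d = stats.register(new Thread());
            d.bytesRead = 1;
            d.owner.clear(); // simulate the thread having been collected
        }
        System.out.println("entries before read: " + stats.size());
        stats.getBytesRead();
        System.out.println("entries after read: " + stats.size());
    }
}
```

In this model the list grows by one entry per short-lived thread and shrinks only when getBytesRead() happens to be called, which mirrors the growth reported here for applications that never read the statistics.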
The current workaround is either to limit the thread churn or to invoke these operations occasionally to pare down the memory. However, this is still a deficiency with FileSystem$Statistics itself in that the memory is controlled only as a side effect of those operations.
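The second workaround, invoking one of those operations occasionally, could be sketched as a small background scheduler. The class StatisticsTouchSketch and its schedule method are hypothetical, and the sketch is kept independent of Hadoop so that it runs on its own; the comment in main shows the kind of task a Hadoop application might pass in, assuming the static FileSystem.getAllStatistics() accessor is available in the version in use.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that runs a "touch" task at a fixed delay on a daemon thread.
class StatisticsTouchSketch {
    static ScheduledExecutorService schedule(Runnable touch, long periodMillis) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "statistics-touch");
                t.setDaemon(true); // do not keep the JVM alive for this task
                return t;
            });
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                touch.run();
            } catch (RuntimeException e) {
                // Swallow so that one failed run does not cancel later runs.
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch ran = new CountDownLatch(3);
        // In a Hadoop application the task might instead look like this (assumption):
        //   for (FileSystem.Statistics s : FileSystem.getAllStatistics()) {
        //       s.getBytesRead(); // triggers cleanup of stale entries as a side effect
        //   }
        ScheduledExecutorService scheduler = schedule(ran::countDown, 10);
        boolean completed = ran.await(5, TimeUnit.SECONDS);
        scheduler.shutdownNow();
        System.out.println("task ran three times: " + completed);
    }
}
```

This only bounds the growth between runs; it does not remove the underlying dependence on a read operation to reclaim the memory.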
Attachments
Issue Links
- breaks
  - HADOOP-12706 TestLocalFsFCStatistics#testStatisticsThreadLocalDataCleanUp times out occasionally (Closed)
  - HADOOP-12958 PhantomReference for filesystem statistics can trigger OOM (Closed)
- is related to
  - MAPREDUCE-6735 Performance degradation caused by MAPREDUCE-5465 and HADOOP-12107 (Open)
- relates to
  - HADOOP-12829 StatisticsDataReferenceCleaner swallows interrupt exceptions (Resolved)