Details
- Bug
- Status: Open
- Critical
- Resolution: Unresolved
- 2.5.2
- None
- None
Description
An Ambari rolling restart of HBase RegionServers failed to detect that the restarted RegionServers were not coming back online, and continued to take down the rest of the RegionServers in the cluster.
The failure was triggered by adding the following JVM tuning setting to HBASE_OPTS in the hbase-env.sh template, near the start of the options:
-XX:G1NewSizePercent=3
This placed it before the following option, which was set a couple of options further along; G1NewSizePercent needs to come after it:
-XX:+UnlockExperimentalVMOptions
As a result, both HMaster and RegionServer startup failed, but Ambari did not detect that the RegionServers were not coming back online and proceeded to take down the rest of the RegionServers.
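The ordering problem described above could be caught before a restart with a small pre-flight check. The following is only a sketch: the function name `find_misordered_flags` and the `EXPERIMENTAL_FLAGS` tuple are hypothetical, and the set of flags treated as experimental is an assumption (only G1NewSizePercent comes from this report).

```python
import shlex

# Assumption: the caller decides which flags count as experimental.
# Only G1NewSizePercent is taken from this report.
EXPERIMENTAL_FLAGS = ("G1NewSizePercent",)
UNLOCK_FLAG = "-XX:+UnlockExperimentalVMOptions"


def find_misordered_flags(opts, experimental=EXPERIMENTAL_FLAGS):
    """Return experimental -XX options that are not preceded by the unlock flag.

    This also reports experimental options when the unlock flag is missing.
    """
    unlocked = False
    misordered = []
    for token in shlex.split(opts):
        if token == UNLOCK_FLAG:
            unlocked = True
            continue
        if not token.startswith("-XX:"):
            continue
        # Strip "-XX:" and an optional +/- prefix, then drop any "=value".
        name = token[4:].lstrip("+-").split("=", 1)[0]
        if name in experimental and not unlocked:
            misordered.append(token)
    return misordered
```

Called on an options string with the ordering from this report, it returns the offending option; with the unlock flag first, it returns an empty list.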
Ambari should have checked, via API checks on the RegionServer, that the first RegionServer restarted successfully, stayed up for the default 120-second rolling window, and properly re-registered with the active HMaster before moving on to the second RegionServer.
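The requested behavior can be sketched as a stability gate between restarts. This is not Ambari's implementation: `restart` and `is_healthy` are hypothetical callables that a real version would back with Ambari and HMaster API checks, and the 5-second poll interval is an assumption. Only the 120-second window comes from this report.

```python
import time


def wait_until_stable(is_healthy, stable_seconds=120, poll_seconds=5,
                      clock=time.monotonic, sleep=time.sleep):
    """Return True only if is_healthy() holds at every poll for stable_seconds."""
    deadline = clock() + stable_seconds
    while True:
        if not is_healthy():
            return False
        if clock() >= deadline:
            return True
        sleep(poll_seconds)


def rolling_restart(servers, restart, is_healthy, **gate_options):
    """Restart servers one at a time; stop at the first one that fails the gate.

    Returns (restarted_ok, failed_server); failed_server is None on success.
    """
    restarted_ok = []
    for server in servers:
        restart(server)
        # Do not touch the next server until this one has stayed healthy.
        if not wait_until_stable(lambda: is_healthy(server), **gate_options):
            return restarted_ok, server
        restarted_ok.append(server)
    return restarted_ok, None
```

With this gate, a RegionServer that never comes back stops the loop, so the remaining RegionServers are left running.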
Also, Ambari should refuse to continue with any rolling restart if no HMasters are online; see linked ticket AMBARI-24699.
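A minimal sketch of such a precondition follows. The names `require_active_hmaster` and `NoActiveHMasterError` are hypothetical, as are the "active" and "standby" state strings; how the states are obtained from the cluster is left out.

```python
class NoActiveHMasterError(RuntimeError):
    """Raised when a rolling restart is requested with no active HMaster."""


def require_active_hmaster(hmaster_states):
    """Return hosts whose state is "active"; raise if there are none.

    hmaster_states maps host name to a state string. The state values
    used here are assumptions for illustration.
    """
    active = [host for host, state in hmaster_states.items() if state == "active"]
    if not active:
        raise NoActiveHMasterError(
            "refusing rolling restart: no active HMaster among %d known"
            % len(hmaster_states)
        )
    return active
```

Calling this once before the first RegionServer is taken down, and again before each subsequent one, would keep the restart from proceeding while no HMaster is online.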
Issue Links
- is related to
  - AMBARI-24381 Ambari Extensible Monitoring - use Nagios Plugins format and make extensible for users to extend checking (Open)
- relates to
  - AMBARI-24699 Ambari HBase do not rolling restart RegionServers if no HMasters are online (Open)