03-14-2019 02:04 PM
Hello, we are in the process of deploying a Cloudera cluster. The cluster was up and running fine until HBase failed with a "Java heap dump directory free space" error. All the other services are also showing concerning health with the same error. Right now HDFS, Spark2, ZKFC, and the JournalNodes have concerning health, and HBase has critical health.
This error first came up a couple of days ago, but before we could look into it, it went away on its own within a couple of days. Now it has popped up again.
Info about cluster:
5 worker nodes, 2 masters (HA), Cloudera Management Service.
No data in the cluster.
We are about to enable Kerberos, but this error showed up first.
Below is the error:
03-14-2019 02:28 PM
The error message you're seeing is a monitoring alert from Cloudera Manager. CM checks the available space on the filesystem that holds the configured "heap dump directory" for each of these services, and raises an alert when free space drops below the thresholds set in your configuration. You can read more about this health test in the Cloudera Manager health test documentation.
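To see roughly what that check amounts to on a given host, here is a minimal Python sketch that compares a directory's free space against a warning and a critical threshold. The 10 GiB and 5 GiB values and the `/tmp` path are assumptions for illustration only; the real thresholds and heap dump directory are whatever is set in your Cloudera Manager configuration, and this is not CM's own code.

```python
import shutil

GIB = 1024 ** 3


def classify_free_space(free_bytes, warning_bytes=10 * GIB, critical_bytes=5 * GIB):
    """Map a free-space figure to a health label.

    The default thresholds are illustrative assumptions, not values read
    from Cloudera Manager.
    """
    if free_bytes < critical_bytes:
        return "critical"
    if free_bytes < warning_bytes:
        return "concerning"
    return "good"


def check_directory(path, **thresholds):
    """Classify the free space on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_free_space(usage.free, **thresholds)


if __name__ == "__main__":
    # "/tmp" is an assumed heap dump directory; substitute the configured one.
    print(check_directory("/tmp"))
```

Running this on each host that reports the alert shows whether the filesystem is actually short on space at that moment, which helps when the alert comes and goes.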
I'd recommend reviewing your configurations, both for the monitoring thresholds of this health test and for the directory (or directories) you have configured as the heap dump directory for each of your role instances.
If this is a shared drive that frequently fills up, you may want to change the configured heap dump directory to another location with enough free space to accommodate your instances. Otherwise, you can simply modify the thresholds for the monitoring alert to quiet the messages (though I recommend dealing with the issue directly as opposed to suppressing it).
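If you want to see what is taking up space before moving the directory, the sketch below lists the largest heap dump files in it. It assumes the dumps carry the `.hprof` extension that the HotSpot JVM uses by default, and the `/tmp` path is only a placeholder for your configured heap dump directory. On a shared drive the space may well be consumed by something other than heap dumps, so treat an empty result as a hint to look at the rest of the filesystem.

```python
from pathlib import Path


def largest_heap_dumps(directory, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs for *.hprof files
    directly inside `directory`, largest first."""
    dumps = [
        (p.stat().st_size, p)
        for p in Path(directory).glob("*.hprof")
        if p.is_file()
    ]
    dumps.sort(key=lambda item: item[0], reverse=True)
    return dumps[:limit]


if __name__ == "__main__":
    # "/tmp" is an assumed location; use the directory configured in CM.
    for size, path in largest_heap_dumps("/tmp"):
        print(f"{size / 1024 ** 2:10.1f} MiB  {path}")
```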