Health test shows the following errors:
The health test result for HDFS_FREE_SPACE_REMAINING has become bad: Space free in the cluster: 0 B. Capacity of the cluster: 0 B. Percentage of capacity free: 0.00%. Critical threshold: 10.00%.
The health test result for HDFS_CANARY_HEALTH has become bad: Canary test failed to create parent directory for /tmp/.cloudera_health_monitoring_canary_files.
We manually verified that space isn't an issue. Connectivity testing succeeded. There are no issues with the KDC or principals.
Please help explain the root cause of these error messages.
Thanks for responding.
This is a new cluster. The DataNode is up and running. I verified the space through CM as well as by logging on to the servers themselves.
I have the same issue with a brand new Cloudera Manager install on a 4-instance AWS EC2 m4.xlarge cluster with a 100 GiB magnetic disk on each instance.
Cloudera Manager Hosts view shows all 4 instances with a Disk Usage at 10.3-12.1 GiB / 115.6 GiB and "green" status.
The cluster is unusable with HDFS in the resulting RED status.
What was the final resolution on this?
I verified the space by logging on to the server and issuing the following command:
ubuntu@ip-172-31-29-49:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       99G  8.3G   86G   9% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            7.9G   12K  7.9G   1% /dev
tmpfs           1.6G  496K  1.6G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            7.9G     0  7.9G   0% /run/shm
none            100M     0  100M   0% /run/user
cm_processes    7.9G   14M  7.9G   1% /run/cloudera-scm-agent/process
As you can see there is plenty of space available.
What do you suggest as a next step?
Usually this indicates that the DataNodes are not in contact with the NameNode. A capacity of 0 bytes means there are no DataNodes available to write to. Check the DataNode logs under /var/log/hadoop-hdfs
There will be some clues there. Paste anything that stands out in your response here.
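One way to confirm whether the NameNode sees any DataNodes is to look at the output of "hdfs dfsadmin -report". The sketch below is only an illustration: the function names are made up for this post, the sample text is hypothetical, and the exact report layout can differ between Hadoop versions, so treat the patterns as assumptions to adjust.

```python
import re


def summarize_dfs_report(report_text):
    """Pull configured capacity (bytes) and live DataNode count out of report text.

    Assumes lines shaped like 'Configured Capacity: <bytes> (...)' and
    'Live datanodes (<n>):'. A value is None when its line is not found.
    """
    capacity = re.search(r"^Configured Capacity:\s*(\d+)", report_text, re.MULTILINE)
    live = re.search(r"^Live datanodes \((\d+)\)", report_text, re.MULTILINE)
    return {
        "configured_capacity_bytes": int(capacity.group(1)) if capacity else None,
        "live_datanodes": int(live.group(1)) if live else None,
    }


def looks_disconnected(summary):
    """True when the report explicitly shows zero capacity or zero live DataNodes."""
    return summary["configured_capacity_bytes"] == 0 or summary["live_datanodes"] == 0


# Hypothetical sample text for illustration only, not output from a real cluster.
SAMPLE_REPORT = """Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
"""

if __name__ == "__main__":
    summary = summarize_dfs_report(SAMPLE_REPORT)
    print(summary, looks_disconnected(summary))
```

To try it against a real cluster, capture the report text first (for example by running "hdfs dfsadmin -report" as a user allowed to do so, after kinit on a Kerberized cluster) and pass that text to summarize_dfs_report.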
This thread is super super old, so it would be best to confirm you are seeing the same issue. What message do you see regarding the canary test failure?
Basically, the Service Monitor performs a health check of HDFS by writing out a file and making sure the write completes. If it doesn't complete, that could indicate a problem with HDFS that requires review, so this triggers a bad health state.
The canary test does the following:
By default, the file name is:
It is possible that the Service Monitor log (in /var/log/cloudera-scm-firehose) has some error or exception reflecting the failure.
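If the log directory holds a lot of files, a small script can narrow things down. This is a rough sketch: the keyword list and the "*.log*" file pattern are my assumptions, not something Cloudera documents, and the function names are made up for illustration.

```python
import glob
import os


def find_suspect_lines(lines, keywords=("ERROR", "Exception", "canary")):
    """Return (line_number, text) pairs for lines containing any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(keyword in line.lower() for keyword in lowered):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_dir(log_dir="/var/log/cloudera-scm-firehose", pattern="*.log*"):
    """Scan each matching file in log_dir and map its path to the suspect lines found.

    The glob pattern is an assumption; adjust it to the file names on your host.
    """
    results = {}
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        with open(path, errors="replace") as handle:
            hits = find_suspect_lines(handle)
        if hits:
            results[path] = hits
    return results
```

Calling scan_log_dir() on the Service Monitor host returns a dictionary of file paths to matching lines, which is usually easier to paste into a reply than the whole log.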
Note that the operation of writing to a file in HDFS requires communication with the NameNode and then the DataNode that the NameNode tells the client to write the file to. Failures could occur in various places.
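To make the idea of a canary check concrete, here is a minimal sketch of a write-then-verify probe that reports which step failed. It runs against the local filesystem as a stand-in and is not Cloudera's implementation; run_canary and the file naming are hypothetical. A real HDFS check would go through an HDFS client, so each step would also depend on the NameNode and a DataNode being reachable.

```python
import os
import tempfile
import uuid


def run_canary(parent_dir):
    """Create the parent directory, then write, read back, and delete a small file.

    Returns (True, None) on success, or (False, description) naming the failed step.
    """
    step = "create parent directory"
    path = os.path.join(parent_dir, ".canary_" + uuid.uuid4().hex)
    payload = b"canary"
    try:
        os.makedirs(parent_dir, exist_ok=True)
        step = "write file"
        with open(path, "wb") as handle:
            handle.write(payload)
        step = "read file"
        with open(path, "rb") as handle:
            if handle.read() != payload:
                return False, step
        step = "delete file"
        os.remove(path)
    except OSError as error:
        # Report the step that was in progress when the failure happened.
        return False, f"{step}: {error}"
    return True, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(run_canary(os.path.join(scratch, "canary_files")))
```

The error in this thread ("failed to create parent directory") corresponds to the very first step, which is why it makes sense to check whether HDFS is reachable and writable at all before looking at anything later in the sequence.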