I installed Cloudera using the Path B installation on 4 machines (VMs, CentOS 7): 1 master and 3 slaves. After installation I got a clock synchronization error on every slave, which I resolved by running:
systemctl start ntpd
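(As an aside for anyone hitting the same clock error: started this way, ntpd will not come back after a reboot. A minimal sketch to make it persistent and verify synchronization, assuming the stock CentOS 7 ntp package:)
systemctl enable ntpd    # start ntpd automatically at boot
systemctl status ntpd    # confirm the daemon is running
ntpq -p                  # list peers; a '*' marks the server currently synced to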
After a few minutes I get an error on the master node and I can't display the Cloudera page (master:7180), although cloudera-scm-server reports its status as running.
I noticed afterwards that the hard drive of the master node is full. When I run df -h I get:
[root@master ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   34G   34G   20K 100% /
devtmpfs                 4.1G     0  4.1G   0% /dev
tmpfs                    4.1G     0  4.1G   0% /dev/shm
tmpfs                    4.1G  8.7M  4.1G   1% /run
tmpfs                    4.1G     0  4.1G   0% /sys/fs/cgroup
/dev/sda1                497M  212M  286M  43% /boot
/dev/mapper/centos-home   17G   36M   17G   1% /home
tmpfs                    833M     0  833M   0% /run/user/0
I thought that maybe the ntpd log was behind all this. If the / directory is full (Use% = 100%), then the master can't display anything.
Any help to resolve this, and to keep the master node's disk from filling up, would be appreciated.
This is the third time I'm trying to install Cloudera, and every time I have the same problem.
Can you run the following commands as root and identify which particular folder is consuming the most space? Once you have a result, run the second command again with that folder name and dig further until you reach the offending subfolder.
$ du -sh /
$ du -sh /*
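If you prefer a one-shot view, here is a sketch that lists each top-level directory sorted by size (assuming GNU du and sort, which are standard on CentOS 7):
# -x stays on the / filesystem (skips /proc, /sys and other mounts);
# 2>/dev/null hides the harmless "cannot access" noise from /proc;
# sort -h orders human-readable sizes so the biggest directory comes last
$ du -xh --max-depth=1 / 2>/dev/null | sort -h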
Note: this is a disk space issue; I don't see anything related to memory in your description, so the topic and description are confusing.
When I run du -sh / I get:
du: cannot access ‘/proc/4982/task/4982/fd/4’: No such file or directory
du: cannot access ‘/proc/4982/task/4982/fdinfo/4’: No such file or directory
du: cannot access ‘/proc/4982/fd/4’: No such file or directory
du: cannot access ‘/proc/4982/fdinfo/4’: No such file or directory
34G     /
I found the files that are using up that space:
-rw-------. 1 cloudera-scm cloudera-scm 359M Mar 27 14:40 mgmt_mgmt-NAVIGATOR-9a89af62abe8393b48c78926720ffe2c_pid28766.hprof
It is repeated 40 times.
-rw-------. 1 cloudera-scm cloudera-scm 761M Mar 27 15:10 mgmt_mgmt-NAVIGATORMETASERVER-9a89af62abe8393b48c78926720ffe2c_pid11739.hprof
It is repeated 12 times.
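These .hprof files look like JVM heap dumps, written each time the Navigator processes run out of memory, which is why they keep multiplying. For reference, a sketch of how to confirm they account for the missing space and reclaim it (assuming they sit under /tmp, the usual heap-dump location for CM-managed roles; adjust the path to wherever you found them):
# list the dumps and their total size
find /tmp -name '*.hprof' -exec ls -lh {} \;
du -ch /tmp/*.hprof | tail -1
# old dumps are safe to delete once the root cause is addressed
rm -f /tmp/*.hprof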
But how can I stop these dumps from being created in the first place?
This is a bit off topic, but you can configure the Navigator Metadata Server heap in Cloudera Manager via "Java Heap Size of Navigator Metadata Server in Bytes".
For the Navigator Audit Server, the corresponding setting is "Java Heap Size of Auditing Server in Bytes".
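Once the heap is raised and the role restarted, a quick way to confirm the new limit actually took effect (a sketch; the grep target is simply the role name as it appears in the Java process arguments):
# show the -Xmx flag of the running Navigator Metadata Server JVM
ps -ef | grep NAVIGATORMETASERVER | grep -o -e '-Xmx[^ ]*'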
After configuring the Navigator Metadata Server heap, I'm trying to restart the Cloudera Management Service but I can't. I get:
Cannot restart service when Host Monitor (master) is in STOPPING state
In the Host Monitor log file :
Mar 29, 14:05:55.133 ERROR com.cloudera.cmon.firehose.Main Could not fetch descriptor after 5 tries, exiting.
Also, the number of mgmt_mgmt-NAVIGATORMETASERVER* files keeps increasing.
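For completeness, the full Host Monitor log may show more context; on a default install it lives under /var/log/cloudera-scm-firehose (the exact file name below is an assumption based on the default naming, so check with ls first):
ls -lh /var/log/cloudera-scm-firehose/
tail -n 100 /var/log/cloudera-scm-firehose/mgmt-cmf-mgmt-HOSTMONITOR-master.log.out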
Please start a new thread for the Host Monitor restart issue, as it is not related to the existing one.
Restarting the Host Monitor is not required for restarting Navigator. In Cloudera Manager, click Clusters --> Cloudera Management Service.
Then click the "Instances" subtab.
Check the boxes beside the Navigator roles you want to restart and choose "Restart" from the "Actions for Selected" drop-down button.
This lets you restart only the roles you need.
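The same targeted restart can also be scripted through the CM REST API. A sketch, assuming API v11, default admin credentials, and a role name taken from your own deployment (the hash suffix will match the one in your .hprof file names):
# find the exact Navigator role names first
curl -u admin:admin 'http://master:7180/api/v11/cm/service/roles' | grep '"name"'
# restart just that role (substitute the name the previous call returned)
curl -u admin:admin -X POST -H 'Content-Type: application/json' \
  -d '{ "items": ["mgmt-NAVIGATORMETASERVER-9a89af62abe8393b48c78926720ffe2c"] }' \
  'http://master:7180/api/v11/cm/service/roleCommands/restart'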
I do not know why the number of mgmt_mgmt-NAVIGATORMETASERVER* files keeps increasing, even though I have increased the Java heap size.
Now Cloudera works fine, but in the Host Monitor log file I get an error:
Could not fetch descriptor after 5 tries, exiting.
And I can't restart this service. When I try to restart the Cloudera Management Service, I get:
Cannot restart service when Host Monitor (master) is in STOPPING state.
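A common way out of a role stuck in STOPPING is to kill the hung process by hand so CM can recover (a sketch; use with care, and only after the normal stop has clearly hung):
# find the stuck Host Monitor process on the master
ps -ef | grep HOSTMONITOR | grep -v grep
# kill it so CM leaves the STOPPING state, then restart the role from the UI
kill -9 <pid_from_above>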
Thanks for sharing the knowledge; I had just missed this point.
From my past experience, though, we used to accumulate lots of logs and run out of space due to this same issue, and to resolve it we stopped Navigator.
Out of curiosity, what would the impact on the cluster be if I stopped Navigator?
Cloudera Navigator provides auditing and data management. Removing it will not stop you from running jobs on your cluster, but you will lose fine-grained auditing, metadata tagging, etc.