Master node Full disk

Contributor

I installed Cloudera using the Path B installation on 4 machines (VMs, CentOS 7): 1 master and 3 slaves. After installation I got a clock synchronization error on every slave, which I resolved by running:

systemctl start ntpd 

After a few minutes I get an error on the master node and I can't display the Cloudera page (master:7180), although the cloudera-scm-server status is running.

I noticed afterwards that the hard drive of the master node is full. When I run df -h, I get:

[root@master ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   34G   34G   20K 100% /
devtmpfs                 4.1G     0  4.1G   0% /dev
tmpfs                    4.1G     0  4.1G   0% /dev/shm
tmpfs                    4.1G  8.7M  4.1G   1% /run
tmpfs                    4.1G     0  4.1G   0% /sys/fs/cgroup
/dev/sda1                497M  212M  286M  43% /boot
/dev/mapper/centos-home   17G   36M   17G   1% /home
tmpfs                    833M     0  833M   0% /run/user/0

I thought that maybe the ntpd log is behind all that.

If the / directory is full (Use% = 100%), the master can't display anything.

Any help to resolve this, and to keep the master node's disk from filling up, would be appreciated.

This is the third time I have tried to install Cloudera, and every time I have the same problem.

17 REPLIES

Champion

@ghandrisaleh

 

Can you run the following commands as root and identify which folder is consuming the most space? Once you have a result, run the command again on that folder and keep drilling down until you reach the sub-folder responsible.

$ du -sh /

$ du -sh /*

Note: This is a disk space issue; I don't find anything related to memory in your description, so the topic and description are confusing.

Contributor

When I run du -sh /, I get:

du: cannot access ‘/proc/4982/task/4982/fd/4’: No such file or directory
du: cannot access ‘/proc/4982/task/4982/fdinfo/4’: No such file or directory
du: cannot access ‘/proc/4982/fd/4’: No such file or directory
du: cannot access ‘/proc/4982/fdinfo/4’: No such file or directory
34G     /

 

Cloudera Employee

Try running 'du -h / --max-depth=3|grep G' to figure out which path is using that space. Then drill down from there.
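
A possible refinement, as a sketch only: `grep G` also matches any path that happens to contain a capital G, and walking /proc produces the "cannot access" noise shown earlier in the thread. The variant below stays on one filesystem with `-x`, discards those errors, and sorts the result by size. The function name `biggest_dirs` and the default of 20 lines are illustrative choices, and GNU coreutils (as shipped with CentOS 7) is assumed.

```shell
#!/usr/bin/env bash
# Sketch: list the largest directories under a starting path.
# biggest_dirs is a hypothetical helper name, not a Cloudera tool.
biggest_dirs() {
    local start="${1:-/}"   # directory to inspect
    local count="${2:-20}"  # how many entries to show
    # -x: stay on this filesystem (skips /proc and other mounts)
    # 2>/dev/null: hide "cannot access" errors from files that vanish mid-scan
    du -xh --max-depth=3 "$start" 2>/dev/null | sort -rh | head -n "$count"
}

# Example: biggest_dirs / 20
```

The first line of output is the starting directory itself; the lines after it show where to drill down next.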

 

 

 

 

 

Contributor

I found the files using that space:

-rw-------. 1 cloudera-scm cloudera-scm 359M Mar 27 14:40 mgmt_mgmt-NAVIGATOR-9a89af62abe8393b48c78926720ffe2c_pid28766.hprof

There are 40 files like this one.

And:

-rw-------. 1 cloudera-scm cloudera-scm 761M Mar 27 15:10 mgmt_mgmt-NAVIGATORMETASERVER-9a89af62abe8393b48c78926720ffe2c_pid11739.hprof

There are 12 files like this one.

How can I resolve this?

Contributor

You can opt to stop Navigator, since Navigator writes a huge amount of logs and your cluster can run without this service.

Rising Star
Hi,
The .hprof files are memory dumps created when a Java process fails due to lack of memory. It could be that either the server itself has insufficient memory or that the Navigator configuration does not allocate enough memory to the JVM. How much RAM does the VM running the master have?
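
To see how much space the dumps take, something like the sketch below could help. For scale, the files listed earlier in the thread (40 × 359M plus 12 × 761M) already add up to roughly 23G of the 34G root filesystem. The helper name `hprof_usage` is hypothetical, the search path is an argument because the thread does not say which directory holds the dumps, and GNU find and awk are assumed.

```shell
#!/usr/bin/env bash
# Sketch: count *.hprof heap dumps under a directory and sum their sizes.
# hprof_usage is a hypothetical helper name; pass the directory to search.
hprof_usage() {
    local dir="${1:-/}"
    # -xdev: stay on one filesystem; %s prints each file's size in bytes (GNU find)
    find "$dir" -xdev -type f -name '*.hprof' -printf '%s\n' 2>/dev/null |
        awk '{ n++; bytes += $1 }
             END { printf "%d files, %.1f MiB\n", n, bytes / 1048576 }'
}

# Example: hprof_usage /
```

Once the dumps are no longer needed for diagnosis, deleting them frees the space, but they will come back until the underlying memory problem is addressed.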

Regards,
Jim

Contributor
The master is running on a VM with 9 GB of RAM.

Contributor

Hi Jim,

How do I change the Navigator configuration to allocate enough memory to the JVM?

Super Guru

Hello @ghandrisaleh,

 

This is a bit off topic, but you can configure the Navigator heap sizes in Cloudera Manager with these settings:

Navigator Metadata Server: Java Heap Size of Navigator Metadata Server in Bytes
Navigator Audit Server: Java Heap Size of Auditing Server in Bytes
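
The setting names indicate a value in bytes. As a small sketch, a size in GiB can be converted with shell arithmetic; the 2 GiB figure and the helper name `gib_to_bytes` are only illustrations, not a recommended heap size.

```shell
#!/usr/bin/env bash
# Sketch: convert a whole number of GiB to bytes for a "... in Bytes" field.
# gib_to_bytes is a hypothetical helper; the 2 GiB value is illustrative only.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 2   # prints 2147483648
```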

 

-Ben

Contributor

After configuring the Navigator Metadata Server heap, I tried to restart the Cloudera Management Service, but I can't.
I get:

 Cannot restart service when Host Monitor (master) is in STOPPING state

In the Host Monitor log file :

mars 29, 14:05:55.133	ERROR	com.cloudera.cmon.firehose.Main	
Could not fetch descriptor after 5 tries, exiting.

And the number of mgmt_mgmt-NAVIGATORMETASERVER* files has increased.

Super Guru

@ghandrisaleh,

 

Please start a new thread for the Host Monitor restart issue, as it is not related to this one.

 

Restarting the Host Monitor is not required for restarting Navigator. In Cloudera Manager, click Clusters --> Cloudera Management Service.

Then click the "Instances" subtab.

Check the boxes beside the Navigator server you want to restart and choose "restart" from the Actions for Selected drop-down button.

This lets you restart only the roles you want.

 

 

Contributor

I do not know why the number of mgmt_mgmt-NAVIGATORMETASERVER* files keeps increasing:

mgmt_mgmt-NAVIGATORMETASERVER-9a89af62abe8393b48c78926720ffe2c_pid19656.hprof

even though I have increased the Java heap size.

Rising Star
Hi,
What do your Navigator logs say? The log files will tell you why the server
crashed. What did you set the heap size to?
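
One way to check, as a sketch: search the Navigator log directory for Java out-of-memory errors. The helper name `find_oom` is hypothetical, and the directory in the example comment is an assumption about a default install, not something confirmed in this thread; adjust it to wherever your Navigator roles write their logs.

```shell
#!/usr/bin/env bash
# Sketch: list log files that mention a Java OutOfMemoryError, with match counts.
# find_oom is a hypothetical helper; the log directory is passed as an argument.
find_oom() {
    local logdir="$1"
    # -r: recurse, -c: count matching lines per file, -F: fixed-string match
    # The second grep drops files with zero matches.
    grep -rcF 'java.lang.OutOfMemoryError' "$logdir" 2>/dev/null | grep -v ':0$'
}

# Example (path is an assumption about a default install):
# find_oom /var/log/cloudera-scm-navigator
```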

Regards,
Jim

Contributor
I increased the heap size by 1 GB, and Cloudera works fine.

Contributor

Now Cloudera works fine, but in the Host Monitor log file I get an error:

Could not fetch descriptor after 5 tries, exiting.

And I can't restart this service. When I try to restart the Cloudera Management Service, I get:

Cannot restart service when Host Monitor (master) is in STOPPING state.

Contributor

Hi Jim,

Thanks for sharing the knowledge, I had missed this point.
However, in my past experience we had lots of logs and ran short of space because of this same issue, and we resolved it by stopping Navigator.
Out of curiosity, what would be the impact on the cluster if I stop Navigator?

Rising Star

Hi,

Cloudera Navigator provides auditing and data management. Removing it will not stop you from running jobs on your cluster, but you will not have fine-grained auditing, metadata tagging, etc.

Regards,

Jim