Explorer
Posts: 11
Registered: ‎01-13-2015

/run/cloudera-scm-agent/process fills up with invalid data

Hi,

It seems there's some kind of bug causing all kinds of files under /run/cloudera-scm-agent/process to fill up with their cwd string. For example, under /run/cloudera-scm-agent/process/456-hdfs-NAMENODE the file topology.map takes up 2.4 GB and its contents are a repetition of this string:

run/cloudera-scm-agent/process/456-hdfs-NAMENODE//run/cloudera-scm-agent/process/456-hdfs-NAMENODEr/run/cloudera-scm-agent/process/456-hdfs-NAMENODEu/run/cloudera-scm-agent/process/456-hdfs-NAMENODEn/run/cloudera-scm-agent/process/456-h
dfs-NAMENODE//run/cloudera-scm-agent/process/456-hdfs-NAMENODEc/run/cloudera-scm-agent/process/456-hdfs-NAMENODEl/run/cloudera-scm-agent/process/456-hdfs-NAMENODEo/run/cloudera-scm-agent/process/456-hdfs-NAMENODEu/run/cloudera-scm-agent/
process/456-hdfs-NAMENODEd/run/cloudera-scm-agent/process/456-hdfs-NAMENODEe/run/cloudera-scm-agent/process/456-hdfs-NAMENODEr/run/cloudera-scm-agent/process/456-hdfs-NAMENODEa/run/cloudera-scm-agent/process/456-hdfs-NAMENODE-/run/cloude

 

Same goes for cloudera-monitor.properties or hadoop-policy.xml under 457-yarn-JOBHISTORY:

57-yarn-JOBHISTORYa/run/cloudera-scm-agent/process/457-yarn-JOBHISTORY-/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYs/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYc/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYm/run/cloud
era-scm-agent/process/457-yarn-JOBHISTORY-/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYa/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYg/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYe/run/cloudera-scm-agent/process/457-yar
n-JOBHISTORYn/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYt/run/cloudera-scm-agent/process/457-yarn-JOBHISTORY//run/cloudera-scm-agent/process/457-yarn-JOBHISTORYp/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYr/run/cloudera-sc
m-agent/process/457-yarn-JOBHISTORYo/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYc/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYe/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYs/run/cloudera-scm-agent/process/457-yarn-JOBH
ISTORYs/run/cloudera-scm-agent/process/457-yarn-JOBHISTORY//run/cloudera-scm-agent/process/457-yarn-JOBHISTORY4/run/cloudera-scm-agent/process/457-yarn-JOBHISTORY5/run/cloudera-scm-agent/process/457-yarn-JOBHISTORY7/run/cloudera-scm-agen
t/process/457-yarn-JOBHISTORY-/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYy/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYa/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYr/run/cloudera-scm-agent/process/457-yarn-JOBHISTORY
n/run/cloudera-scm-agent/process/457-yarn-JOBHISTORY-/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYJ/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYO/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYB/run/cloudera-scm-agent/proc
ess/457-yarn-JOBHISTORYH/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYI/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYS/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYT/run/cloudera-scm-agent/process/457-yarn-JOBHISTORYO/run/

 

Any idea what's causing this and how to fix it?

 

Thank you.

Daniel

Explorer
Posts: 11
Registered: ‎01-13-2015

Re: /run/cloudera-scm-agent/process fills up with invalid data

"Fixed" it by completely removing the whole installation (manager included) and reinstalling it.
New Contributor
Posts: 1
Registered: ‎07-18-2016

Re: /run/cloudera-scm-agent/process fills up with invalid data

Has anyone else run into the same issue and found a solution other than a complete reinstall?  I have created a 5-node cluster with CDH 5.7.  Everything looked OK until I enabled Kerberos with Active Directory.  Then I saw similar symptoms.  All directories under /run/cloudera-scm-agent/process on Node 1 fill up with similar lines during cluster startup, and the cluster won't come up.  Is this a CDH 5.7 bug, or was the installation not done right?  I have installed previous versions of CDH more than 10 times but have never seen this behavior.

 

[root@srv1 process]# df -h
Filesystem                    Size  Used Avail Use% Mounted on
cm_processes                   48G   48G  3.5M 100% /run/cloudera-scm-agent/process
….

 

[root@srv1 process]# du -sk * |sort -rh
16091608  790-hbase-MASTER
13665656  801-spark_on_yarn-SPARK_YARN_HISTORY_SERVER
 4488088  800-ks_indexer-HBASE_INDEXER
 4089940  799-yarn-RESOURCEMANAGER
 2633832  802-hive-HIVEMETASTORE
 2609044  794-yarn-JOBHISTORY
 2567632  785-hdfs-NAMENODE
 2040808  804-hive-HIVESERVER2
  772820  805-oozie-OOZIE_SERVER
  259008  806-impala-CATALOGSERVER
    1064  809-impala-STATESTORE
     916  730-cloudera-mgmt-SERVICEMONITOR
     896  812-HBaseShutdown
     896  734-cloudera-mgmt-HOSTMONITOR
     788  793-solr-SOLR_SERVER
     172  732-cloudera-mgmt-REPORTSMANAGER
     148  735-cloudera-mgmt-EVENTSERVER
     128  731-cloudera-mgmt-ALERTPUBLISHER
      52  778-zookeeper-server

 

[root@srv1 process]# ls -l 790-hbase-MASTER
-rw-r----- 1 hbase hadoop    65642496 Jul 15 17:52 cloudera-monitor.properties
-rw-r----- 1 hbase hadoop   267481088 Jul 15 17:52 cloudera-stack-monitor.properties
-rw-r----- 1 hbase hadoop 14173913088 Jul 15 17:52 core-site.xml
-rw-r----- 1 hbase hadoop    11558912 Jul 15 17:52 event-filter-rules.json
-rw-r----- 1 hbase hadoop           0 Jul 15 17:52 hadoop-metrics2.properties
-rw------- 1 hbase hbase           88 Jul 15 17:50 hbase.keytab
-rw-r----- 1 hbase hadoop   209887232 Jul 15 17:52 hbase-site.xml
-rw-r----- 1 hbase hadoop    10059776 Jul 15 17:52 hdfs-site.xml
-rw-r----- 1 hbase hadoop  1158724994 Jul 15 17:52 jaas.conf
-rw------- 1 hbase hadoop        1445 Jul 15 17:52 krb5cc_10120
-rw-r----- 1 hbase hadoop   109789184 Jul 15 17:52 log4j.properties
drwxr-x--x 2 hbase hbase          100 Jul 15 17:50 logs
-rw-r----- 1 hbase hadoop           0 Jul 15 17:50 navigator.client.properties
-rw-r--r-- 1 root  root            13 Jul 16 17:58 process_timestamp
-rw-r----- 1 hbase hadoop           0 Jul 15 17:52 redaction-rules.json
-rw-r----- 1 hbase hadoop     3776512 Jul 15 17:52 ssl-client.xml
-rw-r----- 1 hbase hadoop   466878464 Jul 15 17:52 ssl-server.xml
-rw------- 1 root  root          3397 Jul 15 17:50 supervisor.conf

 

 

[root@srv1 process]# ls -l 801-spark_on_yarn-SPARK_YARN_HISTORY_SERVER
drwxr-x--x 3 spark spark         60 Jul 15 17:51 aux
-rw-r----- 1 spark hadoop 118968262 Jul 15 17:51 cloudera-monitor.properties
-rw-r----- 1 spark hadoop 154964295 Jul 15 17:51 cloudera-stack-monitor.properties
-rw-r----- 1 spark hadoop 374891261 Jul 15 17:51 log4j.properties
drwxr-x--x 2 spark spark        140 Jul 15 17:52 logs
-rw-r----- 1 spark hadoop         0 Jul 15 17:51 redaction-rules.json
drwxr-x--x 2 spark spark         80 Jul 15 17:51 scripts
drwxr-x--x 2 spark spark         60 Jul 15 17:51 spark-conf
-rw-r----- 1 spark hadoop 102726153 Jul 15 17:51 spark-history-server.conf
-rw------- 1 spark spark         88 Jul 15 17:51 spark_on_yarn.keytab
-rw------- 1 root  root        3420 Jul 15 17:51 supervisor.conf
drwxr-x--x 2 spark spark        220 Jul 15 17:52 yarn-conf

 

 

None of these oversized files contains the expected configuration; instead, they all hold nearly identical content like the samples below.  Because the directory fills up partway through startup, I can’t bring the cluster up completely.

 

[root@srv1 790-hbase-MASTER]# less cloudera-monitor.properties

/run/cloudera-scm-agent/process/790-hbase-MASTER//run/cloudera-scm-agent/process/790-hbase-MASTERr/run/cloudera-scm-agent/process/790-hbase-MASTERu/run/cloudera-scm-agent/process/790-hbase-MASTERn/run/cloudera-scm-agent/process/790-hbase-MASTER//run/cloudera-scm-agent/process/790-hbase-MASTERc/run/cloudera-scm-agent/process/790-hbase-MASTERl/ru

….

 

[root@srv1 790-hbase-MASTER]# less core-site.xml

/run/cloudera-scm-agent/process/790-hbase-MASTER//run/cloudera-scm-agent/process/790-hbase-MASTERr/run/cloudera-scm-agent/process/790-hbase-MASTERu/run/cloudera-scm-agent/process/790-hbase-MASTERn/run/cloudera-scm-agent/process/790-hbase-MASTER//run/cloudera-scm-agent/process/790-hbase-MASTERc/run/cloudera-scm-agent/process/790-hbase-MASTERl/run/cloudera-scm-agent/process/790-hbase-MASTERo/run/cloudera-scm-agent/process/790-hbase-MASTERu/run/cloudera-scm-agent/process/790-hbase-MASTERd/run/
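
For anyone who wants to check a host for this symptom without paging through multi-gigabyte files, here is a minimal Python sketch. It is only a heuristic and not a Cloudera tool: the function names, the 4096-byte sample size, and the threshold of three repeats are all assumptions made for illustration. It flags any file whose first few kilobytes repeat the path of the directory it lives in, which is what the output above shows.

```python
import os


def looks_corrupted(file_path, sample_bytes=4096, min_repeats=3):
    """Heuristic: True if the start of the file repeats its own directory path.

    sample_bytes and min_repeats are arbitrary illustrative defaults.
    """
    directory = os.path.dirname(os.path.abspath(file_path)).encode()
    with open(file_path, "rb") as handle:
        head = handle.read(sample_bytes)
    return head.count(directory) >= min_repeats


def find_corrupted(process_root):
    """Yield (path, size_in_bytes) for suspicious regular files under process_root."""
    for dirpath, _dirnames, filenames in os.walk(process_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.isfile(path) and looks_corrupted(path):
                    yield path, os.path.getsize(path)
            except OSError:
                # Skip files that cannot be read (for example, keytabs).
                continue
```

It could be run as `for path, size in find_corrupted("/run/cloudera-scm-agent/process"): print(size, path)`. Note that it compares against the path exactly as given, so if the files were written with a different prefix (for example /var/run instead of /run) the check will miss them.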

 

Shankar

Explorer
Posts: 11
Registered: ‎01-13-2015

Re: /run/cloudera-scm-agent/process fills up with invalid data

Did you find anything?

New Contributor
Posts: 3
Registered: ‎11-12-2015

Re: /run/cloudera-scm-agent/process fills up with invalid data

I have the same issue. It happened after a cluster upgrade to 5.8, when the upgrade process failed on the "install oozie sharelib" step. This file

 

/run/cloudera-scm-agent/process/NNN-dfs-create-dir/core-site.xml

 

gets filled with invalid data.

Cloudera Employee
Posts: 6
Registered: ‎08-16-2016

Re: /run/cloudera-scm-agent/process fills up with invalid data

Hi,

 

  Can we get some more information about your deployment? For starters, what is the host OS of the machines in your cluster?

 

-Niranjan

Software Engineer, CM

Cloudera Employee
Posts: 2
Registered: ‎09-29-2016

Re: /run/cloudera-scm-agent/process fills up with invalid data

Is it possible to get the stdout/stderr from the process?

They will be in the logs sub-directory for one of the processes with the problem.

 

thanks

New Contributor
Posts: 4
Registered: ‎12-28-2017

Re: /run/cloudera-scm-agent/process fills up with invalid data

Hi Daniel,

 

I faced the same issue; I hope this helps someone else.

The following steps are the workaround.

 

  1. Stop agent services
    • service cloudera-scm-agent stop
  2. Kill all the processes listed by the following command, using ‘kill -9’
    • lsof +d /var/run/cloudera-scm-agent/process
  3. Remove cloudera agent
    • yum remove cloudera-manager-agent-5.6.0-1.cm560.p0.54.el6.x86_64
  4. umount /var/run/cloudera-scm-agent/process
  5. Reinstall cloudera agent
    • yum install cloudera-manager-agent-5.6.0-1.cm560.p0.54.el6.x86_64
  6. Change the server hostname in the config.ini file
    • cd /etc/cloudera-scm-agent
    • change server_host from localhost to <cloudera server ip address>
  7. Start agent services
    • service cloudera-scm-agent restart_clean_confirmed

 

Regards

Posts: 941
Topics: 1
Kudos: 218
Solutions: 117
Registered: ‎04-22-2014

Re: /run/cloudera-scm-agent/process fills up with invalid data

@sippy @daniel.haviv,

 

The type of situation you are encountering is quite rare, but I've seen and fixed it before.

When you start a command, the following happens:

- The agent heartbeats to Cloudera Manager and learns that it needs to start something.

- The agent retrieves the files needed to start the process and puts them in a newly created process directory.

- Once the files are laid down, the agent signals the supervisor to recheck which processes should be running.

- The supervisor starts the process if it is not already running, using the supervisor.conf file in the process's directory to determine which script to run.

 

Part of that start script is a Perl search-and-replace that substitutes "real" values for certain placeholder values in the configuration files.

 

For example, if you are seeing the problem in hbase, the following script will be executed:
/usr/lib64/cmf/service/hbase/hbase.sh

 

In this file you will find:

 


# Search-replace {{CMF_CONF_DIR}} in files
replace_conf_dir

 

replace_conf_dir() is defined in /usr/lib64/cmf/service/common/cloudera-config.sh

 

If you are seeing these symptoms, it is likely that these script files got corrupted somehow, so that the regex replace matches on "null" (an empty pattern) and substitutes the value of the configuration directory (process directory).

 

The configuration path is then inserted at every empty position in your configuration file, that is, between every pair of characters.
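
To illustrate the effect, here is a minimal sketch in Python. The real start scripts use Perl, so this is only an analogy: `substitute` is a hypothetical helper, not the actual replace_conf_dir() implementation, and the sample config line is made up. It shows what a regex replace does when the search pattern is empty.

```python
import re


def substitute(text, pattern, conf_dir):
    """Replace every match of pattern in text with conf_dir.

    Hypothetical stand-in for the Perl search-and-replace; a function
    is used as the replacement so conf_dir is inserted literally.
    """
    return re.sub(pattern, lambda _match: conf_dir, text)


conf_dir = "/run/cloudera-scm-agent/process/790-hbase-MASTER"
config = "dir={{CMF_CONF_DIR}}/logs"  # made-up sample line

# Intended behaviour: only the placeholder is replaced.
print(substitute(config, re.escape("{{CMF_CONF_DIR}}"), conf_dir))

# Broken behaviour: an empty pattern matches at every position, so the
# replacement lands before each character and once more at the end.
print(substitute("run", "", "/P"))  # prints "/Pr/Pu/Pn/P"
```

With an empty pattern, a file of n characters grows to n + (n + 1) × len(path) characters in a single pass. With a path of roughly 50 characters, that is about a fifty-fold increase, which is consistent with small configuration files ballooning until the filesystem is full.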

 

Long story short, the step that matters in your steps is reinstalling the agent.

Since the *.sh files are static and part of the rpm package install, they should not differ from what Cloudera distributes in its package.  Installing fresh will restore the correct *.sh files and their replace regexes.

 

If you do encounter this issue again, I would advise you to back up the *.sh files that are used to start your processes on that host and share them with us for analysis.
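
Before reinstalling, it may also help to record which scripts actually differ from a healthy host. The following is a rough Python sketch under stated assumptions: the function names are hypothetical, and it assumes the scripts live under /usr/lib64/cmf/service, as the paths mentioned earlier suggest. It hashes every *.sh file under a directory so that the results from two hosts can be compared.

```python
import hashlib
import os


def checksum_scripts(root):
    """Return {relative_path: sha256_hex} for every *.sh file under root."""
    checksums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".sh"):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as handle:
                    digest = hashlib.sha256(handle.read()).hexdigest()
                checksums[os.path.relpath(path, root)] = digest
    return checksums


def differing_scripts(suspect, reference):
    """List scripts whose checksum differs or that exist on only one side."""
    all_paths = set(suspect) | set(reference)
    return sorted(p for p in all_paths if suspect.get(p) != reference.get(p))
```

For example, run `checksum_scripts("/usr/lib64/cmf/service")` on the affected host and on a known-good host with the same agent version, then pass both results to `differing_scripts`. Any script it lists is worth backing up and sharing.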

 

I know that was a lot, but I wanted to clarify for the community why this sort of thing happens and why reinstalling the agent is the right thing.

 

Thanks!

 

Ben

Cloudera Employee
Posts: 508
Registered: ‎07-30-2013

Re: /run/cloudera-scm-agent/process fills up with invalid data

It's also worth noting that CM made some defensive changes for this issue in newer versions, so running one of these CM versions or later may help:
5.10+
5.9.1+
5.8.4+
5.7.5+
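
As a quick way to check a given version string against that list, here is a small Python sketch. The function name is hypothetical, and it assumes that "5.10+" covers 5.10 and every later release, and that each of the other entries means that patch level or higher within the same 5.x line.

```python
# Minimum patched release per maintenance line, taken from the list above.
MIN_PATCH = {(5, 7): 5, (5, 8): 4, (5, 9): 1}


def has_defensive_fix(version):
    """Return True if a CM version string such as "5.8.4" is in the listed ranges."""
    parts = [int(p) for p in version.split(".")[:3]]
    while len(parts) < 3:
        parts.append(0)  # treat "5.10" as "5.10.0"
    major, minor, patch = parts
    if (major, minor) >= (5, 10):
        return True  # assumption: 5.10 and everything after it
    needed = MIN_PATCH.get((major, minor))
    return needed is not None and patch >= needed
```

For instance, `has_defensive_fix("5.8.3")` is False while `has_defensive_fix("5.8.4")` is True. The agent version used in the workaround earlier in this thread (5.6.0) falls outside all of the listed ranges.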

I'd be extremely curious if someone could provide consistent reproduction steps for this as well. It seems to happen quite randomly.