Created on 07-30-2015 11:32 AM - edited 08-05-2015 12:15 PM
Hello, we have an issue with the root partition filling up whenever a DataNode data directory's disk fails and is unmounted. The Cloudera agent recreates the data directories on a server power cycle or a Cloudera agent restart.
Our environment:
CDH 4.7.1
CM 4.8.5
Our current data dir setup:
dfs.datanode.data.dir = /hadoopX/data
mapred.local.dir = /hadoopX/local
Whenever a drive /hadoopX fails and is unmounted for repair, the Cloudera agent creates the /hadoopX/data and /hadoopX/local directories on the root partition. Because of running jobs, the root partition (200 GB) fills up quickly, which causes service (DataNode, TaskTracker) failures.
Is there a workaround? How can we stop the Cloudera agent from creating data directories on the root partition? I see that Ambari had a similar issue, which has been fixed (Jira: https://issues.apache.org/jira/browse/AMBARI-7506). Please suggest any workaround. Thank you, I appreciate your help.
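One rough way to spot directories that have fallen back onto the root partition is to compare device IDs. This is a minimal sketch, not part of Cloudera Manager: the function name `dirs_on_root_fs` and the example paths are hypothetical, and it assumes the data disks are ordinary separate filesystems (no bind mounts).

```python
import os

def dirs_on_root_fs(paths, root="/"):
    """Return the paths whose nearest existing ancestor is on the same device as root."""
    root_dev = os.stat(root).st_dev
    flagged = []
    for path in paths:
        probe = os.path.abspath(path)
        # Walk up until we reach something that exists ("/" always does).
        while not os.path.exists(probe):
            probe = os.path.dirname(probe)
        if os.stat(probe).st_dev == root_dev:
            flagged.append(path)
    return flagged

if __name__ == "__main__":
    # Hypothetical list; substitute the real /hadoopN paths from the config.
    candidates = ["/hadoop1/data", "/hadoop1/local"]
    for p in dirs_on_root_fs(candidates):
        print("on root filesystem: %s" % p)
```

Any path it reports is currently backed by the root filesystem, which is what happens once the drive is unmounted and the directory is recreated underneath the empty mount point.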
-
Thanks
Vganji
Created 08-07-2015 01:37 AM
Created 08-07-2015 11:44 AM
Harsh, thank you for the comments.
I tried changing the permissions of /hadoopX to 700 and started the DataNode service on it, but on DataNode service start cm-agent still creates /hadoopX/data and /hadoopX/local on the root partition. Here is the log:
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Activating Process 608-hdfs-DATANODE
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Created /var/run/cloudera-scm-agent/process/608-hdfs-DATANODE
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Chowning /var/run/cloudera-scm-agent/process/608-hdfs-DATANODE to apps (513) apps (515)
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/608-hdfs-DATANODE to 0751
[07/Aug/2015 11:19:14 +0000] 14569 MainThread parcel INFO prepare_environment begin: {u'CDH': u'4.7.1-1.cdh4.7.1.p0.47'}, [u'cdh'], [u'cdh-plugin', u'hdfs-plugin']
[07/Aug/2015 11:19:14 +0000] 14569 MainThread parcel INFO The following requested parcels are not available: {}
[07/Aug/2015 11:19:14 +0000] 14569 MainThread parcel INFO Obtained tags ['cdh'] for parcel CDH
[07/Aug/2015 11:19:14 +0000] 14569 MainThread parcel INFO prepare_environment end: {'CDH': '4.7.1-1.cdh4.7.1.p0.47'}
[07/Aug/2015 11:19:14 +0000] 14569 MainThread util INFO Extracted 8 files and 0 dirs to /var/run/cloudera-scm-agent/process/608-hdfs-DATANODE.
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Created /var/run/cloudera-scm-agent/process/608-hdfs-DATANODE/logs
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Chowning /var/run/cloudera-scm-agent/process/608-hdfs-DATANODE/logs to apps (513) apps (515)
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Chmod'ing /var/run/cloudera-scm-agent/process/608-hdfs-DATANODE/logs to 0751
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Created /hadoop6/data
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Chowning /hadoop6/data to apps (513) hadoop (493)
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Chmod'ing /hadoop6/data to 0700
[07/Aug/2015 11:19:14 +0000] 14569 MainThread agent INFO Triggering supervisord update.
[07/Aug/2015 11:19:14 +0000] 14569 MainThread abstract_monitor INFO Refreshing DataNodeMonitor for None
We mount the disks using fstab only. Our main goal is to avoid decommissioning services on a node whose number of failed disks is <= dfs.datanode.failed.volumes.tolerated (2).
Could you help with some other workaround to keep cm-agent from creating data directories on the root partition, maybe by adding a check in the agent at $CMF_PATH/agent/src/cmf/agent.py?
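Such a check might look roughly like the sketch below. It is not the actual agent code, and I have not verified where agent.py creates the directories. `find_mount_point` and `guarded_makedirs` are hypothetical names, and the sketch assumes every data directory is supposed to live on its own mounted volume, never on /.

```python
import os

def find_mount_point(path):
    """Walk up from path to the nearest ancestor that is a mount point."""
    probe = os.path.abspath(path)
    # os.path.ismount() returns False for paths that do not exist,
    # so the loop keeps climbing until it reaches a real mount ("/" at worst).
    while not os.path.ismount(probe):
        probe = os.path.dirname(probe)
    return probe

def guarded_makedirs(path, mode=0o700):
    """Create path only if it would not land on the root filesystem."""
    if find_mount_point(path) == "/":
        raise OSError("refusing to create %s: its volume is not mounted" % path)
    if not os.path.isdir(path):
        os.makedirs(path, mode)
```

With /hadoop6 unmounted, `guarded_makedirs("/hadoop6/data")` would raise instead of creating the directory on /, so the DataNode would see a failed volume rather than silently writing to the root partition. A host that intentionally keeps a data directory on / would need an exception to this rule.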
Created 08-08-2015 02:00 PM
Show us the fstab, please.
Created on 08-08-2015 02:03 PM - edited 08-08-2015 02:04 PM
(nevermind my post just now regarding what X is... I found your paths are hadoop# not literally hadoopX)
Created 04-16-2018 07:52 AM
Hi @Harsh J,
I know I'm reviving an old thread, but can you please comment on the fact that this "fix" still does not work in CDH 5.12.1, managed by CM? Even if the folders are owned by root, with 700, the folders still get created, and data is being written to an underlying FS, often /, which is not really good, don't you agree?
Thanks,
Milan
Created 04-16-2018 06:53 PM
Created on 04-17-2018 07:14 AM - edited 04-17-2018 07:17 AM
Interesting, @Harsh J, we'll try that as well, and post back.
Edit: it seems that this workaround indeed works - thanks again.
Let's hope it gets patched soon - it seems relatively trivial to resolve, but I might be wrong. :)
Cheers