05-26-2017
09:58 PM
1 Kudo
I found a way to bypass the user permission check. Modify the file "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.11.0-py2.7.egg/cmf/process.py". At lines 422-423 you can see:

    user = self.raw["user"]
    group = self.raw["group"]

Add the following two lines after the group line:

    user = <user_of_the_single_user_mode> if user == "hdfs" else user
    group = <group_of_the_single_user> if group == "hdfs" else group

Then back up process.pyc and process.pyo in the same directory and remove them. Restart the cloudera-scm-agent and enable HA again; this time it succeeded. The shared edits directory on the journal nodes was formatted correctly. Afterwards, restore the original process.py and restart the agent.

I would still like to know why it uses "hdfs" to execute the initialization here.

Thanks,
MH
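For anyone who wants to see what the two added lines do before touching the agent, here is a minimal standalone sketch of the same remapping. It is not Cloudera's code: the function name `remap_identity`, the constants, and the plain `raw` dictionary are made up for illustration, and it assumes the single user mode account and group are both cloudera-scm, as in the question.

```python
# Hypothetical standalone illustration of the workaround; NOT Cloudera Manager code.
SINGLE_USER = "cloudera-scm"   # assumption: the single user mode account
SINGLE_GROUP = "cloudera-scm"  # assumption: that account's group


def remap_identity(raw, single_user=SINGLE_USER, single_group=SINGLE_GROUP):
    """Return (user, group) for a process, swapping 'hdfs' for the single-user account."""
    user = raw["user"]
    group = raw["group"]
    # Same conditional expressions as the two lines added to process.py.
    user = single_user if user == "hdfs" else user
    group = single_group if group == "hdfs" else group
    return user, group


if __name__ == "__main__":
    # A process requested as hdfs:hdfs is remapped; anything else passes through.
    print(remap_identity({"user": "hdfs", "group": "hdfs"}))
    print(remap_identity({"user": "yarn", "group": "hadoop"}))
```

Only requests for "hdfs" are rewritten, so every other process keeps whatever user and group it was given.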
05-25-2017
11:29 PM
1 Kudo
Hi,

I'm installing CDH 5.11 with Cloudera Manager on CentOS 7.2, using single user mode with the default account cloudera-scm. Everything was OK until I tried to enable HA, which failed. My cluster looks like this:

roc-master: name node
roc-secondary: secondary name node
roc-5, roc-s1, roc-s2: journal node
roc-[1-6]: data node

When I enabled HA, I selected roc-secondary as the other name node. The error message on the Enable HA page is as follows:

    Failed to initialize Shared Edits Directory of NameNode NameNode (roc-master). Initialization can fail if the Shared Edits Directory is not empty. Check the stderr log for details.: Error found before invoking supervisord: Non-root agent cannot execute process as user 'hdfs'.

Then the progress hangs at starting the name node, with the following error message:

    2017-05-26 02:27:25,996 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/d1/dfs/nn/in_use.lock acquired by nodename 20238@roc-master
    2017-05-26 02:27:26,607 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [192.168.0.47:8485, 192.168.0.69:8485, 192.168.0.51:8485]. Skipping.
    org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3.
    3 exceptions thrown:
    192.168.0.47:8485: Journal Storage Directory /data/d1/dfs/jn/scorpio not formatted
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:472)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:655)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:186)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:236)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2220)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2214)
    192.168.0.69:8485: Journal Storage Directory /data/d1/dfs/jn/scorpio not formatted
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:472)
        at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:655)
        at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:186)
        at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:236)
        at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25431)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2220)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2214)

The cloudera-scm-agent.log shows the following:

    [26/May/2017 02:27:08 +0000] 17557 MainThread process INFO Deactivating process 1764-hdfs-NAMENODE-format
    [26/May/2017 02:27:08 +0000] 17557 MainThread util INFO Using generic audit plugin for process namenode-initialize-shared-edits
    [26/May/2017 02:27:08 +0000] 17557 MainThread util INFO Creating metadata plugin for process namenode-initialize-shared-edits
    [26/May/2017 02:27:08 +0000] 17557 MainThread util INFO Using specific metadata plugin for process namenode-initialize-shared-edits
    [26/May/2017 02:27:08 +0000] 17557 MainThread util INFO Using generic metadata plugin for process namenode-initialize-shared-edits
    [26/May/2017 02:27:08 +0000] 17557 MainThread agent INFO [1765-namenode-initialize-shared-edits] Instantiating process
    [26/May/2017 02:27:08 +0000] 17557 MainThread process INFO [1765-namenode-initialize-shared-edits] Updating process: True {}
    [26/May/2017 02:27:08 +0000] 17557 MainThread agent ERROR Failed to activate {u'refresh_files': [], u'config_generation': 0,u'auto_restart': False, u'running': True, u'required_tags': [u'cdh'], u'one_off': True, u'special_file_info': [], u'group': u'hdfs', u'id': 1765, u'status_links': {}, u'name': u'namenode-initialize-shared-edits', u'extra_groups': [], u'run_generation': 1, u'start_timeout_seconds': None, u'environment': {u'HADOOP_AUDIT_LOGGER': u'INFO,RFAAUDIT', u'CM_ADD_TO_CP_DIRS': u'navigator/cdh57', u'HADOOP_NAMENODE_OPTS':u'-Xms4294967296 -Xmx4294967296 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/hdfs_hdfs-NAMENODE-0f2bd0a3cf1534ca0c41e5c9cb5266fa_pid{{PID}}.hprof -XX:OnOutOfMemoryError={{AGENT_COMMON_DIR}}/killparent.sh', u'HADOOP_SECURITY_LOGGER': u'INFO,RFAS', u'HADOOP_CREDSTORE_PASSWORD': u'ifzbuyq7pv4key60lzjxjszy', u'HADOOP_LOG_DIR': u'/var/log/hadoop-hdfs', u'HADOOP_ROOT_LOGGER': u'INFO,console', u'HADOOP_LOGFILE': u'hadoop-cmf-hdfs-NAMENODE-roc-master.log.out', u'CDH_VERSION': u'5'}, u'optional_tags': [u'cdh-plugin', u'hdfs-plugin'], u'program': u'hdfs/hdfs.sh', u'arguments': [u'initializeSharedEdits'], u'parcels': {u'SPARK2': u'2.1.0.cloudera1-1.cdh5.7.0.p0.120904', u'CDH': u'5.11.0-1.cdh5.11.0.p0.34', u'ACCUMULO': u'1.7.2-5.5.0.ACCUMULO5.5.0.p0.8', u'KAFKA': u'2.1.1-1.2.1.1.p0.18'}, u'resources': [], u'user': u'hdfs'}
    Traceback (most recent call last):
      File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.11.0-py2.7.egg/cmf/agent.py", line 1702, in handle_heartbeat_processes
        new_process.update_heartbeat(raw, True)
      File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.11.0-py2.7.egg/cmf/process.py", line 304, in update_heartbeat
        self.fs_update()
      File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.11.0-py2.7.egg/cmf/process.py", line 426, in fs_update
        raise Exception("Non-root agent cannot execute process as user '%s'" % user)
    Exception: Non-root agent cannot execute process as user 'hdfs'

It looks like a user permission problem, so I checked the agent process:

    clouder+ 17557 1 1 02:20 ? 00:07:14 python2.7 /usr/lib64/cmf/agent/build/env/bin/cmf-agent --package_dir /usr/lib64/cmf/service --agent_dir /var/run/cloudera-scm-agent --lib_dir /var/lib/cloudera-scm-agent --logfile /var/log/cloudera-scm-agent/cloudera-scm-agent.log --daemon --comm_name cmf-agent --pidfile /var/run/cloudera-scm-agent/cloudera-scm-agent.pid

The agent is running as the user cloudera-scm, and in /etc/sudoers I have specified the following line:

    %cloudera-scm ALL=(ALL) NOPASSWD: ALL

But the log shows that the shared edits initialization was attempted as a different user, 'hdfs'. I checked the code in /usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.11.0-py2.7.egg/cmf/process.py. It seems to compare the agent's user (here cloudera-scm) with the user requested for the shared edits initialization. Since cloudera-scm is neither 'hdfs' nor 'root', it raised the exception "Exception: Non-root agent cannot execute process as user 'hdfs'".

I also checked the shared edits folder /data/d1/dfs/jn, and it is indeed empty.

What I don't understand is this: I'm using single user mode (cloudera-scm), so why does it switch to 'hdfs' to execute the initialization? How can I fix the problem and enable HA in my case?

Thanks,
MH
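To make my reading of that check concrete, here is a rough sketch of the logic as I understand it from the traceback. This is my reconstruction, not the actual code in process.py: the function name `check_process_user` and its parameters are hypothetical, and only the exception message is taken from the log.

```python
# Hypothetical reconstruction of the check as I understand it; NOT the code in process.py.
def check_process_user(agent_user, process_user):
    """Raise if a non-root agent is asked to run a process as a different user."""
    if agent_user not in ("root", process_user):
        # Message copied from the agent log above.
        raise Exception(
            "Non-root agent cannot execute process as user '%s'" % process_user
        )


if __name__ == "__main__":
    check_process_user("root", "hdfs")                  # root may run as anyone
    check_process_user("cloudera-scm", "cloudera-scm")  # same user is fine
    try:
        check_process_user("cloudera-scm", "hdfs")      # my situation
    except Exception as exc:
        print(exc)
```

If the real check works along these lines, the sudoers entry never comes into play, because the agent refuses before it tries to launch anything.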
Labels: Cloudera Manager, HDFS