Created 12-12-2017 10:59 AM
I'm trying to install HDP 2.6 on 16 nodes (2 master nodes and 14 slaves) using the Ambari wizard. After many challenges the installation completed, but not entirely successfully: every node turned orange with "Warnings encountered". All slave nodes have only one warning, related to NodeManager Start.
I'd highly appreciate any help on fixing this issue.
Here are some outputs and error messages from different nodes:
==== MasterNode1
--- Check HDFS
stdout: /var/lib/ambari-agent/data/output-189.txt (last notice)
2017-12-12 04:10:41,598 - HdfsResource[None] {'security_enabled': False, 'hadoop_bin_dir': '/usr/hdp/2.6.3.0-235/hadoop/bin', 'keytab': [EMPTY], 'dfs_type': '', 'default_fs': 'hdfs://nnode.cedar.cluster.ada:8020', 'hdfs_resource_ignore_file': '/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name': None, 'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': '/usr/hdp/2.6.3.0-235/hadoop/conf', 'immutable_paths': [u'/mr-history/done', u'/app-logs', u'/tmp']} Command completed successfully!
--- Grafana Start
Errors and Output files empty
==== MasterNode2
--- Metrics Collector Start
stderr: /var/lib/ambari-agent/data/errors-174.txt (last notice)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 120, in action_create
    raise Fail("Applying %s failed, parent directory %s doesn't exist" % (self.resource, dirname))
resource_management.core.exceptions.Fail: Applying File['/usr/lib/ams-hbase/bin/hadoop'] failed, parent directory /usr/lib/ams-hbase/bin doesn't exist
--- Activity Analyzer Start
Errors and Output files empty
--- Activity Explorer Start
Errors and Output files empty
--- Check MapReduce2
Errors and Output files empty
==== All DataNodes - same warning on all (14) of them
--- NodeManager Start
stderr: /var/lib/ambari-agent/data/errors-181.txt
Command aborted. Reason: 'Server considered task failed and automatically aborted it'
stdout: /var/lib/ambari-agent/data/output-181.txt (last notice)
2017-12-12 04:10:41,326 - Execute['ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/2.6.3.0-235/hadoop/libexec && /usr/hdp/2.6.3.0-235/hadoop-yarn/sbin/yarn-daemon.sh --config /usr/hdp/2.6.3.0-235/hadoop/conf start nodemanager'] {'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid', 'user': 'yarn'}
Command aborted. Reason: 'Server considered task failed and automatically aborted it'
Command failed after 1 tries
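The `not_if` guard in the start command above is the exact liveness check Ambari uses before (re)starting the NodeManager. It can be reproduced by hand on any slave to see whether the daemon actually came up despite the timeout (a sketch only; the PID-file path is copied from the log output above, so adjust it if your layout differs):

```shell
# Reproduce Ambari's NodeManager liveness check from the log above.
# PID-file path taken from the 'not_if' condition in output-181.txt.
PID_FILE=/var/run/hadoop-yarn/yarn/yarn-yarn-nodemanager.pid

# The check passes only if the PID file exists AND a process with that
# PID is still alive (pgrep -F reads the PID from the file).
if [ -f "$PID_FILE" ] && pgrep -F "$PID_FILE" >/dev/null 2>&1; then
  nm_state=running
else
  nm_state=stopped
fi
echo "NodeManager: $nm_state"
```

If this prints "running", the daemon started fine and the orange warning was only Ambari's server-side timeout aborting the task.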
Created 12-12-2017 11:38 AM
Can you manually start the NodeManager?
su -l yarn -c "/usr/hdp/current/hadoop-yarn-nodemanager/sbin/yarn-daemon.sh start nodemanager"
Can you also check whether this directory exists:
/usr/lib/ams-hbase/
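A quick way to check on each node (a sketch; the path is the one from the error message in errors-174.txt):

```shell
# Check for the AMS HBase install directory that the Metrics Collector
# start step complained about (path taken from the error above).
AMS_DIR=/usr/lib/ams-hbase
if [ -d "$AMS_DIR" ]; then
  status=present
else
  status=missing
fi
echo "ams-hbase: $status"
```

If it is missing, the ambari-metrics packages likely did not install cleanly on that host, rather than the directory simply needing to be created by hand.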
Created 12-12-2017 11:59 AM
Thank you for the response.
1) node manager is running
2) This folder does not exist, neither on the master nodes nor on the slaves.
Do you think it would help if I create it manually?
Additionally, the Ambari UI shows:
Metrics Collector Process Connection failed: [Errno 111] Connection refused to....
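[Errno 111] means nothing was listening on the collector's port when Ambari probed it, which is consistent with the Metrics Collector never starting because of the missing /usr/lib/ams-hbase directory. A quick check on the collector host (a sketch; 6188 is the default AMS timeline port, so verify it against your ams-site settings, and on older systems `ss` may need to be replaced with `netstat`):

```shell
# Check whether anything is listening on the Metrics Collector port.
# 6188 is the AMS default (timeline.metrics.service.webapp.address);
# adjust if your cluster overrides it in ams-site.
PORT=6188
if command -v ss >/dev/null 2>&1 && ss -ltn 2>/dev/null | grep -q ":$PORT "; then
  collector=listening
else
  collector="not listening"
fi
echo "Metrics Collector port $PORT: $collector"
```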