01-27-2016
02:12 AM
I also tried deploying the cluster from the command line (cloudera-director) with the aws.reference.conf file. The job stops with the following output on the terminal:
* Creating Sentry Database ... done
* Waiting for firstRun on cluster C5-Reference-AWS ... done
* Cloudera Manager 'First Run' command execution failed: Failed to perform First Run of services. ...
Then I tried to check the status:
[ec2-user@ip-10-0-2-205 setup-default]$ cloudera-director status aws.reference.conf
Process logs can be found at /home/ec2-user/.cloudera-director/logs/application.log
Plugins will be loaded from /var/lib/cloudera-director-plugins
Cloudera Director 2.0.0 initializing ...
Unexpected internal error (see logs): Cluster C5-Reference-AWS is in stage BOOTSTRAP_FAILED. See cluster status and server logs for details.
Any help is appreciated!
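When the status command is run repeatedly or from a script, the cluster name and stage can be pulled out of the message shown above. This is only a sketch: the regular expression is inferred from that single line of output, and `parse_cluster_stage` is a hypothetical helper, not part of Cloudera Director.

```python
import re

# Pattern inferred from the one status line quoted above; other Director
# versions may word the message differently.
STAGE_RE = re.compile(r"Cluster (?P<cluster>\S+) is in stage (?P<stage>[A-Z_]+)")


def parse_cluster_stage(text):
    """Return (cluster_name, stage) if the text contains a stage message, else None."""
    match = STAGE_RE.search(text)
    if match is None:
        return None
    return match.group("cluster"), match.group("stage")
```

Feeding it the captured output of `cloudera-director status aws.reference.conf` would give a pair such as the cluster name and `BOOTSTRAP_FAILED`, which a wrapper script could use to decide whether to go and collect the logs.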
01-27-2016
12:47 AM
Hello,

I have followed the instructions described here https://s3.amazonaws.com/quickstart-reference/cloudera/hadoop/latest/doc/Cloudera_EDH_on_AWS.pdf for deploying EDH on AWS. Things seem fine all the way until bootstrapping the cluster.

I have the following cluster:
AMI: ami-30d9e02d
Cloudera Manager on d2.xlarge
2 masters (m4.2xlarge)
2 workers (d2.xlarge)
1 gateway (m4.2xlarge)

I used the AWS CloudFormation template and was able to connect to Cloudera Manager via the web console without problems. I deployed the cluster and all EC2 nodes are running with status checks OK (2/2), but the cluster fails at bootstrap. I logged in to one of the masters and I see the following at the bottom of /var/log/cloudera-scm-agent/cloudera-scm-agent.log:

[27/Jan/2016 03:16:44 +0000] 3069 MonitorDaemon-Reporter throttling_logger ERROR Error sending messages to firehose: MGMT-HOSTMONITOR-244944378552b77b5c898d702d752f7f
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 116, in _send
self._port)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 469, in __init__
self.conn.connect()
File "/usr/lib64/python2.6/httplib.py", line 720, in connect
self.timeout)
File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
raise error, msg
error: [Errno 111] Connection refused
[27/Jan/2016 03:19:46 +0000] 3069 DnsResolutionMonitor throttling_logger ERROR Timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[27/Jan/2016 03:19:46 +0000] 3069 DnsResolutionMonitor throttling_logger ERROR Failed to run DnsTest.
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 83, in collect_dns_metrics
self._subprocess_with_timeout(args, self._poll_timeout)
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 55, in _subprocess_with_timeout
return subprocess_with_timeout(args, timeout)
File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 94, in subprocess_with_timeout
raise Exception("timeout with args %s" % args)
Exception: timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']

and in the same file for the gateway:

[27/Jan/2016 03:20:27 +0000] 3072 DnsResolutionMonitor throttling_logger ERROR Timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[27/Jan/2016 03:20:27 +0000] 3072 DnsResolutionMonitor throttling_logger ERROR Failed to run DnsTest.
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 83, in collect_dns_metrics
self._subprocess_with_timeout(args, self._poll_timeout)
File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 55, in _subprocess_with_timeout
return subprocess_with_timeout(args, timeout)
File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 94, in subprocess_with_timeout
raise Exception("timeout with args %s" % args)
Exception: timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']

and the same log file on the worker node has the following:

[27/Jan/2016 03:08:52 +0000] 3012 MainThread agent ERROR Failed to connect to previous supervisor.
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1635, in find_or_start_supervisor
self.configure_supervisor_clients()
File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1882, in configure_supervisor_clients
supervisor_options.realize(args=["-c", os.path.join(self.supervisor_dir, "supervisord.conf")])
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 1564, in realize
Options.realize(self, *arg, **kw)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 311, in realize
self.process_config()
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 319, in process_config
self.process_config_file(do_usage)
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 354, in process_config_file
self.usage(str(msg))
File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 142, in usage
self.exit(2)
SystemExit: 2

I'm pretty new to Cloudera and AWS, so any insight is appreciated!
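The DnsTest timeouts and the "Connection refused" error above point at two things that can be checked by hand on each node: whether the hostname resolves forward and back to the same name, and whether the target port accepts TCP connections. The following is a minimal sketch of both checks, not a reproduction of what the agent's own DnsTest does. It assumes Python 3 (the agent in the logs runs Python 2.6), and `check_dns` and `port_open` are hypothetical helper names.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> name and report whether the round trip matches.

    The forward/reverse parameters exist only so the resolvers can be swapped
    out (for example in tests); by default the system resolver is used.
    """
    result = {"hostname": hostname, "ip": None, "reverse_name": None, "consistent": False}
    try:
        result["ip"] = forward(hostname)
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        result["reverse_name"] = reverse(result["ip"])[0]
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        result["error"] = str(exc)
        return result
    result["consistent"] = result["reverse_name"].lower() == hostname.lower()
    return result


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError (errno 111) and timeouts
        return False
```

On a node, `check_dns(socket.getfqdn())` shows whether forward and reverse lookups agree, and `port_open(host, port)` can be pointed at the Cloudera Manager host with whatever port the Host Monitor is configured to listen on. That port does not appear in the logs above, so it has to be looked up in the deployment.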
Labels: Apache Hadoop, Cloudera Manager, Gateway