Quickstart on AWS EC2 : DnsResolutionMonitor throttling_logger ERROR

Explorer

Hello,

I have followed the instructions described here https://s3.amazonaws.com/quickstart-reference/cloudera/hadoop/latest/doc/Cloudera_EDH_on_AWS.pdf for deploying EDH on AWS. Things seem fine all the way until bootstrapping the cluster. I have the following cluster:

AMI: ami-30d9e02d
Cloudera Manager on d2.xlarge
2 masters (m4.2xlarge)
2 workers (d2.xlarge)
1 gateway (m4.2xlarge)

I used the AWS CloudFormation template and was able to connect to Cloudera Manager via the web console without problems. I deployed the cluster and all EC2 nodes are running with status checks OK (2/2), but the cluster fails at bootstrap.

I logged in to one of the masters and I see the following at the bottom of /var/log/cloudera-scm-agent/cloudera-scm-agent.log:

[27/Jan/2016 03:16:44 +0000] 3069 MonitorDaemon-Reporter throttling_logger ERROR    Error sending messages to firehose: MGMT-HOSTMONITOR-244944378552b77b5c898d702d752f7f
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 116, in _send
    self._port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.6/httplib.py", line 720, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
[27/Jan/2016 03:19:46 +0000] 3069 DnsResolutionMonitor throttling_logger ERROR    Timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[27/Jan/2016 03:19:46 +0000] 3069 DnsResolutionMonitor throttling_logger ERROR    Failed to run DnsTest.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 83, in collect_dns_metrics
    self._subprocess_with_timeout(args, self._poll_timeout)
  File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 55, in _subprocess_with_timeout
    return subprocess_with_timeout(args, timeout)
  File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 94, in subprocess_with_timeout
    raise Exception("timeout with args %s" % args)
Exception: timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']

 

 

The same file on the gateway shows:

[27/Jan/2016 03:20:27 +0000] 3072 DnsResolutionMonitor throttling_logger ERROR    Timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[27/Jan/2016 03:20:27 +0000] 3072 DnsResolutionMonitor throttling_logger ERROR    Failed to run DnsTest.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 83, in collect_dns_metrics
    self._subprocess_with_timeout(args, self._poll_timeout)
  File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 55, in _subprocess_with_timeout
    return subprocess_with_timeout(args, timeout)
  File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 94, in subprocess_with_timeout
    raise Exception("timeout with args %s" % args)
Exception: timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']

 

 

The same log file on the worker node has the following:

[27/Jan/2016 03:08:52 +0000] 3012 MainThread agent        ERROR    Failed to connect to previous supervisor.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1635, in find_or_start_supervisor
    self.configure_supervisor_clients()
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1882, in configure_supervisor_clients
    supervisor_options.realize(args=["-c", os.path.join(self.supervisor_dir, "supervisord.conf")])
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 1564, in realize
    Options.realize(self, *arg, **kw)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 311, in realize
    self.process_config()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 319, in process_config
    self.process_config_file(do_usage)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 354, in process_config_file
    self.usage(str(msg))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 142, in usage
    self.exit(2)
SystemExit: 2

I'm pretty new to Cloudera and AWS so any insight is appreciated!

1 ACCEPTED SOLUTION

Master Collaborator

 

> The error below indicates that the agent is unable to connect to the Host Monitor (HOSTMONITOR). Is the Host Monitor running?

[27/Jan/2016 03:16:44 +0000] 3069 MonitorDaemon-Reporter throttling_logger ERROR    Error sending messages to firehose: MGMT-HOSTMONITOR-244944378552b77b5c898d702d752f7f
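
A "Connection refused" like the one behind this error usually means nothing is listening on the target port. As a rough way to test that from an agent host, here is a minimal Python 3 sketch. The host name and port in the usage comment are assumptions, not values from this thread: substitute the host that runs the Host Monitor role and the listen port configured for it in Cloudera Manager.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Usage (hypothetical host and port, replace with your own values):
#   print(can_connect("cm-host.example.internal", 9995))
```

If this returns False while the Host Monitor role is reported as started, a security group or firewall rule between the hosts is another thing to rule out.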

 

> The error below indicates that the DnsTest command timed out when the agent attempted to run a DNS test [1]

[27/Jan/2016 03:19:46 +0000] 3069 DnsResolutionMonitor throttling_logger ERROR    Timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']

 

Example (run the same test manually to see whether it hangs):
[bash]# /usr/java/jdk1.7.0_67-cloudera/bin/java -classpath /usr/share/cmf/lib/agent-5.5.1.jar com.cloudera.cmon.agent.DnsTest
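
To reproduce the timeout rather than wait on a hung command, the same invocation can be wrapped in a time limit. This is a minimal Python 3 sketch of that idea, not the agent's own code. The 30 second limit in the usage comment is an assumption, since the agent's actual poll timeout is not shown in the log.

```python
import subprocess

# Command copied from the agent log above.
DNS_TEST_ARGS = [
    "/usr/java/jdk1.7.0_67-cloudera/bin/java",
    "-classpath",
    "/usr/share/cmf/lib/agent-5.5.1.jar",
    "com.cloudera.cmon.agent.DnsTest",
]


def run_with_timeout(args, timeout_seconds):
    """Run args and return (finished, returncode, output).

    finished is False if the command did not exit within timeout_seconds;
    subprocess.run kills the child process in that case.
    """
    try:
        proc = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False, None, ""
    return True, proc.returncode, proc.stdout.decode("utf-8", "replace")


# Usage (the 30 second limit is an assumed value):
#   finished, code, output = run_with_timeout(DNS_TEST_ARGS, 30)
#   print(finished, code, output)
```

If the command finishes quickly when run by hand but the agent still reports timeouts, the difference is more likely to be in the environment the agent runs under than in DNS itself.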

 

> The error below indicates that supervisord [2] was not running at that time

[27/Jan/2016 03:08:52 +0000] 3012 MainThread agent        ERROR    Failed to connect to previous supervisor.
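
The worker's traceback ends in supervisor's process_config_file calling usage() and exiting with status 2, which suggests the supervisord.conf the agent pointed at was missing or could not be parsed at that moment. A minimal Python 3 sketch to inspect that file follows. The path in the usage comment is an assumption about the agent's supervisor directory, so confirm it on your host, and note this only checks that the file exists and parses as INI, not that supervisor itself would accept it.

```python
import configparser
import os


def inspect_supervisor_config(path):
    """Return (ok, detail): whether path exists and parses as an INI file."""
    if not os.path.isfile(path):
        return False, "missing: %s" % path
    # RawConfigParser avoids interpolating %(...)s values in the file.
    parser = configparser.RawConfigParser(strict=False)
    try:
        with open(path) as handle:
            parser.read_file(handle)
    except (OSError, configparser.Error) as exc:
        return False, "unreadable or malformed: %s" % exc
    return True, "sections: %s" % ", ".join(parser.sections())


# Usage (assumed location of the agent's supervisor directory):
#   print(inspect_supervisor_config(
#       "/var/run/cloudera-scm-agent/supervisor/supervisord.conf"))
```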

 

[1] http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_networknames_configure.html

[2] "Agent Process Supervision in Detail" http://blog.cloudera.com/blog/2013/07/how-does-cloudera-manager-work/
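
Since reference [1] concerns host name configuration, a quick forward and reverse lookup on each node can show whether name resolution is slow or inconsistent. The sketch below is Python 3 and is not a reimplementation of DnsTest, whose exact behavior is not shown in this thread. The function name and result keys are made up for illustration.

```python
import socket
import time


def check_name_resolution(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> name and report the result with timing.

    forward and reverse are injectable so the check can be exercised
    without a live resolver.
    """
    result = {"hostname": hostname, "ip": None,
              "reverse_name": None, "error": None}
    start = time.time()
    try:
        result["ip"] = forward(hostname)
        result["reverse_name"] = reverse(result["ip"])[0]
    except OSError as exc:
        # socket.gaierror and socket.herror are both OSError subclasses.
        result["error"] = str(exc)
    result["elapsed_seconds"] = time.time() - start
    result["consistent"] = (result["error"] is None
                            and result["reverse_name"] == hostname)
    return result


# Usage: check the local host's fully qualified name.
#   print(check_name_resolution(socket.getfqdn()))
```

A large elapsed_seconds value, or a reverse name that differs from the host's own name, would be consistent with the DnsResolutionMonitor timeouts in the logs.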


2 REPLIES

Explorer

I also tried deploying the cluster from the command line (cloudera-director) using the aws.reference.conf file. The job stops with the following output on the terminal:

* Creating Sentry Database ... done
* Waiting for firstRun on cluster C5-Reference-AWS ... done
* Cloudera Manager 'First Run' command execution failed: Failed to perform First Run of services. ...

Then I tried to check the status:

[ec2-user@ip-10-0-2-205 setup-default]$ cloudera-director status aws.reference.conf
Process logs can be found at /home/ec2-user/.cloudera-director/logs/application.log
Plugins will be loaded from /var/lib/cloudera-director-plugins
Cloudera Director 2.0.0 initializing ...
Unexpected internal error (see logs): Cluster C5-Reference-AWS is in stage BOOTSTRAP_FAILED. See cluster status and server logs for details.

Any help is appreciated!
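
The status output above points to /home/ec2-user/.cloudera-director/logs/application.log for details. A minimal Python 3 sketch for pulling the most recent error lines out of a log like that is shown here. It assumes error entries contain the literal text "ERROR", which is a guess about the log format, and the helper name is made up for illustration.

```python
from collections import deque


def last_error_lines(lines, limit=20, marker="ERROR"):
    """Return up to `limit` of the most recent lines containing marker, oldest first."""
    recent = deque(maxlen=limit)
    for line in lines:
        if marker in line:
            recent.append(line.rstrip("\n"))
    return list(recent)


# Usage (log path taken from the status output above):
#   with open("/home/ec2-user/.cloudera-director/logs/application.log") as log:
#       for entry in last_error_lines(log):
#           print(entry)
```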
