Quickstart on AWS EC2 : DnsResolutionMonitor throttling_logger ERROR

Explorer

Hello,

 

I followed the instructions at https://s3.amazonaws.com/quickstart-reference/cloudera/hadoop/latest/doc/Cloudera_EDH_on_AWS.pdf for deploying EDH on AWS. Everything seems fine until the cluster is bootstrapped. My cluster is configured as follows:

 

AMI : ami-30d9e02d

Cloudera Manager on d2.xlarge

2 masters (m4.2xlarge)

2 workers (d2.xlarge)

1 gateway (m4.2xlarge)

 

I used the AWS CloudFormation template and was able to connect to Cloudera Manager via the web console without problems. I deployed the cluster and all EC2 nodes are running with status checks OK (2/2), but the cluster fails at bootstrap.

I logged in to one of the masters and see the following at the bottom of /var/log/cloudera-scm-agent/cloudera-scm-agent.log:

 

 

[27/Jan/2016 03:16:44 +0000] 3069 MonitorDaemon-Reporter throttling_logger ERROR    Error sending messages to firehose: MGMT-HOSTMONITOR-244944378552b77b5c898d702d752f7f
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/firehose.py", line 116, in _send
    self._port)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/avro-1.6.3-py2.6.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib64/python2.6/httplib.py", line 720, in connect
    self.timeout)
  File "/usr/lib64/python2.6/socket.py", line 567, in create_connection
    raise error, msg
error: [Errno 111] Connection refused
[27/Jan/2016 03:19:46 +0000] 3069 DnsResolutionMonitor throttling_logger ERROR    Timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[27/Jan/2016 03:19:46 +0000] 3069 DnsResolutionMonitor throttling_logger ERROR    Failed to run DnsTest.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 83, in collect_dns_metrics
    self._subprocess_with_timeout(args, self._poll_timeout)
  File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 55, in _subprocess_with_timeout
    return subprocess_with_timeout(args, timeout)
  File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 94, in subprocess_with_timeout
    raise Exception("timeout with args %s" % args)
Exception: timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']

 

 

The same file on the gateway shows:

 

 

[27/Jan/2016 03:20:27 +0000] 3072 DnsResolutionMonitor throttling_logger ERROR    Timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']
None
[27/Jan/2016 03:20:27 +0000] 3072 DnsResolutionMonitor throttling_logger ERROR    Failed to run DnsTest.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 83, in collect_dns_metrics
    self._subprocess_with_timeout(args, self._poll_timeout)
  File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 55, in _subprocess_with_timeout
    return subprocess_with_timeout(args, timeout)
  File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 94, in subprocess_with_timeout
    raise Exception("timeout with args %s" % args)
Exception: timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']

 

 

The same log file on a worker node has the following:

 

[27/Jan/2016 03:08:52 +0000] 3012 MainThread agent        ERROR    Failed to connect to previous supervisor.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1635, in find_or_start_supervisor
    self.configure_supervisor_clients()
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1882, in configure_supervisor_clients
    supervisor_options.realize(args=["-c", os.path.join(self.supervisor_dir, "supervisord.conf")])
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 1564, in realize
    Options.realize(self, *arg, **kw)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 311, in realize
    self.process_config()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 319, in process_config
    self.process_config_file(do_usage)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 354, in process_config_file
    self.usage(str(msg))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 142, in usage
    self.exit(2)
SystemExit: 2

I'm pretty new to Cloudera and AWS so any insight is appreciated!

1 ACCEPTED SOLUTION

Master Collaborator

 

> This indicates that the agent is unable to connect to the Host Monitor (HOSTMONITOR). Is the Host Monitor (HMON) running?

[27/Jan/2016 03:16:44 +0000] 3069 MonitorDaemon-Reporter throttling_logger ERROR    Error sending messages to firehose: MGMT-HOSTMONITOR-244944378552b77b5c898d702d752f7f
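
One way to narrow down the "Connection refused" is to test, from the affected node, whether anything is listening on the Host Monitor port. The sketch below is only an illustration in standalone Python 3 (not the agent's own code); the function name `port_open` is made up for this example, and the host and port you pass in are assumptions you need to take from your own Cloudera Manager configuration.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" (errno 111), timeouts,
        # and name-resolution failures.
        return False
```

For example, `port_open("<host running the Host Monitor role>", 9995)` would check the port I believe is the Host Monitor default for agent messages; treat 9995 as an assumption and confirm it in the Host Monitor configuration. A result of False while the host itself is reachable suggests the role is not running or not yet listening.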

 

> This indicates that the command timed out when attempting to run a DNS test [1]

[27/Jan/2016 03:19:46 +0000] 3069 DnsResolutionMonitor throttling_logger ERROR    Timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', '/usr/share/cmf/lib/agent-5.5.1.jar', 'com.cloudera.cmon.agent.DnsTest']

 

Example: 
[bash]# /usr/java/jdk1.7.0_67-cloudera/bin/java -classpath /usr/share/cmf/lib/agent-5.5.1.jar com.cloudera.cmon.agent.DnsTest
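
If you want a quick cross-check that does not depend on the Java DnsTest class, a rough equivalent is to resolve the host name to an IP, resolve that IP back to a name, and time the round trip. This is a minimal sketch in standalone Python 3, assuming that forward/reverse consistency is what matters here; it is not what DnsTest does internally, and `check_dns` is a name invented for this example.

```python
import socket
import time

def check_dns(hostname=None,
              forward=socket.gethostbyname,
              reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> hostname and report whether they agree."""
    hostname = hostname or socket.getfqdn()
    result = {"hostname": hostname, "ip": None, "reverse": None,
              "consistent": False, "error": None}
    start = time.monotonic()
    try:
        result["ip"] = forward(hostname)
        # gethostbyaddr returns (primary_name, aliases, addresses).
        result["reverse"] = reverse(result["ip"])[0]
        result["consistent"] = result["reverse"].lower() == hostname.lower()
    except OSError as exc:
        # socket.gaierror and socket.herror are both OSError subclasses.
        result["error"] = str(exc)
    # A large value here points at slow or unreachable DNS servers.
    result["seconds"] = time.monotonic() - start
    return result
```

Calling `check_dns()` with no arguments on each node uses that node's own fully qualified name. A long `seconds` value or `consistent` being False would be consistent with the DNS test timing out, and reference [1] describes how the network names should be configured.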

 

> This indicates that supervisord [2] was not running at that time

[27/Jan/2016 03:08:52 +0000] 3012 MainThread agent        ERROR    Failed to connect to previous supervisor.
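
Since the order of these errors matters (the supervisor error is the earliest timestamp quoted), it can help to pull only the ERROR lines out of each agent log and sort them by time. The following is a small sketch in standalone Python 3; the regular expression is an assumption based only on the log lines quoted in this thread, and `error_timeline` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Matches lines such as:
# [27/Jan/2016 03:08:52 +0000] 3012 MainThread agent  ERROR  Failed to ...
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<thread>\S+)\s+"
    r"(?P<logger>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)

def error_timeline(lines):
    """Return (timestamp, thread, message) for each ERROR line, oldest first."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        # Traceback lines and non-ERROR levels are skipped.
        if m and m.group("level") == "ERROR":
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y %H:%M:%S %z")
            events.append((ts, m.group("thread"), m.group("msg")))
    return sorted(events, key=lambda e: e[0])
```

To use it, open /var/log/cloudera-scm-agent/cloudera-scm-agent.log on a node and pass the file object to `error_timeline`. Run it per host, because the lines quoted above come from different machines.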

 

[1] http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_networknames_configure.html

[2] "Agent Process Supervision in Detail" http://blog.cloudera.com/blog/2013/07/how-does-cloudera-manager-work/


2 REPLIES

Explorer

I also tried deploying the cluster from the command line (cloudera-director) using the aws.reference.conf file. The job stops with the following output on the terminal:

 

* Creating Sentry Database ... done
* Waiting for firstRun on cluster C5-Reference-AWS ... done
* Cloudera Manager 'First Run' command execution failed: Failed to perform First Run of services. ...

Then I checked the status:

 

[ec2-user@ip-10-0-2-205 setup-default]$ cloudera-director status aws.reference.conf
Process logs can be found at /home/ec2-user/.cloudera-director/logs/application.log
Plugins will be loaded from /var/lib/cloudera-director-plugins
Cloudera Director 2.0.0 initializing ...
Unexpected internal error (see logs): Cluster C5-Reference-AWS is in stage BOOTSTRAP_FAILED. See cluster status and server logs for details.

Any help is appreciated!
