Cloudera Manager hung on package distribution

I'm installing Kafka and Flume on two m4.2 instances, and the bootstrap seems to have hung here:


[2016-02-01 12:30:47] INFO  [pipeline-thread-42] - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (CDH, 5.5.1-1.cdh5.5.1.p0.11) stage DISTRIBUTED. Current: DOWNLOADED Past: [DOWNLOADED, DOWNLOADED, DOWNLOADED]. State ApiParcelState{progress=100, progressTotal=100, count=1, countTotal=1, warnings=null, errors=null}
1 ACCEPTED SOLUTION

Master Collaborator

This is likely a transient error caused by a network connectivity issue. Can you please retry? Does the failure occur consistently?
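To rule connectivity in or out, it can help to test whether the relevant TCP ports are reachable between the hosts. Below is a minimal sketch, not an official Cloudera tool. It assumes the two ports that show up in the agent log in this thread (7180 for the parcel download URL on the server, 9000 for the agent's status server); the function names and any hostnames you pass in are hypothetical.

```python
import socket


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_all(targets, timeout=5.0):
    """Check a list of (host, port) pairs and return {(host, port): bool}."""
    return {
        (host, port): port_is_reachable(host, port, timeout)
        for host, port in targets
    }
```

Run it from each side, for example `check_all([("<server-host>", 7180)])` on an agent node and `check_all([("<agent-host>", 9000)])` on the server. A `False` result for a port that should be open usually points at a firewall or security group rule rather than at the service itself.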

3 REPLIES

Here is the tail of the agent log from one of the application nodes:

[01/Feb/2016 12:11:36 +0000] 3034 MainThread agent        ERROR    Failed to connect to previous supervisor.
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1635, in find_or_start_supervisor
    self.configure_supervisor_clients()
  File "/usr/lib64/cmf/agent/src/cmf/agent.py", line 1882, in configure_supervisor_clients
    supervisor_options.realize(args=["-c", os.path.join(self.supervisor_dir, "supervisord.conf")])
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 1564, in realize
    Options.realize(self, *arg, **kw)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 311, in realize
    self.process_config()
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 319, in process_config
    self.process_config_file(do_usage)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 354, in process_config_file
    self.usage(str(msg))
  File "/usr/lib64/cmf/agent/build/env/lib/python2.6/site-packages/supervisor-3.0-py2.6.egg/supervisor/options.py", line 142, in usage
    self.exit(2)
SystemExit: 2
[01/Feb/2016 12:11:36 +0000] 3034 MainThread tmpfs        INFO     Successfully mounted tmpfs at /var/run/cloudera-scm-agent/process
[01/Feb/2016 12:11:39 +0000] 3034 MainThread agent        INFO     Trying to connect to newly launched supervisor (Attempt 1)
[01/Feb/2016 12:11:40 +0000] 3034 MainThread agent        INFO     Supervisor version: 3.0
[01/Feb/2016 12:11:40 +0000] 3034 MainThread agent        INFO     Successfully connected to supervisor
[01/Feb/2016 12:11:40 +0000] 3034 MainThread status_server INFO     Using maximum impala profile bundle size of 1073741824 bytes.
[01/Feb/2016 12:11:40 +0000] 3034 MainThread status_server INFO     Using maximum stacks log bundle size of 1073741824 bytes.
[01/Feb/2016 12:11:40 +0000] 3034 MainThread _cplogging   INFO     [01/Feb/2016:12:11:40] ENGINE Bus STARTING
[01/Feb/2016 12:11:40 +0000] 3034 MainThread _cplogging   INFO     [01/Feb/2016:12:11:40] ENGINE Started monitor thread '_TimeoutMonitor'.
[01/Feb/2016 12:11:41 +0000] 3034 MainThread _cplogging   INFO     [01/Feb/2016:12:11:41] ENGINE Serving on ip-172-31-39-199.us-west-2.compute.internal:9000
[01/Feb/2016 12:11:41 +0000] 3034 MainThread _cplogging   INFO     [01/Feb/2016:12:11:41] ENGINE Bus STARTED
[01/Feb/2016 12:11:41 +0000] 3034 MainThread __init__     INFO     New monitor: (<cmf.monitor.host.HostMonitor object at 0x34ab1d0>,)
[01/Feb/2016 12:11:41 +0000] 3034 MonitorDaemon-Scheduler __init__     INFO     Monitor ready to report: ('HostMonitor',)
[01/Feb/2016 12:11:41 +0000] 3034 MainThread agent        INFO     Setting default socket timeout to 30
[01/Feb/2016 12:11:41 +0000] 3034 MainThread agent        INFO     Using parcels directory from server provided value: /opt/cloudera/parcels
[01/Feb/2016 12:11:41 +0000] 3034 MainThread agent        INFO     Created /opt/cloudera/parcels
[01/Feb/2016 12:11:41 +0000] 3034 MainThread agent        INFO     Chowning /opt/cloudera/parcels to root (0) root (0)
[01/Feb/2016 12:11:41 +0000] 3034 MainThread agent        INFO     Chmod'ing /opt/cloudera/parcels to 0755
[01/Feb/2016 12:11:41 +0000] 3034 MainThread agent        INFO     Created /opt/cloudera/parcel-cache
[01/Feb/2016 12:11:41 +0000] 3034 MainThread agent        INFO     Chowning /opt/cloudera/parcel-cache to root (0) root (0)
[01/Feb/2016 12:11:41 +0000] 3034 MainThread agent        INFO     Chmod'ing /opt/cloudera/parcel-cache to 0755
[01/Feb/2016 12:11:41 +0000] 3034 MainThread parcel       INFO     Agent does create users/groups and apply file permissions
[01/Feb/2016 12:11:41 +0000] 3034 MainThread downloader   INFO     Downloader path: /opt/cloudera/parcel-cache
[01/Feb/2016 12:11:41 +0000] 3034 MainThread parcel_cache INFO     Using /opt/cloudera/parcel-cache for parcel cache
[01/Feb/2016 12:11:42 +0000] 3034 MainThread firehoses    INFO     Reporting interval updated: 5.0 -> 60
[01/Feb/2016 12:11:42 +0000] 3034 MainThread agent        INFO     Active parcel list updated; recalculating component info.
[01/Feb/2016 12:11:56 +0000] 3034 CP Server Thread-4 _cplogging   INFO     172.31.47.109 - - [01/Feb/2016:12:11:56] "GET /heartbeat HTTP/1.1" 200 2 "" "NING/1.0"
[01/Feb/2016 12:12:11 +0000] 3034 DnsResolutionMonitor throttling_logger INFO     Using java location: '/usr/java/jdk1.7.0_67-cloudera/bin/java'.
[01/Feb/2016 12:12:27 +0000] 3034 CP Server Thread-5 _cplogging   INFO     172.31.47.109 - - [01/Feb/2016:12:12:27] "GET /heartbeat HTTP/1.1" 200 2 "" "NING/1.0"
[01/Feb/2016 12:12:27 +0000] 3034 Thread-13 downloader   INFO     Starting download of: http://ip-172-31-47-109.us-west-2.compute.internal:7180/cmf/parcel/download/KAFKA-0.8.2.0-1.kafka1.4.0.p0.56-el6.parcel
[01/Feb/2016 12:12:28 +0000] 3034 Thread-13 downloader   INFO     Completed download of http://ip-172-31-47-109.us-west-2.compute.internal:7180/cmf/parcel/download/KAFKA-0.8.2.0-1.kafka1.4.0.p0.56-el6.parcel code=200 state=downloaded
[01/Feb/2016 12:12:28 +0000] 3034 Thread-13 parcel_cache INFO     Checking checksum of parcel KAFKA-0.8.2.0-1.kafka1.4.0.p0.56-el6.parcel...
[01/Feb/2016 12:12:28 +0000] 3034 Thread-13 parcel_cache INFO     Unpacking /opt/cloudera/parcel-cache/KAFKA-0.8.2.0-1.kafka1.4.0.p0.56-el6.parcel into /opt/cloudera/parcels
[01/Feb/2016 12:12:29 +0000] 3034 Thread-13 parcel_cache INFO     Unpack of parcel /opt/cloudera/parcel-cache/KAFKA-0.8.2.0-1.kafka1.4.0.p0.56-el6.parcel successful
[01/Feb/2016 12:12:29 +0000] 3034 Thread-13 downloader   INFO     Finished download [ url: http://ip-172-31-47-109.us-west-2.compute.internal:7180/cmf/parcel/download/KAFKA-0.8.2.0-1.kafka1.4.0.p0.56-el6.parcel, state: complete, total_bytes: 37155105, downloaded_bytes: 37155105, start_time: 2016-02-01 12:12:27, download_end_time: 2016-02-01 12:12:28, end_time: 2016-02-01 12:12:29, code: 200, exception_msg: None, path: /opt/cloudera/parcel-cache/KAFKA-0.8.2.0-1.kafka1.4.0.p0.56-el6.parcel ]
[01/Feb/2016 12:12:42 +0000] 3034 MonitorDaemon-Reporter firehoses    INFO     Creating a connection to the SERVICEMONITOR.
[01/Feb/2016 12:12:42 +0000] 3034 MonitorDaemon-Reporter firehoses    INFO     Creating a connection to the HOSTMONITOR.
[01/Feb/2016 12:12:42 +0000] 3034 MainThread parcel       INFO     Loading parcel manifest for: KAFKA-0.8.2.0-1.kafka1.4.0.p0.56
[01/Feb/2016 12:12:43 +0000] 3034 MainThread parcel       INFO     Ensuring users/groups exist for new parcel KAFKA-0.8.2.0-1.kafka1.4.0.p0.56.
[01/Feb/2016 12:12:43 +0000] 3034 MainThread parcel       INFO     Executing command ['/usr/sbin/groupadd', '-r', 'kafka']
[01/Feb/2016 12:12:47 +0000] 3034 MainThread parcel       INFO     Executing command ['/usr/sbin/groupadd', '-r', 'kafka']
[01/Feb/2016 12:12:47 +0000] 3034 MainThread parcel       INFO     Executing command ['/usr/sbin/useradd', '-r', '-m', '-g', 'kafka', '-K', 'UMASK=022', '--home', '/var/lib/kafka', '--comment', 'Kafka', '--shell', '/sbin/nologin', 'kafka']
[01/Feb/2016 12:12:48 +0000] 3034 MainThread parcel       INFO     Ensuring correct file permissions for new parcel KAFKA-0.8.2.0-1.kafka1.4.0.p0.56.

Thanks; our incoming CIDRs were incorrect.
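For anyone hitting the same thing, a quick way to sanity-check inbound rules is to test whether each node's private IP actually falls inside the CIDR blocks you allowed. This is a rough sketch using Python's standard `ipaddress` module. The helper name `allowed_by` and the CIDR blocks in the example are made up for illustration (they are not our actual rules); the two IPs are the ones visible in the log above.

```python
import ipaddress


def allowed_by(ip, cidrs):
    """Return the CIDR blocks from `cidrs` that contain the address `ip`."""
    addr = ipaddress.ip_address(ip)
    # strict=False tolerates blocks written with host bits set, e.g. 172.31.47.109/20
    return [c for c in cidrs if addr in ipaddress.ip_network(c, strict=False)]


# Hypothetical inbound rules, for illustration only.
too_narrow = ["172.31.0.0/20"]    # covers 172.31.0.0 through 172.31.15.255
wide_enough = ["172.31.32.0/20"]  # covers 172.31.32.0 through 172.31.47.255

for node_ip in ("172.31.47.109", "172.31.39.199"):
    print(node_ip, allowed_by(node_ip, too_narrow), allowed_by(node_ip, wide_enough))
```

An empty list for a node means none of the listed blocks admit traffic from that address, which is the situation to look for in the security group.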