Cluster install with Director hangs on KAFKA parcel

I am trying to deploy a cluster in AWS using Cloudera Director. It appears to go smoothly right up until the end, then stalls with 579 / 597 steps completed.

 

GUI indicates:

"Distributing parcels: KAFKA-3.0.0-1.3.0.0.p0.40,CDH-5.15.0-1.cdh5.15.0.p0.21"

 

 

The cloudera-director-server log repeats the following over and over:

51.538 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:50:53.552 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:50:55.565 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:50:57.579 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:50:59.595 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:01.609 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:03.624 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:05.675 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:07.689 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:09.704 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:11.718 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:13.732 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:15.745 +0000] INFO  [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}

 

Edit: this doesn't appear to be a KAFKA-specific issue. I tried another cluster without KAFKA, and this time it hangs on the CDH parcel:

c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (CDH, 5.15.0-1.cdh5.15.0.p0.21) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=4900, count=0, countTotal=7, warnings=null, errors=null}
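The stall is visible in the log lines themselves: `progress` and `count` stay at 0 across every sample. As a rough illustration (not part of Director), the Python 3 sketch below parses the `ApiParcelState{...}` portion of such lines and flags a parcel as stalled when the most recent samples all show zero progress. The function names and the `min_samples` threshold are hypothetical, and the pattern is based only on the log format shown above.

```python
import re

# Pattern derived from the Director log lines quoted in this thread.
STATE_RE = re.compile(
    r"Waiting for parcel \((?P<product>[^,]+), (?P<version>[^)]+)\).*?"
    r"progress=(?P<progress>\d+), progressTotal=(?P<total>\d+), "
    r"count=(?P<count>\d+), countTotal=(?P<count_total>\d+)"
)


def parse_parcel_state(line):
    """Return the parcel state found in one log line, or None if absent."""
    m = STATE_RE.search(line)
    if m is None:
        return None
    return {
        "product": m.group("product"),
        "version": m.group("version"),
        "progress": int(m.group("progress")),
        "total": int(m.group("total")),
        "count": int(m.group("count")),
        "count_total": int(m.group("count_total")),
    }


def is_stalled(lines, min_samples=5):
    """True if the last `min_samples` parsed states all show zero progress."""
    states = [s for s in map(parse_parcel_state, lines) if s is not None]
    if len(states) < min_samples:
        return False  # not enough evidence yet
    recent = states[-min_samples:]
    return all(s["progress"] == 0 and s["count"] == 0 for s in recent)
```

A distribution that never leaves `progress=0` with `errors=null` suggests the problem is on the agent side rather than in Director, so the agent logs on the nodes are the next place to look.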

 

1 ACCEPTED SOLUTION

We figured this one out. The nodes were not able to download the parcels from the manager instance. As it turns out, we had DNS Hostnames set to No for the VPC (AWS), so the nodes could not resolve the manager's hostname.
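To check for this kind of failure from a node, one option is to test whether the manager's hostname resolves at all. The Python 3 sketch below is only an illustration: `host_from_url` and `can_resolve` are hypothetical helper names, and the `resolver` parameter exists solely so the check can be exercised without real DNS.

```python
import socket
from urllib.parse import urlparse


def host_from_url(url):
    """Extract the hostname from a parcel download URL."""
    return urlparse(url).hostname


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves, False on a resolver error."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        # Covers "[Errno -2] Name or service not known" and similar errors.
        return False
```

Run on a cluster node against the hostname in the parcel URL, a `False` result would point to name resolution rather than a port or firewall problem.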

 

The message that tipped us off was in the cloudera-scm-agent.log on each of the nodes.

 

[21/Jun/2018 19:42:05 +0000] 13795 Thread-13 downloader   INFO     Fetching torrent: http://ip-10-2-4-152.us-east-2.compute.internal:7180/cmf/parcel/download/KAFKA-3.0.0-1.3.0.0.p0.40-el7.parcel.torrent
[21/Jun/2018 19:42:05 +0000] 13795 Thread-13 https        ERROR    Failed to retrieve/stroe URL: http://ip-10-2-4-152.us-east-2.compute.internal:7180/cmf/parcel/download/KAFKA-3.0.0-1.3.0.0.p0.40-el7.parcel.torrent -> /opt/cloudera/parcel-cache/KAFKA-3.0.0-1.3.0.0.p0.40-el7.parcel.torrent <urlopen error [Errno -2] Name or service not known>
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.0-py2.7.egg/cmf/https.py", line 191, in fetch_to_file
    resp = self.open(req_url)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.0-py2.7.egg/cmf/https.py", line 186, in open 
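For anyone checking several nodes, the same symptom can be pulled out of the agent log automatically. The Python 3 sketch below is a hypothetical helper, not a Cloudera tool; it scans log lines for the error shown above and collects the hostnames that failed to resolve. The pattern keeps the log's own "stroe" spelling because it matches the message verbatim.

```python
import re
from urllib.parse import urlparse

# Matches the agent error quoted above, including its original spelling.
ERROR_RE = re.compile(
    r"ERROR\s+Failed to retrieve/stroe URL: (?P<url>\S+) -> \S+ "
    r"<urlopen error \[Errno -2\] Name or service not known>"
)


def unresolved_hosts(log_lines):
    """Return the set of hostnames that failed DNS resolution."""
    hosts = set()
    for line in log_lines:
        m = ERROR_RE.search(line)
        if m:
            hosts.add(urlparse(m.group("url")).hostname)
    return hosts
```

An open file object can be passed directly as `log_lines`, for example the cloudera-scm-agent.log on each node. If the only hostname returned is the manager's internal name, the VPC DNS settings are a reasonable first thing to check.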

 


Glad you figured this out. Thanks for posting your solution for other forum users.