Created on 06-21-2018 07:54 AM - edited 09-16-2022 06:22 AM
I am trying to deploy a cluster in AWS using Cloudera Director. It appears to go smoothly right up until the end, then stalls with 579 / 597 steps completed.
GUI indicates:
"Distributing parcels: KAFKA-3.0.0-1.3.0.0.p0.40,CDH-5.15.0-1.cdh5.15.0.p0.21"
cloudera-director-server logs the following over and over:
51.538 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:50:53.552 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:50:55.565 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:50:57.579 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:50:59.595 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:01.609 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:03.624 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:05.675 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:07.689 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:09.704 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:11.718 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:13.732 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
[2018-06-21 14:51:15.745 +0000] INFO [p-c86a21fc7a0e-DefaultBootstrapClusterJob] 1b43a3bf-7aa4-4d09-bf79-ed5f30552520 POST /api/v12/import com.cloudera.launchpad.bootstrap.cluster.UnboundedWaitForParcelStage - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (KAFKA, 3.0.0-1.3.0.0.p0.40) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=1600, count=0, countTotal=4, warnings=null, errors=null}
Edit: this doesn't appear to be a KAFKA-specific issue. I tried another cluster without KAFKA, and this time it hangs on the CDH parcel:
c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (CDH, 5.15.0-1.cdh5.15.0.p0.21) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=0, progressTotal=4900, count=0, countTotal=7, warnings=null, errors=null}
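For anyone who wants to spot this kind of stall without watching the log by eye, here is a rough Python 3 sketch that pulls the progress counters out of the "Waiting for parcel" messages and flags the case where they stop moving. The function names (`parse_parcel_states`, `is_stalled`) and the three-sample threshold are my own assumptions for illustration; nothing here is part of Cloudera Director, and the regular expression only covers the message format shown above.

```python
import re

# Matches the "Waiting for parcel (PRODUCT, VERSION) ... progress=N, progressTotal=M"
# messages shown in the log excerpt. DOTALL lets a match span a wrapped line.
STATE_RE = re.compile(
    r"Waiting for parcel \((?P<product>[^,]+), (?P<version>[^)]+)\).*?"
    r"progress=(?P<progress>\d+), progressTotal=(?P<total>\d+)",
    re.DOTALL,
)


def parse_parcel_states(log_text):
    """Return a list of (product, version, progress, total) tuples, in log order."""
    return [
        (m["product"], m["version"], int(m["progress"]), int(m["total"]))
        for m in STATE_RE.finditer(log_text)
    ]


def is_stalled(states, min_samples=3):
    """Return True when the last min_samples samples show identical,
    incomplete progress. The threshold is an arbitrary illustrative choice."""
    recent = states[-min_samples:]
    if len(recent) < min_samples:
        return False
    distinct_progress = {progress for _, _, progress, _ in recent}
    _, _, last_progress, last_total = recent[-1]
    return len(distinct_progress) == 1 and last_progress < last_total
```

Feeding it the text of the Director server log and calling `is_stalled(parse_parcel_states(text))` would report a stall for output like the above, where progress stays at 0 across every sample.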
Created on 06-21-2018 01:10 PM - edited 06-21-2018 01:10 PM
We figured this one out. The nodes were not able to download the parcels from the manager instance. As it turns out, we had DNS Hostnames set to NO for the VPC (AWS).
The message that tipped us off was in the cloudera-scm-agent.log on each of the nodes.
[21/Jun/2018 19:42:05 +0000] 13795 Thread-13 downloader INFO Fetching torrent: http://ip-10-2-4-152.us-east-2.compute.internal:7180/cmf/parcel/download/KAFKA-3.0.0-1.3.0.0.p0.40-el7.parcel.torrent
[21/Jun/2018 19:42:05 +0000] 13795 Thread-13 https ERROR Failed to retrieve/stroe URL: http://ip-10-2-4-152.us-east-2.compute.internal:7180/cmf/parcel/download/KAFKA-3.0.0-1.3.0.0.p0.40-el7.parcel.torrent -> /opt/cloudera/parcel-cache/KAFKA-3.0.0-1.3.0.0.p0.40-el7.parcel.torrent
<urlopen error [Errno -2] Name or service not known>
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.0-py2.7.egg/cmf/https.py", line 191, in fetch_to_file
    resp = self.open(req_url)
  File "/usr/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.0-py2.7.egg/cmf/https.py", line 186, in open
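The "Name or service not known" error above is a hostname lookup failure, so one quick way to confirm the same problem on a node is to try resolving the manager's hostname directly. Below is a minimal Python 3 sketch using only the standard library. The helper names `host_from_url` and `can_resolve` are hypothetical, and this is just an illustration of the check, not how the Cloudera agent itself does it.

```python
import socket
from urllib.parse import urlparse


def host_from_url(url):
    """Extract the hostname from a URL, e.g. the torrent URL in the agent log."""
    return urlparse(url).hostname


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves, False on a name-resolution error.

    The resolver argument exists only so the lookup can be swapped out in tests.
    """
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        # Same class of failure as "[Errno -2] Name or service not known".
        return False
```

On an affected node, `can_resolve(host_from_url(url))` with the torrent URL from the agent log would be expected to return False until name resolution for the manager's hostname works.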
Created 06-21-2018 01:15 PM
Glad you figured this out. Thanks for posting your solution for other forum users.