
Cloudera cluster stuck on 607/615

Explorer

I've used Cloudera Director to set up a couple of clusters using the RHEL 6.5 64-bit AMI on AWS.

 

The master and slave are m3.2xlarge instances with 512 GB of storage each.

 

The cluster install progresses to step 607/615 and then hangs.

 

application.log shows the following message:

 

[2015-06-04 17:09:49] INFO  [pipeline-thread-32] - c.c.l.b.c.UnboundedWaitForParcelStage: Waiting for parcel (CDH, 5.3.3-1.cdh5.3.3.p0.5) stage DISTRIBUTED. Current: DISTRIBUTING Past: [DISTRIBUTING, DISTRIBUTING, DISTRIBUTING, DISTRIBUTING, DISTRIBUTING, DISTRIBUTING]. State ApiParcelState{progress=705, progressTotal=3600, count=0, countTotal=6, warnings=null, errors=[Error when distributing to ip-192-168-16-236.ec2.internal : Untar failed with return code 2.]}

 

The same error is reported for all nodes in the cluster.
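
For anyone comparing many of these lines across nodes, here is a minimal sketch that pulls the counters and the error text out of an `ApiParcelState{...}` log line. The field layout is inferred only from the single line quoted above, and `parse_parcel_state` is a hypothetical helper name, not part of any Cloudera tool.

```python
import re

# Sample taken from the application.log line quoted above (shortened at the front).
SAMPLE = (
    "State ApiParcelState{progress=705, progressTotal=3600, count=0, countTotal=6, "
    "warnings=null, errors=[Error when distributing to "
    "ip-192-168-16-236.ec2.internal : Untar failed with return code 2.]}"
)

# Assumption: the four counters always appear in this order, as in the sample.
STATE_RE = re.compile(
    r"progress=(?P<progress>\d+), progressTotal=(?P<progress_total>\d+), "
    r"count=(?P<count>\d+), countTotal=(?P<count_total>\d+)"
)
# Assumption: the errors list is the last field and the line ends with "]}".
ERRORS_RE = re.compile(r"errors=\[(?P<errors>.*)\]\}")


def parse_parcel_state(line):
    """Return the counters and raw error text from a log line, or None if absent."""
    match = STATE_RE.search(line)
    if match is None:
        return None
    state = {key: int(value) for key, value in match.groupdict().items()}
    errors = ERRORS_RE.search(line)
    state["errors"] = errors.group("errors") if errors else ""
    return state


if __name__ == "__main__":
    print(parse_parcel_state(SAMPLE))
```

In the sample, `count=0` out of `countTotal=6` suggests that none of the six hosts had finished distribution when the line was logged.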

 

Error code 2 is supposedly out of disk space, but I can't find the file it's attempting to untar, so I can't check the disk space.

 

What's happening here?

5 REPLIES

Re: Cloudera cluster stuck on 607/615

Explorer

The agent is logging the following failure:

 

[04/Jun/2015 13:34:02 +0000] 2567 Thread-13 downloader   INFO     Starting download of: http://ip-192-168-16-60.ec2.internal:7180/cmf/parcel/download/CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel
[04/Jun/2015 13:34:02 +0000] 2567 Thread-13 downloader   ERROR    HTTP error during download
Traceback (most recent call last):
  File "/usr/lib64/cmf/agent/src/cmf/downloader.py", line 237, in _download
    dlname, op.headers = self.opener.retrieve(url, tempname, cb)
  File "/usr/lib64/python2.6/urllib.py", line 239, in retrieve
    fp = self.open(url, data)
  File "/usr/lib64/python2.6/urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "/usr/lib64/python2.6/urllib.py", line 362, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/usr/lib64/python2.6/urllib.py", line 379, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "/usr/lib64/cmf/agent/src/cmf/downloader.py", line 58, in http_error_default
    raise urllib2.HTTPError(url, code, msg, headers, fp)
HTTPError: HTTP Error 503: Service Unavailable

Re: Cloudera cluster stuck on 607/615

Expert Contributor

Just saw your second reply -- this makes me less certain of the cause. It looks as though the Cloudera Manager deployment is not being set up properly. Let me think for a moment about how to diagnose this.

Re: Cloudera cluster stuck on 607/615

Explorer

It also appears that /dev/xda1 is out of space and the download is failing. I was using one of the RHEL 6.5 64-bit AMIs, but the environment is now in a broken state, so I can't figure out how to get the specific ID. I'm going to create a new environment and will let you know what the specific AMI was.

Re: Cloudera cluster stuck on 607/615

Expert Contributor

How did this new attempt go? Are you still having issues with the size of the root disk partition? 

Re: Cloudera cluster stuck on 607/615

Expert Contributor

Sorry you're running into this issue!  This is occurring because the Cloudera Manager Deployment is distributing a parcel file (the file that contains all of the CDH software) and running out of space in the process.  The file that's being untarred is located at /opt/cloudera/parcels, though there may be copies in /opt/cloudera/parcels-cache or /opt/cloudera-parcels-repo depending on the role of the instance.  Regardless, whatever mount is holding /opt/cloudera is running out of space.
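
To see which mount is filling up, a quick sketch like the following reports free space for the filesystem behind each of the paths mentioned above. It assumes Python 3.3 or later is available on the host (`shutil.disk_usage` does not exist in the Python 2.6 that the agent traceback shows); on a stock RHEL 6 host, `df -h /opt/cloudera` gives the same information. The helper names are made up for illustration.

```python
import os
import shutil

# Paths named in this reply; which ones exist depends on the role of the instance.
PARCEL_DIRS = [
    "/opt/cloudera/parcels",
    "/opt/cloudera/parcels-cache",
    "/opt/cloudera-parcels-repo",
]


def nearest_existing(path):
    """Walk up from path until a directory that exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def free_space_report(paths):
    """Return (requested path, path checked, free bytes, total bytes) per path."""
    report = []
    for requested in paths:
        checked = nearest_existing(requested)
        usage = shutil.disk_usage(checked)
        report.append((requested, checked, usage.free, usage.total))
    return report


if __name__ == "__main__":
    gib = 2 ** 30
    for requested, checked, free, total in free_space_report(PARCEL_DIRS):
        print(f"{requested} (checked at {checked}): "
              f"{free / gib:.1f} GiB free of {total / gib:.1f} GiB")
```

If a directory is missing, the sketch falls back to its nearest existing parent, so the figure still reflects the filesystem where that directory would be created.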

 

This is strange to me -- what exact AMI are you using (the AMI ID would be helpful), and in which region?  What settings are you giving to the instance template?

 

Thanks!