Stuck Removing Parcel

New Contributor

I hope someone can help me with removing a parcel from my 146-node cluster running Cloudera Manager 4.6.3. Cloudera Manager reports that the removal is stuck at 96% complete, and every attempt I have made to fix the issue has failed.

 

By inspecting the page using Firebug, I can see that it's polling /parcels/details for progress on the undistribution operation. Interestingly, the progress field of the response reports 140 / 146, which equates roughly to the 96% complete. Curiously, 6 nodes are absent from the status, with no indication as to which ones.
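As a quick check of the arithmetic above, and a way to narrow down the six missing nodes, here is a minimal Python sketch. It assumes you can obtain a per-host list from somewhere (for example, the Hosts page); the post says the polled response itself does not name the hosts. The function names are hypothetical and are not part of Cloudera Manager.

```python
def progress_percent(done, total):
    """Return completion as a percentage of hosts that have reported."""
    return 100.0 * done / total


def unreported_hosts(all_hosts, reported_hosts):
    """Return hosts in the cluster list that are absent from the status.

    Both arguments are plain lists of hostnames gathered by the operator;
    nothing here talks to Cloudera Manager.
    """
    return sorted(set(all_hosts) - set(reported_hosts))


# 140 of 146 hosts reported, as in the progress field described above.
print(round(progress_percent(140, 146)))  # prints 96
```

If the two host lists can be collected, `unreported_hosts(all_hosts, reported_hosts)` would give the names that the status is silently omitting.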

 

I wrote a script to check each of the nodes for the installed parcel, but strangely every node reports that the parcel has been successfully removed (the parcel is not in /opt/cloudera/parcels). I have also tried reactivating the parcel through the RESTful interface. That way I am able to redistribute and reactivate it, but any attempt to remove the parcel produces the error again.
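The poster's script is not shown. A check of this kind could look roughly like the sketch below, which assumes passwordless SSH to every node and that an installed parcel appears as an entry under /opt/cloudera/parcels. The parcel directory name and the helper names are hypothetical. Hosts that cannot be reached are reported as "unknown" and are not counted as clean.

```python
import shlex
import subprocess

PARCEL_DIR = "/opt/cloudera/parcels"  # path mentioned in the post


def ssh_exit_code(host, path):
    """Run `test -e path` on host over SSH and return the exit code.

    Assumes key-based SSH access; ssh itself returns 255 on connection errors.
    """
    cmd = ["ssh", "-o", "BatchMode=yes", host, "test", "-e", shlex.quote(path)]
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def parcel_state(host, parcel, check=ssh_exit_code):
    """Classify one host as 'present', 'absent', or 'unknown'."""
    code = check(host, f"{PARCEL_DIR}/{parcel}")
    if code == 0:
        return "present"
    if code == 1:
        return "absent"
    return "unknown"  # SSH failure or unexpected exit status


def survey(hosts, parcel, check=ssh_exit_code):
    """Group hosts by parcel state so stragglers stand out."""
    result = {"present": [], "absent": [], "unknown": []}
    for host in hosts:
        result[parcel_state(host, parcel, check)].append(host)
    return result
```

Calling `survey(hosts, "EXAMPLE-1.0")` with a real host list would return the three groups. The `check` parameter exists so the logic can be exercised without any network access.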

 

I have watched the logs in /var/log/cloudera-scm-server and can see the commands passing through, but there is no helpful information for debugging what exactly is going on.

 

Has anyone experienced anything similar, or have any suggestions?

3 REPLIES

Re: Stuck Removing Parcel

Cloudera Employee

So, you have 146 nodes in your cluster? Are they all healthy, with CM Agents heartbeating as they should (as indicated on the Hosts page in CM)? Can you cancel the remove operation?

 

At a high level, I'd recommend upgrading to a newer version of CM. There have been many substantial improvements in 4.7, 4.8 and now 5.0, including more detail in error reporting and handling in these situations. If the option exists, I'd highly recommend upgrading to 5.0, which you can do independently of the version of CDH running on your cluster, as long as it's not CDH 3.

Re: Stuck Removing Parcel

New Contributor

Yeah, all 146 are in the cluster, and are reporting healthy.

 

The remove operation does not have a cancel option, frustratingly.

 

The plan is to move to a newer version, but at this time it's not logistically feasible. I am currently in the process of making the CDH 3 to CDH 4 upgrade.

Re: Stuck Removing Parcel

Cloudera Employee

As a first step, you should restart all the CM Agents. In this version they can sometimes get confused under certain conditions, so that their internal knowledge of which parcels are present diverges from what's on disk. Restarting means they'll initialise themselves based on the actual on-disk state, which should bring you back into sync.
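On a 146-node cluster, restarting every agent by hand is tedious. The sketch below loops over a host list and runs the agent's init script on each node. It assumes passwordless SSH, sudo rights that do not prompt for a password, and the standard `cloudera-scm-agent` service name; the helper names are hypothetical.

```python
import subprocess

# Init-script invocation for the CM Agent; run on each node via SSH.
RESTART_CMD = ["sudo", "service", "cloudera-scm-agent", "restart"]


def ssh_restart(host):
    """Restart the agent on one host and return the SSH exit code."""
    cmd = ["ssh", "-o", "BatchMode=yes", host] + RESTART_CMD
    return subprocess.run(cmd).returncode


def restart_agents(hosts, run=ssh_restart):
    """Restart the agent on every host; return the hosts where it failed."""
    failed = []
    for host in hosts:
        if run(host) != 0:
            failed.append(host)
    return failed
```

Any hosts returned by `restart_agents(hosts)` are worth checking by hand, since a node whose agent cannot be restarted is a plausible candidate for one of the six missing from the status.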
