Created on 07-06-2018 06:52 AM - edited 09-16-2022 06:25 AM
Hi,
I tried to upgrade CDH from 5.9.0 to 5.9.3 using Cloudera Manager.
I was not able to finish the upgrade and rolled back to 5.9.0.
This screenshot shows the error messages:
I'm trying to find the reason for the upgrade failure.
What should I check?
While checking different elements, some questions have come up.
Thanks for any answers or advice.
1. Is it OK that the .flood directory is owned by cloudera-scm? The other entries in the parcels directory are owned by root.
[root@etl1 parcels]# ls -al
total 0
drwxr-xr-x  5 root         root          125 Jul  6 12:11 .
drwxr-xr-x  4 cloudera-scm cloudera-scm   39 Nov  9  2016 ..
lrwxrwxrwx  1 root         root           26 Jul  3 13:23 CDH -> CDH-5.9.0-1.cdh5.9.0.p0.23
drwxr-xr-x 11 root         root          110 Oct 21  2016 CDH-5.9.0-1.cdh5.9.0.p0.23
drwxr-xr-x  2 cloudera-scm cloudera-scm    6 Jul  5 09:58 .flood
lrwxrwxrwx  1 root         root           43 Jul  3 09:58 SPARK2 -> SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658
drwxr-xr-x  6 root         root           47 Sep 25  2017 SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658
2. In Cloudera Manager / Parcels I keep seeing errors like the one below. Should I be worried about them?
Example: Error for parcel SPARK-0.9.0-1.cdh4.6.0.p0.98-el7 : Parcel not available for OS Distribution RHEL7.
I have CentOS 7, not Red Hat - could that be the reason?
3. I removed the 5.9.3 parcel from Cloudera Manager.
The parcel CDH 5.9.3-1.cdh5.9.3.p0.4 changed status in CM from Distributed to Downloaded.
I expected that the /opt/cloudera directories on all servers would no longer contain 5.9.3 files, but I still have such files in:
* all parcel-cache directories,
* in some parcels directories.
Should I remove them?
4. I noticed that some of the 5.9.3 files in the parcels directories that were not removed are older, so the previous admin must have already done something with this upgrade. Could that cause a problem?
[root@etl2 cloudera]# ls -al parcels
total 4
drwxr-xr-x  6 root         root         4096 Jul  3 13:23 .
drwxr-xr-x  4 cloudera-scm cloudera-scm   39 Nov  9  2016 ..
lrwxrwxrwx  1 root         root           26 Jul  3 13:23 CDH -> CDH-5.9.0-1.cdh5.9.0.p0.23
drwxr-xr-x 11 root         root          110 Oct 21  2016 CDH-5.9.0-1.cdh5.9.0.p0.23
drwxr-xr-x 11 root         root          110 Jun 28  2017 CDH-5.9.3-1.cdh5.9.3.p0.4
drwxr-xr-x  2 cloudera-scm cloudera-scm    6 Jul  4 12:45 .flood
lrwxrwxrwx  1 root         root           43 Jun 16 05:18 SPARK2 -> SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658
drwxr-xr-x  6 root         root           47 Sep 25  2017 SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658
5. Should I remove the 5.9.3 files from the parcel-repo directory on the Cloudera Manager server?
I thought that the Delete command for that parcel in Cloudera Manager would remove them.
[root@cms1 parcel-repo]# ls -al
total 3091828
drwxr-xr-x. 2 cloudera-scm cloudera-scm       4096 Jul  3 11:20 .
drwxr-xr-x. 4 cloudera-scm cloudera-scm         34 Nov  4  2016 ..
-rw-r-----  1 cloudera-scm cloudera-scm 1492922238 Nov 10  2016 CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel
-rw-r-----  1 cloudera-scm cloudera-scm         41 Nov 10  2016 CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel.sha
-rw-r-----  1 cloudera-scm cloudera-scm      57125 Nov 10  2016 CDH-5.9.0-1.cdh5.9.0.p0.23-el7.parcel.torrent
-rw-r-----  1 cloudera-scm cloudera-scm 1500799059 Jul  3 11:19 CDH-5.9.3-1.cdh5.9.3.p0.4-el7.parcel
-rw-r-----  1 cloudera-scm cloudera-scm         41 Jul  3 11:19 CDH-5.9.3-1.cdh5.9.3.p0.4-el7.parcel.sha
-rw-r-----  1 cloudera-scm cloudera-scm      57424 Jul  3 11:20 CDH-5.9.3-1.cdh5.9.3.p0.4-el7.parcel.torrent
-rw-r-----  1 cloudera-scm cloudera-scm  172161150 Jan 29 14:35 SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658-el7.parcel
-rw-r-----  1 cloudera-scm cloudera-scm         41 Jan 29 14:35 SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658-el7.parcel.sha
-rw-r-----  1 cloudera-scm cloudera-scm       6760 Jan 29 15:17 SPARK2-2.1.0.cloudera2-1.cdh5.7.0.p0.171658-el7.parcel.torrent
6. What exactly does parcel activation mean?
Does this activation simply change the CDH symlink so that it points to the CDH-5.9.3... directory instead of CDH-5.9.0...?
lrwxrwxrwx  1 root root  26 Jul  3 13:23 CDH -> CDH-5.9.0-1.cdh5.9.0.p0.23
drwxr-xr-x 11 root root 110 Oct 21  2016 CDH-5.9.0-1.cdh5.9.0.p0.23
drwxr-xr-x 11 root root 110 Jun 28  2017 CDH-5.9.3-1.cdh5.9.3.p0.4
Thanks in advance
Andrzej
Created 07-06-2018 11:56 AM
Thank you for the information.
(1)
Yes, the .flood permissions are fine.
(2)
You have many parcel URLs in your Parcels configuration that do not support el7. You can remove the parcel URLs that you do not need:
Impala
CDH4
Spark
Solr
To remove the parcel URLs, go to Administration --> Settings --> Parcels.
Once these are removed, Cloudera Manager will no longer try to find parcels for your el7 hosts.
(3)
Yes, if Cloudera Manager shows that the CDH 5.9.3 parcel is in the Downloaded state, then you can remove any 5.9.3 files or directories in /opt/cloudera/parcels, /opt/cloudera/parcel-cache, and /opt/cloudera/parcels/.flood.
Be careful that you are removing only 5.9.3, though.
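If it helps, here is a minimal sketch (assuming the default /opt/cloudera paths shown in your listings) that first lists the leftovers and only then removes them on each host:

# List 5.9.3 leftovers first, to confirm that nothing else matches
find /opt/cloudera/parcels /opt/cloudera/parcels/.flood /opt/cloudera/parcel-cache -maxdepth 1 -name '*5.9.3*'
# If the list looks right, remove only those entries
find /opt/cloudera/parcels /opt/cloudera/parcels/.flood /opt/cloudera/parcel-cache -maxdepth 1 -name '*5.9.3*' -exec rm -rf {} +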
(4)
Yes, indeed something very odd happened here, so I agree that cleaning up and trying again is the right path to success.
(5)
There is no benefit in deleting the parcel (from parcel-repo), but there is also no harm in it.
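If you do decide to clean it out, a minimal sketch (using the file names from your parcel-repo listing on the Cloudera Manager server) would be:

# Remove the 5.9.3 parcel plus its .sha and .torrent files from the CM server's repo
rm -i /opt/cloudera/parcel-repo/CDH-5.9.3-1.cdh5.9.3.p0.4-el7.parcel*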
(6)
You are correct: activation points the CDH link at the activated parcel, and the "active" parcel is registered in Cloudera Manager so that it can ensure all agents have activated the parcel, too.
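For example, you can check which parcel a host is actually pointing at by resolving the symlink (paths as in your listings):

# Show which parcel directory the CDH symlink currently resolves to
readlink -f /opt/cloudera/parcels/CDH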
Lastly, if you clean up and the same problem happens again, let's take a closer look at the agent logs on the hosts where ZooKeeper won't start. There may be clues.
good luck!
Created 07-10-2018 09:11 AM
Thank you for your answers.
They are very helpful.
I'm cleaning our cluster according to your advice.
I have a question concerning (2):
Removing, for example, the Impala parcel URL will not impact the Impala service that we are using, will it?
Impala is upgraded as part of the CDH upgrade, so a separate Impala parcel is not required - am I correct?
I also have an additional question.
On Friday the CDH 5.9.3 parcel was in the Downloaded status, and a few actions were available for it - I do not remember exactly which, but at least Distribute and Remove.
But now, although I have done nothing with this parcel, it shows the status Undistributing 63% and no available actions.
Why did this state change? I don't think I did anything with the cluster.
What should I do in such case?
Thanks in advance
Andrzej
Created 07-10-2018 10:26 AM
Impala is included in the CDH parcel, so there is no use for the Impala parcel.
As for the parcel being stuck in "Undistributing": that usually indicates that one or more agents may not be able to perform an action that is necessary for this to complete. The first thing to do is isolate which host or hosts are "stuck".
To do so, you may be able to find more information in /var/log/cloudera-scm-server/cloudera-scm-server.log, but I am not certain what to look for.
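As a rough starting point (just a sketch; the exact messages will vary), you could search that log for the parcel version:

# Show recent mentions of the 5.9.3 parcel in the CM server log
grep -i '5.9.3' /var/log/cloudera-scm-server/cloudera-scm-server.log | tail -n 50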
Also, I wonder if possibly some process is actually using the 5.9.3 parcel. To check, go to the Parcel Usage page and check to see if any clusters listed there show CDH 5.9.3 in use... if so, you may need to shut down the service before undistributing the parcel.
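As a complementary host-level check (not the Parcel Usage page itself, just a sketch), you could also look for processes still running out of that parcel directory:

# List any processes whose command line references the 5.9.3 parcel directory
ps -ef | grep 'CDH-5.9.3' | grep -v grep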
Another thing to try is to look at all your agent logs and see if there may be an error regarding a parcel or directory.
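For instance, assuming the default agent log location, something like this on each host:

# Look for parcel-related errors in the agent log (default location)
grep -iE 'error|parcel' /var/log/cloudera-scm-agent/cloudera-scm-agent.log | tail -n 50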
After you have cleaned up, make sure to restart all your agents as well, so that they have a fresh view of their parcels.
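On CentOS 7 hosts that is typically:

# Restart the Cloudera Manager agent so it re-reads its parcel state
sudo systemctl restart cloudera-scm-agent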
Created 07-20-2018 01:46 AM
Thank you for your help. We finished the upgrade successfully.
After some cleanup in the cluster, we restarted the upgrade using the CM Upgrade page. We completed all the steps up to the new parcel activation. When we tried to do the next steps from the CM Upgrade page, we got an error box without any message.
We redeployed the configuration and restarted the services manually using CM.
At the end we had to redeploy the Oozie shared libraries and the SQL Server JDBC driver.