09-02-2014 03:50 PM
I am attempting to reinstall my cluster and no matter what I do the installation is setting the /etc/alternatives binary symlinks to an old/invalid parcels version.
During the reinstallation process the parcels are distributed.
[parcels]$ ls -lsa /opt/cloudera/parcels
4 drwxr-xr-x 3 root root 4096 Sep 2 22:40 .
4 drwxr-xr-x 4 root root 4096 Sep 2 22:33 ..
0 lrwxrwxrwx 1 root root 25 Sep 2 22:40 CDH -> CDH-5.1.2-1.cdh5.1.2.p0.3
4 drwxrwxr-x 10 root root 4096 Aug 26 04:03 CDH-5.1.2-1.cdh5.1.2.p0.3
But the cloudera-scm-agent is creating symlinks in /etc/alternatives to a different parcels version (CDH-5.1.0-1.cdh5.1.0.p0.53).
[parcels]$ ls -lsa /etc/alternatives | grep cloudera
4 lrwxrwxrwx 1 root root 63 Sep 2 22:40 avro-tools -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/avro-tools
4 lrwxrwxrwx 1 root root 60 Sep 2 22:40 beeline -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/beeline
4 lrwxrwxrwx 1 root root 61 Sep 2 22:40 catalogd -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/catalogd
0 lrwxrwxrwx 1 root root 59 Sep 2 22:40 cli_mt -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/cli_mt
0 lrwxrwxrwx 1 root root 59 Sep 2 22:40 cli_st -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/cli_st
4 lrwxrwxrwx 1 root root 61 Sep 2 22:40 flume-ng -> /opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/bin/flume-ng
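A quick way to list every mismatched link, rather than eyeballing the ls output, is sketched below. It assumes the active parcel is whatever the /opt/cloudera/parcels/CDH symlink points to, as in the listing above. The function name stale_alternative_links and its two optional arguments are made up for illustration; they are not part of any Cloudera tooling.

```shell
# Sketch: report /etc/alternatives links that point into a parcel
# other than the one the CDH symlink currently names.
# Arguments (both optional, hypothetical): parcels root, alternatives dir.
stale_alternative_links() {
    local parcels_root="${1:-/opt/cloudera/parcels}"
    local alt_dir="${2:-/etc/alternatives}"
    local active link target

    # The CDH symlink names the active parcel, e.g. CDH-5.1.2-1.cdh5.1.2.p0.3
    active="$(readlink "$parcels_root/CDH")" || return 1
    active="${active##*/}"    # keep only the directory name if the link is absolute

    for link in "$alt_dir"/*; do
        [ -L "$link" ] || continue
        target="$(readlink "$link")"
        case "$target" in
            "$parcels_root/$active"/*)
                ;;    # points into the active parcel: nothing to report
            "$parcels_root"/*)
                # points into some other parcel directory: stale
                printf '%s -> %s\n' "$link" "$target"
                ;;
        esac
    done
}
```

Calling stale_alternative_links with no arguments checks the default paths from this thread; an empty result means every parcel-backed link already points at the active parcel.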
Where in the HECK is this information stored either on the CM server or on the agents?
It's pretty important for a user to be able to remove software completely and reinstall it without having to reinstall the entire operating system, which is the point I'm at right now.
09-02-2014 04:40 PM
09-03-2014 12:27 PM
I have found the solution to the problem.
cloudera-scm-agent runs a script, /usr/lib64/cmf/service/common/alternatives.sh, to generate the /etc/alternatives entries and the symlinks in /usr/bin/. This bash script executes update-alternatives based on the PARCELS_DIR and PARCEL_DIRNAME variables. There are files in /var/lib/alternatives/ which seem to be used as overrides for the update-alternatives tool. Regardless of what you give update-alternatives, if there is a file in /var/lib/alternatives for that same alternative name, it will use the information from that file.
For some reason the /var/lib/alternatives files for cloudera have two entries in them, one for the old parcel and one for the new parcel. This may have happened by reinstalling without deactivating the old cluster/parcel first.
# cat /var/lib/alternatives/sqoop
When I remove the /var/lib/alternatives/sqoop-import file and restart cloudera-scm-agent the proper symlink is created.
# ls -lsa /etc/alternatives/sqoop-import
4 lrwxrwxrwx 1 root root 64 Sep 3 19:00 /etc/alternatives/sqoop-import -> /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/bin/sqoop-import
So, in closing, it may be necessary to remove all Cloudera-related /var/lib/alternatives/ files after a botched install if you did not deactivate the parcel before reinstalling.
# grep -l cloudera /var/lib/alternatives/*
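For anyone who would rather not delete those files outright, here is a minimal sketch that moves them aside instead, using the same "contains cloudera" selection as the grep above. The function name backup_cloudera_alternatives and the default backup directory are hypothetical, and this assumes the agent recreates the entries on restart, as it did for sqoop-import above.

```shell
# Sketch: move Cloudera-related update-alternatives state files aside
# so they can be restored if something goes wrong.
# Arguments (both optional, hypothetical): state dir, backup dir.
backup_cloudera_alternatives() {
    local state_dir="${1:-/var/lib/alternatives}"
    local backup_dir="${2:-/root/alternatives-backup}"   # assumed location, pick your own
    local f moved=0

    mkdir -p "$backup_dir" || return 1

    for f in "$state_dir"/*; do
        [ -f "$f" ] || continue
        # Same test as: grep -l cloudera /var/lib/alternatives/*
        if grep -q cloudera "$f"; then
            mv -- "$f" "$backup_dir/" && moved=$((moved + 1))
        fi
    done

    echo "moved $moved file(s) to $backup_dir"
}
```

Run it as root, then restart the agent (for example with "service cloudera-scm-agent restart") and check /etc/alternatives again.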
03-16-2015 04:01 AM
I've written a little script to resolve the issue, although the script is a workaround and I still need to find the proper solution.
03-20-2015 12:21 AM
We are facing a similar problem of missing Symlinks. We have tried all the solutions mentioned in this thread but to no avail.
We are on CDH5.3.1, Cloudera Manager 5.3.2.
"ls -lsa /etc/alternatives | grep hadoop" shows only a single entry, hadoop-conf, and it points to /etc/hadoop/conf.cloudera.yarn
There are no other entries for hive or hbase or sqoop or even hadoop!
There are no entries in /var/lib/alternatives.
As suggested, we stopped the cluster, did a "sudo service cloudera-scm-agent restart" but the symlinks have not been created.
Any suggestions what is wrong and what needs to be done?
Thanks in advance.
03-24-2015 07:44 AM
With nothing working and no help from anywhere, we finally formatted the Namenode and re-installed Cloudera Manager and CDH 5.3.
The symlinks have been created and the command "hadoop fs -ls" seems to work from the command prompt from any folder now.
I don't think this is the right way to address the problem, so I'd still welcome input from anyone.
03-24-2015 12:54 PM - edited 03-24-2015 12:55 PM
You are correct, the steps you took shouldn't have been necessary, but it sounds like my original issue was slightly different from yours.
Were you able to see anything in the cloudera-scm-agent logs after the restart that might have pointed to an issue creating the symlinks?
You should have seen things like:
[03/Sep/2014 00:06:23 +0000] 21109 Thread-13 parcel_cache INFO Checking checksum of parcel CDH-5.1.2-1.cdh5.1.2.p0.3-el6.parcel...
[03/Sep/2014 00:06:28 +0000] 21109 Thread-13 parcel_cache INFO Unpacking /opt/cloudera/parcel-cache/CDH-5.1.2-1.cdh5.1.2.p0.3-el6.parcel into /opt/cloudera/parcels
[03/Sep/2014 00:06:56 +0000] 21109 MainThread parcel INFO Loading parcel manifest for: CDH-5.1.2-1.cdh5.1.2.p0.3
[03/Sep/2014 00:06:56 +0000] 21109 MainThread parcel INFO Ensuring users/groups exist for new parcel CDH-5.1.2-1.cdh5.1.2.p0.3.
[03/Sep/2014 00:06:57 +0000] 21109 MainThread parcel INFO Ensuring correct file permissions for new parcel CDH-5.1.2-1.cdh5.1.2.p0.3.
[03/Sep/2014 00:07:15 +0000] 21109 MainThread parcel INFO Activating system symlinks for parcel CDH-5.1.2-1.cdh5.1.2.p0.3
[03/Sep/2014 00:07:15 +0000] 21109 MainThread parcel INFO Ensuring alternatives entries are activated for parcel CDH-5.1.2-1.cdh5.1.2.p0.3.
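A rough sketch for pulling just these activation messages (and any errors) out of the agent log follows. The default log path is an assumption about a standard install, the " ERROR " pattern assumes error lines use the same layout as the INFO lines above, and parcel_log_summary is a made-up helper name.

```shell
# Sketch: summarise parcel activation messages and errors in the agent log.
# Argument (optional, hypothetical): path to the log file.
parcel_log_summary() {
    # Assumed default location of the cloudera-scm-agent log
    local log_file="${1:-/var/log/cloudera-scm-agent/cloudera-scm-agent.log}"

    echo "== activation messages =="
    grep -E 'Activating system symlinks|Ensuring alternatives entries' "$log_file" \
        || echo "(none found)"

    echo "== errors =="
    # Assumes the level appears as a separate word, as INFO does above
    grep -E ' ERROR ' "$log_file" || echo "(none found)"
}
```

If the first section comes back empty after an agent restart, the agent never reached the symlink step, which narrows down where to look next.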
What was the status of your /opt/cloudera/parcels directory? Had you just recently changed to a new version? Did you deactivate the old parcel before activating the new parcel?
Sorry I didn't respond earlier to prevent you from having to reinstall.
03-24-2015 11:43 PM
Not an issue Mshirley, many thanks for your comments.
To be honest, I didn't check the cloudera-scm-agent logs after the restart, so I cannot say whether any errors were reported. I'll keep that in mind from now on.
Also, the /opt/cloudera/parcels directory contained two entries: one named CDH (a link to the CDH5.3 directory, I suppose) and the actual CDH5.3xxxx directory (sorry, I cannot recollect the exact name), which contained the /lib, /bin, and several other parcel directories. And yes, we did try deactivating the old parcel before downloading the new one. Unfortunately, nothing worked.
Around the same time, we also did a reboot and noticed that one of the nodes did not come up. As a result, we were forced to do the re-installation again.