Created on 03-29-2017 04:27 PM - edited 09-16-2022 04:22 AM
Hello Friends:
A quick preamble, and then a question ...
I run CDH 5.10 on CentOS 6 (Final) for personal use (one node for the Master and CM, and four nodes for Workers/Slaves). All of them are Linux LXC containers.
It's been a while since I spun the cluster up, so the first thing I did was a 'yum update' of the nodes. No issues there. The cluster is up and running. All green statuses in CM.
However, one thing that used to work but now does not is the pyspark command. When I run it now, I get the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
The jar file for that class is:
/usr/lib/hadoop/client/hadoop-common.jar
After troubleshooting -- again it's been a while since I used the cluster, so some things may have changed -- I determined that the SPARK_DIST_CLASSPATH environment variable was getting set, but did not contain any of the jars in that directory (including, of course, the one mentioned above).
The script ultimately responsible for setting SPARK_DIST_CLASSPATH is:
/etc/spark/conf/spark-env.sh
and it consults the list of jars in the classpath.txt file to do so.
Sadly, that file does not list any of the jars in /usr/lib/hadoop/client/. I could, of course, add them manually, but I found it odd that they were missing in the first place, since that seems like an important directory of jars to include in classpath.txt.
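One way to confirm the gap before touching anything is to compare the jars in /usr/lib/hadoop/client/ against the entries in classpath.txt. This is a minimal sketch, assuming classpath.txt holds one jar path per line and lives next to spark-env.sh in /etc/spark/conf/ (both are assumptions, not something stated above); `missing_jars` is a hypothetical helper name. Paths are resolved with `os.path.realpath` because the client directory may hold unversioned aliases of versioned jars. It only reports and changes nothing.

```python
import os

# Assumed locations (not confirmed in the thread); adjust for your layout.
CLASSPATH_TXT = "/etc/spark/conf/classpath.txt"
CLIENT_DIR = "/usr/lib/hadoop/client"


def missing_jars(classpath_entries, client_jars):
    """Return sorted, de-duplicated client jars absent from classpath_entries.

    Both arguments are iterables of path strings. Callers should pass
    already-resolved paths (os.path.realpath) so aliases compare equal.
    """
    listed = {entry.strip() for entry in classpath_entries if entry.strip()}
    return sorted({jar for jar in client_jars if jar not in listed})


def main():
    with open(CLASSPATH_TXT) as fh:
        # Resolve symlinks so a versioned jar and its alias match.
        entries = [os.path.realpath(line.strip()) for line in fh if line.strip()]
    jars = [
        os.path.realpath(os.path.join(CLIENT_DIR, name))
        for name in os.listdir(CLIENT_DIR)
        if name.endswith(".jar")
    ]
    for jar in missing_jars(entries, jars):
        print("not in classpath.txt:", jar)


if __name__ == "__main__":
    if os.path.isfile(CLASSPATH_TXT):
        main()
    else:
        print("classpath.txt not found at", CLASSPATH_TXT)
```

Keeping the comparison in a pure function separate from the file I/O makes it easy to check the logic on a machine that has no cluster installed.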
So my questions (finally) ... =:)
I'm curious. It seems odd that they were left out, and I don't want to just blindly add them in.
Thank you in advance!
Created on 03-29-2017 08:03 PM - edited 03-29-2017 08:04 PM
I ended up repairing the issue after more work.
The UI eventually revealed that the versions of CDH (5.10) and CM (5.4) were out of sync. When I investigated why, I found that the entry in /etc/yum.repos.d/cloudera-manager.repo was pegged at CM 5.4, so my 'yum update' runs did not update CM (though they updated everything else). That explained it.
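A pinned repo like this can also be spotted without the UI by reading the `baseurl` out of the repo file. yum .repo files are INI-style, so Python's standard `configparser` can parse them. This is a sketch under an assumption about the archive layout: that a pinned baseurl ends in a dotted version segment such as `.../cm/5.4/`, while a major-only segment such as `.../cm/5/` is treated as unpinned. `pinned_versions` is a hypothetical helper name.

```python
import configparser
import re

REPO_FILE = "/etc/yum.repos.d/cloudera-manager.repo"

# Assumption: a pinned baseurl ends in a dotted version like "/5.4/" or "/5.10.0/".
_VERSION_RE = re.compile(r"/(\d+(?:\.\d+)+)/?$")


def pinned_versions(repo_text):
    """Return {section: version} for each section whose baseurl ends in a dotted version."""
    # interpolation=None so '%' or '$' characters in URLs are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    pins = {}
    for section in parser.sections():
        baseurl = parser.get(section, "baseurl", fallback="").strip()
        match = _VERSION_RE.search(baseurl)
        if match:
            pins[section] = match.group(1)
    return pins


if __name__ == "__main__":
    try:
        with open(REPO_FILE) as fh:
            print(pinned_versions(fh.read()) or "no pinned version found")
    except FileNotFoundError:
        print("repo file not found:", REPO_FILE)
```

If the reported version lags behind the CDH release you are running, the repo entry is the first thing to update before running yum again.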
I updated the repo file, ran 'yum update' for CM, and restarted. Then I let the UI walk me through a few upgrades and corrections of stale states, so unfortunately I don't know exactly which step produced the fix. =:) What we can say is that classpath.txt hadn't been updated properly; now it has the correct entries.
I'm glad I didn't brute-force things (not my style anyway). I doubt this one-off issue will help anyone, but who knows. =:)