Explorer
Posts: 10
Registered: ‎01-15-2015

${SPARK_DIST_CLASSPATH} does not include jars in /usr/lib/hadoop/client/ ...

Hello Friends:

A quick preamble, and then a question ...

I run CDH 5.10 on CentOS 6 (Final) for personal use: one node for the Master and CM, and four nodes for the Workers/Slaves. They are all Linux LXC containers.

It's been a while since I spun the cluster up, so the first thing I did was a 'yum update' of the nodes. No issues there. The cluster is up and running, with all green statuses in CM.

However, one thing that used to work but now does not is the pyspark command. When I run it now, I get the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

The jar file for that class is:

/usr/lib/hadoop/client/hadoop-common.jar

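For anyone who hits the same NoClassDefFoundError, one way to confirm whether any jar on SPARK_DIST_CLASSPATH actually provides the class is to open each jar (jars are zip files) and look for the matching .class entry. This Python script is a minimal sketch, not a Cloudera or Spark tool: it assumes a colon-separated classpath, expands only trailing `dir/*` wildcards, skips plain directories, and the helper names (`class_entry_name`, `expand_classpath`, `jars_containing`) are hypothetical.

```python
import glob
import os
import zipfile


def class_entry_name(class_name: str) -> str:
    """Convert 'org.apache.hadoop.fs.FSDataInputStream' (or the slash form shown in
    the NoClassDefFoundError message) into the entry name stored inside a jar."""
    return class_name.replace(".", "/") + ".class"


def expand_classpath(classpath: str) -> list[str]:
    """Split a colon-separated classpath into existing jar files.

    Entries ending in '*' are expanded to the .jar files in that directory
    (the Java classpath wildcard convention); plain directories are skipped."""
    jars = []
    for entry in filter(None, classpath.split(":")):
        if entry.endswith("*"):
            jars.extend(sorted(glob.glob(entry[:-1] + "*.jar")))
        elif entry.endswith(".jar") and os.path.isfile(entry):
            jars.append(entry)
    return jars


def jars_containing(class_name: str, jar_paths: list[str]) -> list[str]:
    """Return the jars (in classpath order) that contain the given class."""
    entry = class_entry_name(class_name)
    hits = []
    for path in jar_paths:
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(path)
        except (zipfile.BadZipFile, OSError):
            continue  # unreadable, dangling symlink, or not really a zip file
    return hits


if __name__ == "__main__":
    # Reads SPARK_DIST_CLASSPATH from the environment, so it must be exported
    # in the shell that runs this script.
    classpath = os.environ.get("SPARK_DIST_CLASSPATH", "")
    found = jars_containing("org/apache/hadoop/fs/FSDataInputStream",
                            expand_classpath(classpath))
    print("\n".join(found) if found else "class not found on SPARK_DIST_CLASSPATH")
```

If it reports the class as not found, the jar that provides it (here hadoop-common.jar) is not reachable through SPARK_DIST_CLASSPATH at all.
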
After some troubleshooting (again, it's been a while since I used the cluster, so some things may have changed), I determined that the SPARK_DIST_CLASSPATH environment variable was being set, but did not contain any of the jars in that directory (including, of course, the one mentioned above).

The script ultimately responsible for setting SPARK_DIST_CLASSPATH is:

/etc/spark/conf/spark-env.sh

and it consults the list of jars in the classpath.txt file to do so.

Sadly, that file does not list any of the jars in the aforementioned directory. I could, of course, add them manually, but I found it odd that they were missing in the first place. It seems like an important directory of jars to include in classpath.txt (again, /usr/lib/hadoop/client/).

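To see exactly which jars in /usr/lib/hadoop/client/ are absent from classpath.txt before deciding whether to edit anything, a read-only comparison is enough. This is a minimal Python sketch under stated assumptions: it treats classpath.txt as one jar path per line, assumes those lines are joined with ':' to form SPARK_DIST_CLASSPATH (an assumption about spark-env.sh, not something confirmed here), and compares resolved real paths so a symlinked jar counts as present under either name. The function names are hypothetical.

```python
import os
import sys


def read_classpath_txt(path: str) -> list[str]:
    """Read classpath.txt, assumed here to hold one jar path per line."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def build_dist_classpath(entries: list[str]) -> str:
    """Join entries with ':' (assumed to mirror how SPARK_DIST_CLASSPATH is built)."""
    return ":".join(entries)


def missing_jars(entries: list[str], jar_dir: str) -> list[str]:
    """Return jars in jar_dir that no classpath entry points to.

    Both sides are resolved with realpath, so a symlink such as
    hadoop-common.jar and the file it points to are treated as the same jar."""
    listed = {os.path.realpath(e) for e in entries}
    missing = []
    for name in sorted(os.listdir(jar_dir)):
        if not name.endswith(".jar"):
            continue
        full = os.path.join(jar_dir, name)
        if os.path.realpath(full) not in listed:
            missing.append(full)
    return missing


if __name__ == "__main__":
    # Defaults are the paths from the post; either can be overridden on the command line.
    cp_txt = sys.argv[1] if len(sys.argv) > 1 else "/etc/spark/conf/classpath.txt"
    jar_dir = sys.argv[2] if len(sys.argv) > 2 else "/usr/lib/hadoop/client"
    if os.path.isfile(cp_txt) and os.path.isdir(jar_dir):
        for jar in missing_jars(read_classpath_txt(cp_txt), jar_dir):
            print("not in classpath.txt:", jar)
    else:
        print(f"skipping: {cp_txt} or {jar_dir} not found on this host")
```

It only reports and never writes to classpath.txt, which keeps the check separate from any decision to edit that file.
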
So my questions (finally) ... =:)

  • Any idea why the jars in that directory weren't included in classpath.txt? Was it perhaps just an upgrade issue?

  • Has anyone had to add the jars in that directory (again, /usr/lib/hadoop/client/) manually?

  • Is /etc/spark/conf/classpath.txt meant to be edited?

I'm curious. It seems odd that they were left out, and I don't want to just blindly add them in.

Thank you in advance!


SOLVED: Re: ${SPARK_DIST_CLASSPATH} does not include jars in /usr/lib/hadoop/client/ ...

I ended up fixing the issue after some more work.

The UI eventually revealed to me that the versions of CDH (5.10) and CM (5.4) were not in sync. When I investigated why, I found that the entry in /etc/yum.repos.d/cloudera-manager.repo was pegged at 5.4, so my 'yum update' runs did not upgrade CM (though they updated everything else). That explained the mismatch.
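
A quick way to spot this kind of pin is to print the baseurl of every section in the repo file. The sketch assumes the .repo file is INI-style (which is what yum uses) and parses it with Python's configparser with interpolation disabled, so yum variables such as $basearch pass through unchanged. The function name is hypothetical, and which version string to look for in the URL depends on your own repository layout.

```python
import configparser


def repo_baseurls(repo_text: str) -> dict[str, str]:
    """Map each [section] of a yum .repo file to its baseurl ('' if none).

    Interpolation is disabled so yum variables like $releasever and
    $basearch are returned literally instead of being expanded."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    return {name: parser[name].get("baseurl", "") for name in parser.sections()}


if __name__ == "__main__":
    path = "/etc/yum.repos.d/cloudera-manager.repo"
    try:
        with open(path, encoding="utf-8") as fh:
            for section, url in repo_baseurls(fh.read()).items():
                print(f"{section}: {url}")
    except FileNotFoundError:
        print(f"{path} not found on this host")
```

If the printed URL names an older release than the CDH version you are running, yum will keep CM at that release no matter how often you update.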

I updated the repo file, ran 'yum update' for CM, and restarted. Then I let the UI walk me through a few upgrades and stale-state corrections, so unfortunately I don't know exactly which step brought the fix. =:) But basically, classpath.txt hadn't been updated properly. Now it has the correct entries.

I'm glad I didn't brute-force things (not my style anyway). I doubt this one-off issue will help anyone, but who knows. =:)