08-27-2015 11:31 AM
We have Spark installed via Cloudera Manager on a YARN cluster. There is a classpath.txt file in /etc/spark/conf that includes a list of jars that should be available on Spark's distributed classpath, and spark-env.sh seems to be the one that exports this configuration.
It is my understanding that Cloudera Manager creates the classpath.txt file. I would like to know how Cloudera Manager determines the list of jars that go into this file, and whether it is something that can be controlled through Cloudera Manager.
09-01-2015 05:55 AM
For adding custom classes to the classpath you should use one of the two following options:
- add them via the command line options
- add them via the config
For the driver you have the option to use: --driver-class-path /path/to/file
For the executor, use spark.executor.extraClassPath.
In spark-defaults.conf set the two values, spark.driver.extraClassPath and spark.executor.extraClassPath (or just one if you only need it for one side).
This can be done through the CM UI.
Depending on the exact thing you are doing you might see limitations on which option you can use; a sketch of both options is below.
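To make that concrete, here is a minimal sketch of both options; the jar path and application class are hypothetical, while --driver-class-path and the two extraClassPath properties are standard Spark settings:

# Option 1: command line, driver side only (hypothetical jar path and app)
spark-submit --driver-class-path /path/to/custom.jar --class com.example.App app.jar

# Option 2: spark-defaults.conf, both sides (can be set through the CM UI)
spark.driver.extraClassPath   /path/to/custom.jar
spark.executor.extraClassPath /path/to/custom.jar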
09-01-2015 06:44 AM
Thank you for your response Wilfred, that certainly helps. However, my question was more about understanding how the classpath.txt file shown below is created. Does CM create this file on all nodes, and is it something we can configure through CM?
08:42:43 $ ll /etc/spark/conf/
drwxr-xr-x 3 root root 4096 Aug 25 12:28 ./
drwxr-xr-x 3 root root 4096 Aug 25 12:28 ../
-rw-r--r-- 1 root root 29228 Aug 25 12:28 classpath.txt
-rw-r--r-- 1 root root 21 Aug 25 12:28 __cloudera_generation__
-rw-r--r-- 1 root root 550 Aug 25 12:28 log4j.properties
-rw-r--r-- 1 root root 800 Aug 25 12:28 spark-defaults.conf
-rw-r--r-- 1 root root 1122 Aug 25 12:28 spark-env.sh
drwxr-xr-x 2 root root 4096 Aug 25 12:28 yarn-conf/
09-01-2015 07:35 AM
Yes, CM generates this as part of the gateway (client config). The classpath text file is generated by CM based on the dependencies that are defined in the deployment.
This is not something you can change.
As you can see in the upstream docs we use a form of the Hadoop-free distribution, but we still only test this with CDH and its specific dependencies.
Does that explain what you are looking for?
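For context, this is roughly how the Hadoop-free wiring fits together; the exact spark-env.sh contents vary by CDH release, so treat this as an illustrative sketch rather than the literal file:

# spark-env.sh (sketch): join classpath.txt into one colon-separated list and
# export it so the Hadoop-free Spark build picks up the CDH jars
export SPARK_DIST_CLASSPATH="$(paste -sd: /etc/spark/conf/classpath.txt)"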
09-01-2015 07:43 AM
Thank you for the quick response, I really appreciate your help in clearing up my questions.
The answer was exactly what I was looking for: it is automated, and users cannot control the contents of the classpath.txt file.
Pardon my naive question, but can it pose a problem to have different versions of the same dependencies on the classpath?
09:39:34 $ cat /etc/spark/conf/classpath.txt | grep jersey-server
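As a quick way to list every artifact that appears in more than one version, a one-liner like this over classpath.txt should work (the version-stripping pattern is a rough assumption about the jar naming convention):

# Strip directories and version suffixes, then print names occurring more than once
sed 's#.*/##; s/-[0-9][0-9a-zA-Z.]*\.jar$//' /etc/spark/conf/classpath.txt | sort | uniq -d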
09-16-2015 06:21 AM
Actually, I think I hit an issue related to the fact that classpath.txt contains multiple versions of the same jar.
The issue is related to this jira : https://issues.apache.org/jira/browse/SPARK-8332
And in /etc/spark/conf/classpath.txt:
cat /etc/spark/conf/classpath.txt | grep jackson
Somehow the classloader picks up version 2.2.3 of jackson, where the method handledType() of the class BigDecimalDeserializer does not exist.
Similar errors may appear for jersey as well, since the API changed a bit between those versions.
Is there a way to solve this kind of issue properly?
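One workaround worth trying (an assumption on my part, not a confirmed fix for SPARK-8332) is to ship the jackson version the application needs and ask Spark to prefer user jars over the distribution classpath via the experimental userClassPathFirst properties; paths and versions here are hypothetical:

# Ship the newer jackson jars with the app and prefer them over classpath.txt
spark-submit \
  --jars /path/to/jackson-databind-2.4.4.jar,/path/to/jackson-core-2.4.4.jar \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --class com.example.App app.jar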
02-25-2016 09:30 AM
I have a similar issue to andreF's: we have several different guava jars in /etc/spark/conf/classpath.txt. Do you know how to fix the issue?
Our app needs to use guava-16.0.1.jar, so I added guava-16.0.1.jar to /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/ and added "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-16.0.1.jar" to /etc/spark/conf/classpath.txt.
However, it doesn't work; the Spark action in Oozie still cannot find guava-16.0.1.jar. How does classpath.txt work? Do you know how to manage or modify classpath.txt manually? Thanks!
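Since CM regenerates classpath.txt whenever the client configuration is redeployed, hand-edits tend to be overwritten. A sketch of an approach that avoids editing the file at all (the guava path is the one from the post above; the rest is an assumption, not a confirmed fix) is to prepend the jar through the extraClassPath properties instead:

# spark-defaults.conf (e.g. via the CM safety valve): prepend guava 16
# ahead of the CDH guava on both the driver and executor classpaths
spark.driver.extraClassPath   /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-16.0.1.jar
spark.executor.extraClassPath /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-16.0.1.jar

Note that for the executor side to resolve it, the jar has to exist at that path on every node in the cluster, not just the one where it was copied in.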