Created on 08-27-2015 11:31 AM - edited 09-16-2022 02:39 AM
We have Spark installed via Cloudera Manager on a YARN cluster. There appears to be a classpath.txt file in /etc/spark/conf that includes the list of jars that should be available on Spark's distributed classpath, and spark-env.sh seems to be the one that exports this configuration.
It is my understanding that Cloudera Manager creates the classpath.txt file. I would like to know how Cloudera Manager determines the list of jars that go into this file, and whether this is something that can be controlled through Cloudera Manager.
Thank you!
Created 09-01-2015 05:55 AM
To add custom classes to the classpath, use one of the following two options:
- add them via the command line options
- add them via the config
For the driver, you have the option to use: --driver-class-path /path/to/file
Or for the executor, use:
--conf "spark.executor.extraClassPath=/path/to/jar"
In spark-defaults.conf, set the two values (or just one if you only need it on one side):
spark.driver.extraClassPath
spark.executor.extraClassPath
This can be done through the CM UI.
Depending on exactly what you are doing, there may be limitations on which option you can use.
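As a minimal sketch of the command-line route, the Python helper below assembles a spark-submit invocation with the two options mentioned above. The helper name, the application jar, the main class and the jar paths are all hypothetical placeholders; only --class, --driver-class-path and --conf spark.executor.extraClassPath are actual spark-submit options.

```python
import shlex


def build_spark_submit(app_jar, main_class, driver_classpath=None, executor_classpath=None):
    """Return a spark-submit argument list with optional extra classpath entries.

    Hypothetical helper for illustration; it only builds the command, it does not run it.
    """
    args = ["spark-submit", "--class", main_class]
    if driver_classpath:
        # Driver side: dedicated command-line flag.
        args += ["--driver-class-path", driver_classpath]
    if executor_classpath:
        # Executor side: passed as a Spark configuration property.
        args += ["--conf", "spark.executor.extraClassPath=" + executor_classpath]
    args.append(app_jar)
    return args


# Placeholder values, not taken from a real deployment.
cmd = build_spark_submit(
    "my-app.jar",
    "com.example.MyApp",
    driver_classpath="/path/to/jar",
    executor_classpath="/path/to/jar",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The same two properties (spark.driver.extraClassPath and spark.executor.extraClassPath) can instead be placed in spark-defaults.conf, in which case no extra flags are needed at submit time.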
Wilfred
Created 09-01-2015 06:44 AM
Thank you for your response, Wilfred. It certainly helps. However, my question was more about understanding how the classpath.txt file shown below is created. Does CM create this file on all nodes, and is it something we can configure through CM?
08:42:43 $ ll /etc/spark/conf/
total 60
drwxr-xr-x 3 root root 4096 Aug 25 12:28 ./
drwxr-xr-x 3 root root 4096 Aug 25 12:28 ../
-rw-r--r-- 1 root root 29228 Aug 25 12:28 classpath.txt
-rw-r--r-- 1 root root 21 Aug 25 12:28 __cloudera_generation__
-rw-r--r-- 1 root root 550 Aug 25 12:28 log4j.properties
-rw-r--r-- 1 root root 800 Aug 25 12:28 spark-defaults.conf
-rw-r--r-- 1 root root 1122 Aug 25 12:28 spark-env.sh
drwxr-xr-x 2 root root 4096 Aug 25 12:28 yarn-conf/
Created 09-01-2015 07:35 AM
Yes, CM generates this as part of the gateway (client config). The classpath text file is generated by CM based on the dependencies that are defined in the deployment.
This is not something you can change.
As you can see in the upstream docs, we use a form of the Hadoop-free distribution, but we still only test it with CDH and its specific dependencies.
Does that explain what you are looking for?
Wilfred
Created 09-01-2015 07:43 AM
Thank you for the quick response. I really appreciate your help in clearing up my questions.
The answer was exactly what I was looking for: the file is generated automatically, and users cannot control the contents of classpath.txt.
Pardon my naive question, but can having different versions of the same dependency on the classpath pose a problem?
Example:
09:39:34 $ cat /etc/spark/conf/classpath.txt | grep jersey-server
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p886.563/jars/jersey-server-1.9.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p886.563/jars/jersey-server-1.14.jar
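To find every such case rather than grepping for one artifact at a time, a small Python sketch like the following can scan the file. It assumes classpath.txt lists one jar path per line and that file names follow the usual name-version.jar pattern; the regular expression is a heuristic and the function name is made up for illustration.

```python
import posixpath
import re

# Heuristic: the version starts at the first "-<digit>" segment of the file name.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_artifacts(lines):
    """Map each artifact name to its versions, keeping only names with more than one version.

    Versions are listed in the order they appear in the input.
    """
    versions = {}
    for line in lines:
        filename = posixpath.basename(line.strip())
        match = JAR_PATTERN.match(filename)
        if not match:
            continue  # blank line or a jar without a recognizable version
        seen = versions.setdefault(match.group("name"), [])
        if match.group("version") not in seen:
            seen.append(match.group("version"))
    return {name: found for name, found in versions.items() if len(found) > 1}


# The two entries from the grep output above.
sample = [
    "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p886.563/jars/jersey-server-1.9.jar",
    "/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p886.563/jars/jersey-server-1.14.jar",
]
for name, found in find_duplicate_artifacts(sample).items():
    print(name + ": " + ", ".join(found))
```

To run it against the real file, pass the lines of /etc/spark/conf/classpath.txt (for example from an open file object) instead of the sample list.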
Created 09-03-2015 11:48 AM
It should not pose a problem. If it does, let us know, but we have not seen an issue with this.
Wilfred
Created 09-16-2015 06:21 AM
Actually, I think I have hit an issue related to the fact that classpath.txt contains multiple versions of the same jar.
The issue is related to this JIRA: https://issues.apache.org/jira/browse/SPARK-8332
And in /etc/spark/conf/classpath.txt:
-----------------------------
cat /etc/spark/conf/classpath.txt | grep jackson
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-annotations-2.2.3.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-core-2.2.3.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-databind-2.2.3.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-annotations-2.3.0.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-core-2.3.1.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-databind-2.3.1.jar
-----------------------------
Somehow the classloader is picking up version 2.2.3 of jackson, in which the method handledType() of the class BigDecimalDeserializer does not exist.
Similar errors may appear for jersey as well, since the API changed a bit between those versions.
Is there a proper way to solve this kind of issue?
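A standard JVM classloader takes a class from the first classpath entry that contains it, so the order of the jars matters. As a rough diagnostic, the Python sketch below reports which jar in a list would supply a given class. It assumes classpath.txt holds one jar path per line and that the jars are searched in that order; Spark's launcher may put other entries in front of them, so treat the result as an approximation. Both function names are made up for illustration.

```python
import zipfile


def load_classpath(path):
    """Read a classpath file with one jar path per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def first_jar_with_class(jar_paths, class_name):
    """Return the first jar in jar_paths that contains class_name, or None.

    class_name is fully qualified, e.g. "com.example.Foo".
    Missing or unreadable jars are skipped.
    """
    entry = class_name.replace(".", "/") + ".class"
    for jar_path in jar_paths:
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if entry in jar.namelist():
                    return jar_path
        except (OSError, zipfile.BadZipFile):
            continue
    return None
```

For example, first_jar_with_class(load_classpath("/etc/spark/conf/classpath.txt"), "com.fasterxml.jackson.databind.ObjectMapper") should name the jackson-databind jar that appears first in the file.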
Created 02-25-2016 09:22 AM
Hi andreF,
I have a similar issue. Did you manage to fix it?