08-27-2015 11:31 AM
We have Spark installed via Cloudera Manager on a YARN cluster. There is a classpath.txt file in /etc/spark/conf that includes a list of jars that should be available on Spark's distributed classpath, and spark-env.sh seems to be the one that exports this configuration.
It is my understanding that Cloudera Manager creates the classpath.txt file. I would like to know how Cloudera Manager determines the list of jars that go into this file, and whether it is something that can be controlled through Cloudera Manager.
09-01-2015 05:55 AM
For adding custom classes to the classpath you should use one of the two following options:
- add them via the command line options
- add them via the config
For the driver you have the option to use: --driver-class-path /path/to/file
For the executor, use spark.executor.extraClassPath.
In spark-defaults.conf set the two values, spark.driver.extraClassPath and spark.executor.extraClassPath (or just one if you only need it for one side).
This can be done through the CM UI.
Depending on the exact thing you are doing you might see limitations on which option you can use; a sketch of both options is below.
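To make that concrete, here is a minimal sketch of both options; the jar path and application class are hypothetical, while --driver-class-path and the two extraClassPath properties are standard Spark settings:

# Option 1: command line, driver side only (hypothetical jar path and app)
spark-submit --driver-class-path /path/to/custom.jar --class com.example.App app.jar

# Option 2: spark-defaults.conf, both sides (can be set through the CM UI)
spark.driver.extraClassPath   /path/to/custom.jar
spark.executor.extraClassPath /path/to/custom.jar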
09-01-2015 06:44 AM
Thank you for your response Wilfred, that certainly helps. However, my question was more about understanding how the classpath.txt file shown below is created. Does CM create this file on all nodes, and is it something we can configure through CM?
08:42:43 $ ll /etc/spark/conf/
drwxr-xr-x 3 root root 4096 Aug 25 12:28 ./
drwxr-xr-x 3 root root 4096 Aug 25 12:28 ../
-rw-r--r-- 1 root root 29228 Aug 25 12:28 classpath.txt
-rw-r--r-- 1 root root 21 Aug 25 12:28 __cloudera_generation__
-rw-r--r-- 1 root root 550 Aug 25 12:28 log4j.properties
-rw-r--r-- 1 root root 800 Aug 25 12:28 spark-defaults.conf
-rw-r--r-- 1 root root 1122 Aug 25 12:28 spark-env.sh
drwxr-xr-x 2 root root 4096 Aug 25 12:28 yarn-conf/
09-01-2015 07:35 AM
Yes, CM generates this as part of the gateway (client config). The classpath text file is generated by CM based on the dependencies that are defined in the deployment.
This is not something you can change.
As you can see in the upstream docs we use a form of the Hadoop-free distribution, but we still only test this with CDH and its specific dependencies.
Does that explain what you are looking for?
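For context, this is roughly how the Hadoop-free wiring fits together; the exact spark-env.sh contents vary by CDH release, so treat this as an illustrative sketch rather than the literal file:

# spark-env.sh (sketch): join classpath.txt into one colon-separated list and
# export it so the Hadoop-free Spark build picks up the CDH jars
export SPARK_DIST_CLASSPATH="$(paste -sd: /etc/spark/conf/classpath.txt)"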
09-01-2015 07:43 AM
Thank you for the quick response, I really appreciate your help in clearing up my questions.
The answer was exactly what I was looking for: it is automated, and users cannot control the contents of the classpath.txt file.
Pardon my naive question, but can it pose a problem to have different versions of the same dependencies on the classpath?
09:39:34 $ cat /etc/spark/conf/classpath.txt | grep jersey-server
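As a quick way to list every artifact that appears in more than one version, a one-liner like this over classpath.txt should work (the version-stripping pattern is a rough assumption about the jar naming convention):

# Strip directories and version suffixes, then print names occurring more than once
sed 's#.*/##; s/-[0-9][0-9a-zA-Z.]*\.jar$//' /etc/spark/conf/classpath.txt | sort | uniq -d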
09-16-2015 06:21 AM
Actually, I think I hit an issue related to the fact that classpath.txt contains multiple versions of the same jar.
The issue is related to this jira : https://issues.apache.org/jira/browse/SPARK-8332
And in /etc/spark/conf/classpath.txt:
cat /etc/spark/conf/classpath.txt | grep jackson
Somehow the classloader picks up version 2.2.3 of jackson, where the method handledType() of the class BigDecimalDeserializer does not exist.
Similar errors may appear for jersey as well, since the API changed a bit between those versions.
Is there a way to solve this kind of issue properly?
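One workaround worth trying (an assumption on my part, not a confirmed fix for SPARK-8332) is to ship the jackson version the application needs and ask Spark to prefer user jars over the distribution classpath via the experimental userClassPathFirst properties; paths and versions here are hypothetical:

# Ship the newer jackson jars with the app and prefer them over classpath.txt
spark-submit \
  --jars /path/to/jackson-databind-2.4.4.jar,/path/to/jackson-core-2.4.4.jar \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --class com.example.App app.jar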
02-25-2016 09:30 AM
I have a similar issue to andreF's: we have several different guava jars in /etc/spark/conf/classpath.txt. Do you know how to fix the issue?
Our app needs to use guava-16.0.1.jar, so I added guava-16.0.1.jar to /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/ and added "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-16.0.1.jar" to /etc/spark/conf/classpath.txt.
However, it doesn't work; the Spark action in Oozie still cannot find guava-16.0.1.jar. How does classpath.txt work? Do you know how to manage or modify classpath.txt manually? Thanks!
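Since CM regenerates classpath.txt whenever the client configuration is redeployed, hand-edits tend to be overwritten. A sketch of an approach that avoids editing the file at all (the guava path is the one from the post above; the rest is an assumption, not a confirmed fix) is to prepend the jar through the extraClassPath properties instead:

# spark-defaults.conf (e.g. via the CM safety valve): prepend guava 16
# ahead of the CDH guava on both the driver and executor classpaths
spark.driver.extraClassPath   /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-16.0.1.jar
spark.executor.extraClassPath /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-16.0.1.jar

Note that for the executor side to resolve it, the jar has to exist at that path on every node in the cluster, not just the one where it was copied in.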