
Spark distributed classpath

Explorer

We have Spark installed via Cloudera Manager on a YARN cluster. There is a classpath.txt file in /etc/spark/conf that contains a list of jars that should be available on Spark's distributed classpath, and spark-env.sh seems to be the one that exports this configuration.
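
To make the relationship between the two files concrete, here is a minimal sketch of how a script could turn a one-jar-per-line file into a colon-separated classpath. This illustrates the general approach only and is not a copy of the spark-env.sh that CM generates. The helper name join_classpath is made up for illustration; SPARK_DIST_CLASSPATH is the variable that Spark's "Hadoop free" build documentation uses for this purpose.

```shell
# Hypothetical helper: join the lines of a classpath.txt-style file with ":".
# $1: path to a file that lists one jar per line
join_classpath() {
  paste -s -d ':' "$1"
}

# Assumed location, taken from the question above.
CLASSPATH_FILE="/etc/spark/conf/classpath.txt"

if [ -r "$CLASSPATH_FILE" ]; then
  SPARK_DIST_CLASSPATH="$(join_classpath "$CLASSPATH_FILE")"
  export SPARK_DIST_CLASSPATH
  # Show the first few entries, one per line, to check the result.
  echo "$SPARK_DIST_CLASSPATH" | tr ':' '\n' | head -n 5
fi
```

Joining this way keeps the jars in the order they appear in the file, which matters when the same class exists in more than one jar.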

 

It is my understanding that Cloudera Manager creates the classpath.txt file. I would like to know how Cloudera Manager determines the list of jars that go into this file, and whether that is something that can be controlled through Cloudera Manager.

 

Thank you!


18 REPLIES

Super Collaborator

For adding custom classes to the classpath you should use one of the two following options:
- add them via the command line options
- add them via the config

 

For the driver you have the option to use: --driver-class-path /path/to/file

 

Or, for the executor, use:

--conf "spark.executor.extraClassPath=/path/to/jar"


In spark-defaults.conf, set the two values (or just one if you only need it for one side):
  spark.driver.extraClassPath
  spark.executor.extraClassPath

This can be done through the CM UI.

 

Depending on exactly what you are doing, you might be limited in which option you can use.
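
As a rough sketch of the two options side by side: the functions below only print the command line and the config entries, they do not run Spark. The jar paths and the main class com.example.Main are placeholders, and the function names are invented for this example.

```shell
# Print (rather than run) a spark-submit command line that adds one extra jar
# to both the driver and the executor classpath.
# $1: extra jar, $2: application jar, $3: main class (all placeholder values)
build_submit_cmd() {
  echo "spark-submit --driver-class-path $1 --conf spark.executor.extraClassPath=$1 --class $3 $2"
}

# Print the equivalent spark-defaults.conf entries for the same jar.
# $1: extra jar
print_defaults_conf() {
  printf 'spark.driver.extraClassPath %s\n' "$1"
  printf 'spark.executor.extraClassPath %s\n' "$1"
}

build_submit_cmd /path/to/jar /path/to/app.jar com.example.Main
print_defaults_conf /path/to/jar
```

The command-line form applies to a single job, while the spark-defaults.conf entries apply to every job submitted from that client configuration.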

 

Wilfred

Explorer

Thank you for your response, Wilfred. It certainly helps. However, my question was more about understanding how the classpath.txt file shown below is created. Does CM create this file on all nodes, and is it something we can configure through CM?

 

08:42:43 $ ll /etc/spark/conf/
total 60
drwxr-xr-x 3 root root  4096 Aug 25 12:28 ./
drwxr-xr-x 3 root root  4096 Aug 25 12:28 ../
-rw-r--r-- 1 root root 29228 Aug 25 12:28 classpath.txt
-rw-r--r-- 1 root root    21 Aug 25 12:28 __cloudera_generation__
-rw-r--r-- 1 root root   550 Aug 25 12:28 log4j.properties
-rw-r--r-- 1 root root   800 Aug 25 12:28 spark-defaults.conf
-rw-r--r-- 1 root root  1122 Aug 25 12:28 spark-env.sh
drwxr-xr-x 2 root root  4096 Aug 25 12:28 yarn-conf/

 

 

 

Super Collaborator

Yes, CM generates this as part of the gateway (client) configuration. The classpath text file is generated by CM based on the dependencies that are defined in the deployment.

This is not something you can change.

 

As you can see in the upstream docs, we use a form of the "Hadoop free" distribution, but we still only test this with CDH and its specific dependencies.

 

Does that explain what you are looking for?

 

Wilfred

Explorer

Thank you for the quick response. I really appreciate your help in clearing up my questions.

 

The answer was exactly what I was looking for. The file is generated automatically, and users cannot control the contents of the classpath.txt file.

 

Pardon my naive question, but can it pose a problem to have different versions of the same dependency on the classpath?

 

Example:

 

09:39:34 $ cat /etc/spark/conf/classpath.txt | grep jersey-server
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p886.563/jars/jersey-server-1.9.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p886.563/jars/jersey-server-1.14.jar
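
Rather than grepping for one artifact at a time, the whole file can be scanned for jars that appear in more than one version. The following is only a sketch: the function name is invented, and it assumes jar file names follow the usual artifact-version.jar pattern, splitting each name at the first "-" that is followed by a digit. Jars with unusual names may be grouped incorrectly.

```shell
# List artifacts that appear in more than one version in a classpath file.
# $1: path to a classpath.txt-style file, one jar path per line
list_multi_version_jars() {
  # Strip the directory and the .jar suffix, then split name from version.
  sed -e 's#.*/##' -e 's#\.jar$##' "$1" |
    awk '{
      name = $0; ver = ""
      # Assumption: the version starts at the first "-" followed by a digit.
      if (match($0, /-[0-9]/)) {
        name = substr($0, 1, RSTART - 1)
        ver  = substr($0, RSTART + 1)
      }
      # Count each distinct (name, version) pair once, in file order.
      if (!((name, ver) in seen)) {
        seen[name, ver] = 1
        count[name]++
        versions[name] = versions[name] " " ver
      }
    }
    END {
      for (n in count) if (count[n] > 1) print n ":" versions[n]
    }' | sort
}

# Assumed location, taken from the thread.
CLASSPATH_FILE="/etc/spark/conf/classpath.txt"
if [ -r "$CLASSPATH_FILE" ]; then
  list_multi_version_jars "$CLASSPATH_FILE"
fi
```

For the two jersey-server lines above, this would print "jersey-server: 1.9 1.14", with the versions listed in the order they occur in the file.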

 

Super Collaborator

It should not pose a problem. If it does, let us know, but we have not seen an issue with this.

 

Wilfred

Explorer
Thank you! That definitely helps.

Explorer

Actually, I think I have run into an issue related to the fact that classpath.txt contains multiple versions of the same jar:

 

The issue is related to this JIRA: https://issues.apache.org/jira/browse/SPARK-8332

 

And in /etc/spark/conf/classpath.txt:

 

-----------------------------

cat /etc/spark/conf/classpath.txt | grep jackson

/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-annotations-2.2.3.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-core-2.2.3.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-databind-2.2.3.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-annotations-2.3.0.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-core-2.3.1.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-databind-2.3.1.jar

-----------------------------

 

Somehow the classloader is picking up version 2.2.3 of Jackson, in which the method handledType() of the class BigDecimalDeserializer does not exist.

Similar errors may appear for Jersey as well, since the API changed a bit between those versions.
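
One plausible explanation, assuming the classpath is built from classpath.txt in file order and the JVM searches classpath entries in that order, is that the 2.2.3 jars simply come first in the file, as the grep output above suggests. A small sketch to check which version of an artifact is listed first (the function name is made up for illustration):

```shell
# Print the first line (with its line number) that matches an artifact name.
# $1: classpath file, $2: artifact name without version, e.g. jackson-databind
first_on_classpath() {
  grep -n "/$2-[0-9]" "$1" | head -n 1
}

# Assumed location, taken from the thread.
CLASSPATH_FILE="/etc/spark/conf/classpath.txt"
if [ -r "$CLASSPATH_FILE" ]; then
  first_on_classpath "$CLASSPATH_FILE" jackson-databind
fi
```

If the line printed is the 2.2.3 jar, that would be consistent with the older class being loaded ahead of the newer one.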

 

Is there a proper way to solve this kind of issue?

Rising Star

Hi andreF,

 

I have a similar issue. Did you manage to fix it?

Contributor
I understand this is an older post, but I am running into the same problem. Could you please share the solution if you resolved it?

Thanks