NT
New Contributor
Posts: 5
Registered: ‎07-29-2015
Accepted Solution

Spark distributed classpath

We have Spark installed via Cloudera Manager on a YARN cluster. There is a classpath.txt file in /etc/spark/conf that contains the list of jars that should be available on Spark's distributed classpath, and spark-env.sh appears to be the script that exports this configuration.

 

It is my understanding that Cloudera Manager creates the classpath.txt file. I would like to know how Cloudera Manager determines the list of jars that go into this file, and whether this is something that can be controlled through Cloudera Manager.

 

Thank you!

Cloudera Employee
Posts: 277
Registered: ‎01-16-2014

Re: Spark distributed classpath

For adding custom classes to the classpath you should use one of the following two options:
- add them via the command line options
- add them via the config

 

For the driver you have the option to use: --driver-class-path /path/to/file

 

Or for the executor use:

--conf "spark.executor.extraClassPath=/path/to/jar"


In spark-defaults.conf set the two values (or one if you only need it for one side):
  spark.driver.extraClassPath
  spark.executor.extraClassPath

This can be done through the CM UI.
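For example, a spark-defaults.conf fragment setting both sides might look like this (the jar path is a placeholder, not a path from this cluster):

```
# Placeholder path; substitute your own jar(s), ':'-separated if several.
spark.driver.extraClassPath    /path/to/custom.jar
spark.executor.extraClassPath  /path/to/custom.jar
```

The equivalent at submit time would be `--driver-class-path /path/to/custom.jar --conf "spark.executor.extraClassPath=/path/to/custom.jar"`.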

 

Depending on exactly what you are doing, you may run into limitations on which option you can use.

 

Wilfred

NT
New Contributor
Posts: 5
Registered: ‎07-29-2015

Re: Spark distributed classpath

Thank you for your response, Wilfred. It certainly helps. However, my question was more about understanding how the classpath.txt file shown below is created. Does CM create this file on all nodes, and is it something we can configure through CM?

 

08:42:43 $ ll /etc/spark/conf/
total 60
drwxr-xr-x 3 root root  4096 Aug 25 12:28 ./
drwxr-xr-x 3 root root  4096 Aug 25 12:28 ../
-rw-r--r-- 1 root root 29228 Aug 25 12:28 classpath.txt
-rw-r--r-- 1 root root    21 Aug 25 12:28 __cloudera_generation__
-rw-r--r-- 1 root root   550 Aug 25 12:28 log4j.properties
-rw-r--r-- 1 root root   800 Aug 25 12:28 spark-defaults.conf
-rw-r--r-- 1 root root  1122 Aug 25 12:28 spark-env.sh
drwxr-xr-x 2 root root  4096 Aug 25 12:28 yarn-conf/

 

 

 

Cloudera Employee
Posts: 277
Registered: ‎01-16-2014

Re: Spark distributed classpath

Yes, CM generates this as part of the gateway (client config). The classpath.txt file is generated by CM based on the dependencies that are defined in the deployment.

This is not something you can change.

 

As you can see in the upstream docs, we use a form of Hadoop-free distribution, but we still only test this with CDH and the specific dependencies.
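As a rough sketch of the mechanism the thread describes (an illustration, not the actual CM-generated script), spark-env.sh essentially joins the lines of classpath.txt with ':' and exports the result as SPARK_DIST_CLASSPATH:

```shell
# Sketch only: join the jar paths listed one-per-line in a classpath
# file into a single ':'-separated string, the shape that
# SPARK_DIST_CLASSPATH expects.
build_dist_classpath() {
    paste -sd: "$1"
}

# e.g. export SPARK_DIST_CLASSPATH="$(build_dist_classpath /etc/spark/conf/classpath.txt)"
```

This also explains why the file must list every jar: nothing on this path is resolved transitively, it is a literal classpath.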

 

Does that explain what you are looking for?

 

Wilfred

NT
New Contributor
Posts: 5
Registered: ‎07-29-2015

Re: Spark distributed classpath

Thank you for the quick response; I really appreciate your help in clearing up my questions.

 

The answer was exactly what I was looking for. It is automated, and users cannot control the contents of the classpath.txt file.

 

Pardon my naive question, but can having different versions of the same dependency on the classpath pose a problem?

 

Example:

 

09:39:34 $ cat /etc/spark/conf/classpath.txt | grep jersey-server
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p886.563/jars/jersey-server-1.9.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p886.563/jars/jersey-server-1.14.jar
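To see how widespread this is, you can list every artifact that appears in more than one version in the file. The version-stripping pattern below is a heuristic for jars named `<artifact>-<version>.jar`:

```shell
# List artifact names that occur in multiple versions in a classpath
# file (one jar path per line): strip the directory and the trailing
# -<numeric.version>.jar, then report duplicated names.
dup_artifacts() {
    sed -E 's|.*/||; s/-[0-9][0-9A-Za-z.]*\.jar$//' "$1" | sort | uniq -d
}

# e.g. dup_artifacts /etc/spark/conf/classpath.txt
```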

 

Cloudera Employee
Posts: 277
Registered: ‎01-16-2014

Re: Spark distributed classpath

It should not pose a problem. If it does, let us know, but we have not seen an issue with this.

 

Wilfred

NT
New Contributor
Posts: 5
Registered: ‎07-29-2015

Re: Spark distributed classpath

Thank you! That definitely helps.
New Contributor
Posts: 4
Registered: ‎09-16-2015

Re: Spark distributed classpath

Actually, I think I hit an issue related to the fact that classpath.txt contains multiple versions of the same jar:

 

The issue is related to this JIRA: https://issues.apache.org/jira/browse/SPARK-8332

 

And in /etc/spark/conf/classpath.txt:

 

-----------------------------

cat /etc/spark/conf/classpath.txt | grep jackson

/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-annotations-2.2.3.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-core-2.2.3.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-databind-2.2.3.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-annotations-2.3.0.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-core-2.3.1.jar
/opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/jars/jackson-databind-2.3.1.jar

-----------------------------

 

Somehow the classloader is picking up version 2.2.3 of Jackson, where the method handledType() of the class BigDecimalDeserializer does not exist.

Similar errors may appear for Jersey as well, since the API changed a bit between those versions.

 

Is there a way to solve this kind of issue properly?
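One avenue worth trying (hedged: userClassPathFirst was marked experimental in Spark 1.x and can introduce conflicts of its own) is to ship the Jackson version you need via extraClassPath and ask Spark to prefer user-supplied classes over the distribution's copies. The jar path below is a placeholder:

```
# Placeholder path; point at the Jackson jars your app actually needs.
spark.driver.extraClassPath        /path/to/jackson-databind-2.3.1.jar
spark.driver.userClassPathFirst    true
spark.executor.extraClassPath      /path/to/jackson-databind-2.3.1.jar
spark.executor.userClassPathFirst  true
```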

Contributor
Posts: 41
Registered: ‎02-23-2016

Re: Spark distributed classpath

Hi andreF,

 

I have a similar issue; did you manage to fix it?

Contributor
Posts: 41
Registered: ‎02-23-2016

Re: Spark distributed classpath

Hi Wilfred,

 

 

I have a similar issue to andreF's: we have several different Guava jars in /etc/spark/conf/classpath.txt. Do you know how to fix this?

 

/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-11.0.2.jar

/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-11.0.jar

/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-14.0.1.jar

/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-16.0.1.jar

 

Our app needs to use guava-16.0.1.jar, so I added guava-16.0.1.jar into /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/ and added "/opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/jars/guava-16.0.1.jar" to /etc/spark/conf/classpath.txt.

 

However, it doesn't work; the Spark action in Oozie still cannot find guava-16.0.1.jar. How does classpath.txt work? Do you know how to manage or modify classpath.txt manually? Thanks!
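One caveat: files under /etc/spark/conf are generated by CM, so hand edits to classpath.txt are typically overwritten on the next client-config redeploy. A possibly more durable approach (a sketch only; the path is a placeholder for your own jar) is to pass the jar through the Oozie action's spark-opts rather than editing classpath.txt:

```
<!-- Sketch of an Oozie Spark action fragment; the path is a placeholder. -->
<spark-opts>
  --conf spark.driver.extraClassPath=/path/to/guava-16.0.1.jar
  --conf spark.executor.extraClassPath=/path/to/guava-16.0.1.jar
</spark-opts>
```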

 

 

 
