Override libraries for spark

Explorer

Hello,

I would like to use newer versions of some of the libraries listed in /etc/spark/conf/classpath.txt.

What is the recommended way to do that? I add other libraries using spark-submit's --jars option (the jars are on HDFS), but this does not work for newer versions of libraries that are already in classpath.txt.

Alternatively, is there a way to disable construction of classpath.txt and rely solely on the libraries passed to spark-submit (except possibly Spark and Hadoop)?

I'm running Spark on YARN (cluster mode).

Thank you!

1 ACCEPTED SOLUTION

Master Collaborator
I remember similar problems with Snappy and HBase: an older version
used by HBase somehow ended up taking precedence in the app
classloader and then could not load properly, because it couldn't see
the shared library in the parent classloader. This may be a
manifestation of the same problem. There are certainly cases where the
conflict has no resolution: an app and Spark may use mutually
incompatible versions of a dependency, and one will interfere with the
other if the Spark and app classloaders are connected, no matter how
they are ordered.

For this toy example, you'd simply not set the classpath setting,
since it isn't needed. For your app, if neither combination works,
then your options are probably to harmonize library versions with
Spark, or to shade your copy of the library.
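
The "classpath setting" is not named in this reply. Assuming it refers to Spark's experimental spark.driver.userClassPathFirst and spark.executor.userClassPathFirst properties, trying that combination might look roughly like the sketch below. The application jar, main class, and library path are hypothetical placeholders, and the script only prints the command rather than submitting anything.

```shell
#!/bin/sh
# Sketch only: the three values below are hypothetical placeholders.
APP_JAR="my-app.jar"
MAIN_CLASS="com.example.MyApp"
NEWER_LIB="hdfs:///libs/some-library-2.0.jar"

# Assumption: the "classpath setting" means the experimental
# userClassPathFirst properties, which ask Spark to prefer user-added
# jars over its own when loading classes.
set -- \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars "$NEWER_LIB" \
  --class "$MAIN_CLASS" \
  "$APP_JAR"

# Dry run: show the command for review instead of executing it.
SUBMIT_CMD="spark-submit $*"
echo "$SUBMIT_CMD"
```

If the job fails both with and without these two properties, that is the "neither combination works" case, where harmonizing versions or shading remains.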



10 REPLIES

New Contributor

We had a similar problem running Accumulo 1.7.2 (parcel-based) on CDH5. Unfortunately, CDH5 bundles the Accumulo 1.6.0 jars by default.

Our workaround was to modify SPARK_DIST_CLASSPATH via this setting:

Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh – Spark (Service-Wide)

SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-core.jar:$SPARK_DIST_CLASSPATH
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-fate.jar:$SPARK_DIST_CLASSPATH
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-start.jar:$SPARK_DIST_CLASSPATH
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-trace.jar:$SPARK_DIST_CLASSPATH
export SPARK_DIST_CLASSPATH

This way you can add to or redefine SPARK_DIST_CLASSPATH. Because each jar is prepended, the parcel's jars are listed ahead of the existing entries.
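
The four assignments above follow one pattern, so the same snippet could also be written as a loop. This is only a sketch of an equivalent form, not what the poster used; the ACCUMULO_LIB variable is a name introduced here for illustration. It also avoids leaving a trailing colon when SPARK_DIST_CLASSPATH starts out empty.

```shell
#!/bin/sh
# ACCUMULO_LIB is a hypothetical helper variable, not part of the original snippet.
ACCUMULO_LIB=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib

# Prepend each jar so it is listed ahead of the existing classpath entries.
for jar in accumulo-core accumulo-fate accumulo-start accumulo-trace; do
  # ${VAR:+:$VAR} expands to ":$VAR" only when VAR is set and non-empty.
  SPARK_DIST_CLASSPATH="$ACCUMULO_LIB/$jar.jar${SPARK_DIST_CLASSPATH:+:$SPARK_DIST_CLASSPATH}"
done
export SPARK_DIST_CLASSPATH

# Print one entry per line to check the resulting order.
echo "$SPARK_DIST_CLASSPATH" | tr ':' '\n'
```

As in the original, the jar prepended last ends up first in the list.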