Override libraries for spark
Labels: Apache Hadoop, Apache Spark, Apache YARN, HDFS
Created on ‎09-22-2015 12:36 AM - edited ‎09-16-2022 02:41 AM
Hello,
I would like to use newer versions of some of the libraries listed in /etc/spark/conf/classpath.txt.
What is the recommended way to do that? I add other libraries using spark-submit's --jars option (the jars are on HDFS), but
this does not work for newer versions of libraries that are already in classpath.txt.
Alternatively, is there a way to disable construction of classpath.txt and rely solely on the libraries passed to spark-submit (except possibly Spark and Hadoop themselves)?
I'm running Spark on YARN (cluster mode).
Thank you!
Created ‎09-22-2015 06:06 AM
an older version used by HBase ended up taking precedence in the app
classloader and then it could not quite load properly, as it couldn't
see the shared library in the parent classloader. This may be a
manifestation of that one. I know there are certainly cases where
there is no resolution to the conflict, since an app and Spark may use
mutually incompatible versions of a dependency, and one will interfere
with the other if the Spark and app classloaders are connected, no
matter what their ordering.
For this toy example, you'd just not set the classpath setting since
it isn't needed. For your app, if neither combination works, then your
options are probably to harmonize library versions with Spark, or
shade your copy of the library.
We had a similar problem running Accumulo 1.7.2 (parcel-based) on CDH5. Unfortunately, CDH5 bundles the Accumulo 1.6.0 jars by default.
Our workaround was to modify SPARK_DIST_CLASSPATH via
Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh – Spark (Service-Wide):
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-core.jar:$SPARK_DIST_CLASSPATH
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-fate.jar:$SPARK_DIST_CLASSPATH
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-start.jar:$SPARK_DIST_CLASSPATH
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib/accumulo-trace.jar:$SPARK_DIST_CLASSPATH
export SPARK_DIST_CLASSPATH
This way you can prepend jars to SPARK_DIST_CLASSPATH or redefine it entirely.
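The same prepending could probably be written as a loop, which is easier to extend when more jars are needed. This is only a sketch: it assumes the parcel path and jar names from the snippet above, and the ACCUMULO_LIB variable is a name introduced here for illustration, not something CDH defines.

```shell
# ACCUMULO_LIB is a hypothetical helper variable; the path is the one used in the post.
ACCUMULO_LIB=/opt/cloudera/parcels/ACCUMULO/lib/accumulo/lib

# Prepend each jar in turn, so the last jar listed ends up first on the classpath,
# matching the order produced by the one-assignment-per-jar version.
for jar in accumulo-core.jar accumulo-fate.jar accumulo-start.jar accumulo-trace.jar; do
  # ${VAR:+:$VAR} appends the previous value only if it is set and non-empty,
  # which avoids leaving a stray trailing ':' when the variable starts out empty.
  SPARK_DIST_CLASSPATH="$ACCUMULO_LIB/$jar${SPARK_DIST_CLASSPATH:+:$SPARK_DIST_CLASSPATH}"
done
export SPARK_DIST_CLASSPATH
```

Because the jars are prepended, they come before whatever was already in SPARK_DIST_CLASSPATH. Whether that is enough to make the newer classes win still depends on the classloader issues discussed earlier in the thread.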
