After upgrading from CDH 5 to CDH 6, Spark no longer supplies Hive jars. Adding the jars via spark.jars loads them with the incorrect classloader.

We're in the process of upgrading from CDH 5 to CDH 6.3.2.

We have a management application written in Java which uses SparkLauncher to start Spark jobs on YARN.

We've updated the libraries in both the management application and the Spark applications from Spark 1.x to 2.x.

When a Spark job is started on YARN, it fails due to missing Hive libraries.

Adding these libraries via SparkLauncher.addJar() makes them available to the job, but the classes are not visible from the classloader that needs them.

 

Code which adds libraries:

 

 

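final String JAR_DIR = "/opt/cloudera/parcels/CDH/jars/";
File jarDir = new File(JAR_DIR);
// File.list() returns null if the directory is missing or unreadable
String[] filenames = jarDir.list();
if (filenames == null) {
    throw new IllegalStateException("cannot list " + JAR_DIR);
}
for (String filename : filenames) {
    String jar = JAR_DIR + filename;
    log.info("adding jar: " + jar);
    launcher.addJar(jar);
}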

 

 



Resulting exception:

 

 

2020-05-20 00:21:01 [,,,] [] INFO ApplicationMaster:57 - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: org.apache.hive.hcatalog.common.HiveClientCache$ICacheableMetaStoreClient referenced from a method is not visible from class loader
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
at org.apache.hive.hcatalog.common.HiveClientCache.getOrCreate(HiveClientCache.java:229)
at org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:204)
at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:563)
at org.apache.hive.hcatalog.api.HCatClientHMSImpl.initialize(HCatClientHMSImpl.java:823)
at org.apache.hive.hcatalog.api.HCatClient.create(HCatClient.java:73)
... frames from our code ...
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)
Caused by: java.lang.IllegalArgumentException: org.apache.hive.hcatalog.common.HiveClientCache$ICacheableMetaStoreClient referenced from a method is not visible from class loader
at java.base/java.lang.reflect.Proxy$ProxyBuilder.ensureVisible(Proxy.java:858)
at java.base/java.lang.reflect.Proxy$ProxyBuilder.validateProxyInterfaces(Proxy.java:681)
at java.base/java.lang.reflect.Proxy$ProxyBuilder.<init>(Proxy.java:627)
at java.base/java.lang.reflect.Proxy$ProxyBuilder.<init>(Proxy.java:635)
at java.base/java.lang.reflect.Proxy.lambda$getProxyConstructor$0(Proxy.java:415)
at java.base/jdk.internal.loader.AbstractClassLoaderValue$Memoizer.get(AbstractClassLoaderValue.java:329)
at java.base/jdk.internal.loader.AbstractClassLoaderValue.computeIfAbsent(AbstractClassLoaderValue.java:205)
at java.base/java.lang.reflect.Proxy.getProxyConstructor(Proxy.java:413)
at java.base/java.lang.reflect.Proxy.newProxyInstance(Proxy.java:1006)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:115)
at org.apache.hive.hcatalog.common.HiveClientCache$5.call(HiveClientCache.java:234)
at org.apache.hive.hcatalog.common.HiveClientCache$5.call(HiveClientCache.java:229)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)

 

 

 

We tried adding the jars in spark-defaults.conf via "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf" in Cloudera Manager. It produced the same 'not visible from class loader' exception.
Line added to safety valve:

 

 

spark.jars=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-accumulo-handler-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-ant-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-beeline-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-classification-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-cli-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-common-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-contrib-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2-core.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hbase-handler-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-core-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-pig-adapter-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-server-extensions-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-streaming-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hplsql-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-jdbc-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-jdbc-2.1.1-cdh6.3.2-standalone.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-kryo-registrator-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-client-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-common-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-ext-client-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-server-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-tez-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-metastore-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-orc-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-serde-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-service-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-service-rpc-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-0.23-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-common-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-scheduler-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-storage-api-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-testutils-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-webhcat-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-webhcat-java-client-2.1.1-cdh6.3.2.jar

 

 


Manually adding the jars to /etc/spark/conf/classpath.txt on the host running the management application works. However, this doesn't look like a production-stable solution, because Cloudera Manager doesn't appear to provide a safe, persistent way to make this change.

 

Hacky script which manually adds jars to classpath.txt:

 

 

set -e -x

# this script is idempotent

# hosts to update
hosts="$(mktemp)"
cat - >"${hosts}" <<EOF
foo.bar.baz.com
foo2.bar.baz.com
fooN.bar.baz.com
EOF

# jars to add
jars="$(mktemp)"
cat - >"${jars}" <<EOF
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-accumulo-handler-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-ant-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-beeline-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-classification-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-cli-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-common-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-contrib-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2-core.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hbase-handler-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-core-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-pig-adapter-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-server-extensions-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-streaming-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hplsql-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-jdbc-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-jdbc-2.1.1-cdh6.3.2-standalone.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-kryo-registrator-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-client-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-common-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-ext-client-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-server-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-tez-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-metastore-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-orc-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-serde-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-service-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-service-rpc-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-0.23-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-common-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-scheduler-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-storage-api-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-testutils-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-webhcat-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-webhcat-java-client-2.1.1-cdh6.3.2.jar
EOF

while read -r host; do
  echo "host: ${host}"
  ssh-copy-id "foo@${host}"
  scp "${jars}" "foo@${host}:/tmp/jars"
  # sort -u sorts and de-duplicates in one step, so rerunning does not leave duplicate entries
  ssh "foo@${host}" 'cat /etc/spark/conf/classpath.txt /tmp/jars | sort -u > /tmp/classpath.txt && sudo bash -c "cat /tmp/classpath.txt > /etc/spark/conf/classpath.txt"' </dev/null
done < "${hosts}"

 

 


How can we add these jars at runtime to the spark application such that they are accessible from the same classloader which loads our code in drivers and executors?
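
For anyone debugging the same thing: the ensureVisible frames in the trace come from java.lang.reflect.Proxy, which requires every proxied interface to resolve by name, to the same Class object, through the class loader passed to newProxyInstance. One way to check that from inside the job is to log the loader chain of the classes involved. The following is only a minimal sketch. LoaderChain, chainOf and isVisibleFrom are hypothetical names, not part of Spark or Hive, and in a real job the arguments would presumably be HiveClientCache.class and the loader of RetryingMetaStoreClient rather than the JDK classes used in main.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {

    /** Returns the class names of the loaders for 'type', child first, ending with "bootstrap". */
    public static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader(); // null means the bootstrap loader
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap");
        return chain;
    }

    /** True if 'type' resolves by name to the same Class object through 'loader'. */
    public static boolean isVisibleFrom(Class<?> type, ClassLoader loader) {
        try {
            // initialize=false: only resolve the class, do not run static initializers
            return Class.forName(type.getName(), false, loader) == type;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in arguments; replace with the Hive classes when running inside the job.
        System.out.println(chainOf(LoaderChain.class));
        System.out.println(isVisibleFrom(String.class, LoaderChain.class.getClassLoader()));
    }
}
```

If isVisibleFrom returns false for the interface named in the exception, the jar holding it was probably loaded by a child loader of the one used to build the proxy, which would match the behaviour described above.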

1 ACCEPTED SOLUTION

Adding the libs via the following pattern to the safety valve for spark-env.sh fixed the issue.

 

SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/CDH/jars/hive-accumulo-handler-2.1.1-cdh6.3.2.jar:$SPARK_DIST_CLASSPATH
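
The line above shows the pattern for a single jar. If several Hive jars are needed, the value can be generated rather than typed by hand. Below is a rough sketch of one way to do that, assuming the jars live under /opt/cloudera/parcels/CDH/jars and that every hive-*.jar should be included (the answer does not say which jars were actually required). DistClasspathLine and build are hypothetical names. The output is meant to be pasted into the spark-env.sh safety valve.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DistClasspathLine {

    /** Builds a spark-env.sh line that prepends every hive-*.jar in 'jarDir' to SPARK_DIST_CLASSPATH. */
    public static String build(Path jarDir) throws IOException {
        List<String> jars = new ArrayList<>();
        // The glob is matched against file names only, so other jars in the directory are skipped.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(jarDir, "hive-*.jar")) {
            for (Path jar : stream) {
                jars.add(jar.toString());
            }
        }
        Collections.sort(jars); // stable output regardless of directory iteration order

        StringBuilder line = new StringBuilder("SPARK_DIST_CLASSPATH=");
        for (String jar : jars) {
            line.append(jar).append(':'); // ':' because spark-env.sh runs on Linux hosts
        }
        return line.append("$SPARK_DIST_CLASSPATH").toString();
    }

    public static void main(String[] args) throws IOException {
        // Default directory is an assumption taken from the path used in this thread.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/opt/cloudera/parcels/CDH/jars");
        System.out.println(build(dir));
    }
}
```

Using the /opt/cloudera/parcels/CDH symlink instead of the versioned parcel directory keeps the generated line valid across parcel upgrades, as long as the jar file names themselves do not change.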

 
