Created on 06-16-2020 02:58 PM - edited 09-16-2022 07:37 AM
We're in the process of upgrading from CDH 5 to CDH 6.3.2.
We have a management application written in java which uses SparkLauncher to start spark jobs on yarn.
We've updated libraries in both the management application and the spark applications from spark 1.x to 2.x
When a spark job is started on yarn, it fails due to missing Hive libraries.
Adding these libraries via SparkLauncher.addJar() results in the classes being unavailable from the desired classloader.
Code which adds libraries:
final String JAR_DIR = "/opt/cloudera/parcels/CDH/jars/";
File jarDir = new File(JAR_DIR);
for (String filename : jarDir.list()) {
String jar = JAR_DIR + filename;
log.info("adding jar: " + jar);
launcher.addJar(jar);
}
Resulting exception:
2020-05-20 00:21:01 [,,,] [] INFO ApplicationMaster:57 - Final app status: FAILED, exitCode: 15, (reason: User class threw exception: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: org.apache.hive.hcatalog.common.HiveClientCache$ICacheableMetaStoreClient referenced from a method is not visible from class loader
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
at org.apache.hive.hcatalog.common.HiveClientCache.getOrCreate(HiveClientCache.java:229)
at org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:204)
at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:563)
at org.apache.hive.hcatalog.api.HCatClientHMSImpl.initialize(HCatClientHMSImpl.java:823)
at org.apache.hive.hcatalog.api.HCatClient.create(HCatClient.java:73)
... frames from our code ...
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)
Caused by: java.lang.IllegalArgumentException: org.apache.hive.hcatalog.common.HiveClientCache$ICacheableMetaStoreClient referenced from a method is not visible from class loader
at java.base/java.lang.reflect.Proxy$ProxyBuilder.ensureVisible(Proxy.java:858)
at java.base/java.lang.reflect.Proxy$ProxyBuilder.validateProxyInterfaces(Proxy.java:681)
at java.base/java.lang.reflect.Proxy$ProxyBuilder.<init>(Proxy.java:627)
at java.base/java.lang.reflect.Proxy$ProxyBuilder.<init>(Proxy.java:635)
at java.base/java.lang.reflect.Proxy.lambda$getProxyConstructor$0(Proxy.java:415)
at java.base/jdk.internal.loader.AbstractClassLoaderValue$Memoizer.get(AbstractClassLoaderValue.java:329)
at java.base/jdk.internal.loader.AbstractClassLoaderValue.computeIfAbsent(AbstractClassLoaderValue.java:205)
at java.base/java.lang.reflect.Proxy.getProxyConstructor(Proxy.java:413)
at java.base/java.lang.reflect.Proxy.newProxyInstance(Proxy.java:1006)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:115)
at org.apache.hive.hcatalog.common.HiveClientCache$5.call(HiveClientCache.java:234)
at org.apache.hive.hcatalog.common.HiveClientCache$5.call(HiveClientCache.java:229)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
We tried adding the jars in spark-defaults.conf via "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf" in Cloudera Manager. It produced the same 'not visible from class loader' exception.
Line added to safety valve:
spark.jars=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-accumulo-handler-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-ant-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-beeline-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-classification-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-cli-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-common-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-contrib-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2-core.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hbase-handler-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-core-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-pig-adapter-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-server-extensions-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-streaming-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hplsql-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-jdbc-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-jdbc-2.1.1-cdh6.3.2-standalone.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-kryo-registrator-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-client-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-common-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-ext-client-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-server-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-tez-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-metastore-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-orc-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-serde-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-service-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-service-rpc-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-0.23-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-common-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-scheduler-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-storage-api-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-testutils-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-webhcat-2.1.1-cdh6.3.2.jar,/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-webhcat-java-client-2.1.1-cdh6.3.2.jar
Manually adding the jars to /etc/spark/conf/classpath.txt on the host running the aforementioned management application works. This doesn't appear to be a production stable solution as Cloudera Manager doesn't appear to provide a safe, persistent way to make this change.
Hacky script which manually adds jars to classpath.txt
set -e -x
# this script is idempotent
# hosts to update
hosts="$(mktemp)"
cat - >"${hosts}" <<EOF
foo.bar.baz.com
foo2.bar.baz.com
fooN.bar.baz.com
EOF
# jars to add
jars="$(mktemp)"
cat - >"${jars}" <<EOF
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-accumulo-handler-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-ant-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-beeline-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-classification-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-cli-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-common-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-contrib-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2-core.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hbase-handler-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-core-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-pig-adapter-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-server-extensions-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hcatalog-streaming-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-hplsql-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-jdbc-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-jdbc-2.1.1-cdh6.3.2-standalone.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-kryo-registrator-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-client-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-common-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-ext-client-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-server-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-llap-tez-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-metastore-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-orc-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-serde-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-service-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-service-rpc-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-0.23-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-common-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-shims-scheduler-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-storage-api-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-testutils-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-webhcat-2.1.1-cdh6.3.2.jar
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-webhcat-java-client-2.1.1-cdh6.3.2.jar
EOF
while read host; do
echo "host: ${host}"
ssh-copy-id "foo@${host}"
scp "${jars}" "foo@${host}:/tmp/jars"
ssh "foo@${host}" 'cat /etc/spark/conf/classpath.txt /tmp/jars | uniq | sort > /tmp/classpath.txt && sudo bash -c "cat /tmp/classpath.txt > /etc/spark/conf/classpath.txt"' </dev/null
done < "${hosts}"
How can we add these jars at runtime to the spark application such that they are accessible from the same classloader which loads our code in drivers and executors?
Created on 06-19-2020 08:16 AM - edited 06-19-2020 08:16 AM
Adding the libs via the following pattern to the safety valve for spark-env.sh fixed the issue.
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/CDH/jars/hive-accumulo-handler-2.1.1-cdh6.3.2.jar:$SPARK_DIST_CLASSPATH
Created on 06-19-2020 08:16 AM - edited 06-19-2020 08:16 AM
Adding the libs via the following pattern to the safety valve for spark-env.sh fixed the issue.
SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/CDH/jars/hive-accumulo-handler-2.1.1-cdh6.3.2.jar:$SPARK_DIST_CLASSPATH