
Protobuf JAR dependency and CoreNLP



New Contributor



We are trying to run a Spark 2.2.0 application that uses Stanford CoreNLP 3.9.1. That CoreNLP version requires protobuf-java 3.5.1, but the CDH classpath ships protobuf-java 2.5.0, so the application fails. As a workaround, I set `spark.executor.userClassPathFirst=true` and supply the required protobuf-java version as a user JAR via the --jars option to spark-submit. This seems to work, but now my Parquet file write fails with:

Caused by: java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.uncompressedLength(Ljava/nio/ByteBuffer;II)I
	at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
	at org.xerial.snappy.Snappy.uncompressedLength(
	at parquet.hadoop.codec.SnappyDecompressor.decompress(
	at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(
	at parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(
	at parquet.column.Encoding$1.initDictionary(
	at parquet.column.Encoding$4.initDictionary(
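An UnsatisfiedLinkError like this usually means the snappy-java classes and the native library were loaded from mismatched copies on the classpath. Before changing configuration further, it can help to confirm which jar each conflicting class is actually resolved from. Below is a small, generic probe (the class names probed are taken from the stack trace and the protobuf conflict; nothing here is CDH-specific):

```java
// ClasspathProbe.java — report which jar (or classloader) a class is loaded from.
public class ClasspathProbe {
    static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className);
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            return (src == null || src.getLocation() == null)
                    ? "bootstrap classloader (JDK)"
                    : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Run this from inside the Spark driver/executors to see which copy wins.
        System.out.println("snappy:   " + locationOf("org.xerial.snappy.SnappyNative"));
        System.out.println("protobuf: " + locationOf("com.google.protobuf.GeneratedMessageV3"));
    }
}
```

If the snappy classes resolve to a user-supplied jar while the native library was already loaded via the CDH copy (or vice versa), that mismatch produces exactly this kind of link error.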

The Parquet write fails with the above error. Is there a workaround or solution for this problem? In summary:


1. Is there any other way to override the protobuf version for a Spark 2 application submitted via spark-submit on YARN?

2. Is there a workaround or solution for the Snappy native link error? Note that the Parquet write does work on the cluster when `spark.executor.userClassPathFirst=true` is not set (verified in a separate Spark application).
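Regarding question 1, one alternative that avoids `userClassPathFirst` entirely is to shade (relocate) protobuf inside the application's uber jar, so it can never collide with CDH's protobuf-java 2.5.0. Below is a sketch using the maven-shade-plugin; the shaded package name is arbitrary, and this only helps if CoreNLP is bundled in the same jar so its bytecode references to protobuf get rewritten too:

```xml
<!-- pom.xml sketch: relocate com.google.protobuf to a private package -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.1.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <!-- arbitrary private package; avoids any classpath collision -->
            <shadedPattern>myapp.shaded.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```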

We are running CDH 5.13.1.
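For concreteness, the submit command with the current workaround looks roughly like this (jar paths, class name, and app name are placeholders, not the actual application):

```shell
# Sketch of the current workaround; all names and paths are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.userClassPathFirst=true \
  --jars protobuf-java-3.5.1.jar \
  --class com.example.NlpApp \
  my-nlp-app.jar

# If the snappy link error persists, one commonly suggested variant is to also
# ship a matching snappy-java jar so user-first resolution stays consistent:
#   --jars protobuf-java-3.5.1.jar,snappy-java-1.1.2.6.jar
```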