
Protobuf JAR dependency and CoreNLP




We are trying to run a Spark 2.2.0 application using Stanford CoreNLP version 3.9.1. CoreNLP requires protobuf-java 3.5.1, but protobuf-java 2.5.0 is on the CDH classpath, so my application fails with a protobuf version conflict. As a workaround, I set `spark.executor.userClassPathFirst=true` and supply the required protobuf-java version as a user JAR via the --jars option to Spark. This seems to work, but now my Parquet file write fails with the error below:

Caused by: java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.uncompressedLength(Ljava/nio/ByteBuffer;II)I
	at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
	at org.xerial.snappy.Snappy.uncompressedLength(
	at parquet.hadoop.codec.SnappyDecompressor.decompress(
	at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(
	at parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(
	at parquet.column.Encoding$1.initDictionary(
	at parquet.column.Encoding$4.initDictionary(

Is there a workaround or solution for this problem? In summary:
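For reference, the submit command looks roughly like this (a sketch only; the class name, application JAR, and JAR paths are placeholders, not our exact ones):

```shell
# Sketch of the workaround described above: prefer user-supplied JARs on the
# executor classpath and ship protobuf-java 3.5.1 alongside the application.
# Paths and names below are illustrative placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.userClassPathFirst=true \
  --jars /path/to/protobuf-java-3.5.1.jar \
  --class com.example.NlpJob \
  my-nlp-app.jar
```

It is with this setting in place that the snappy UnsatisfiedLinkError appears during the Parquet write.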


1. Is there any other way to override the protobuf version in a Spark 2 application run via spark-submit on YARN?

2. Is there a workaround or solution for the snappy native link error? Note that the Parquet write does work on the cluster when 'spark.executor.userClassPathFirst=true' is not set (in a separate Spark application).

We are running CDH 5.13.1.