New Contributor
Posts: 3
Registered: ‎11-17-2016

Spark 2.x + Tika: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStreamF


Hi,

 

I am trying to resolve a runtime classpath issue with a spark-submit job that uses Apache Tika (> v1.14) for parsing. The problem appears to be a conflict between the classpath spark-submit provides and the contents of my uber-jar.

 

Platforms: CDH 5.15 (with Spark 2.3 added per the CDH docs) and CDH 6 (which bundles Spark 2.2)

 

I've tried / reviewed:

(Cloudera) Where does spark-submit look for Jar files?

(stackoverflow) resolving-dependency-problems-in-apache-spark

 

Highlights:

  • Java 8 / Scala 2.11
  • I'm building an uber-jar and calling that uber-jar via spark-submit
  • I've tried adding --jars option to spark-submit call (see further down in this post)
  • I've tried adding --conf spark.driver.userClassPathFirst=true and --conf spark.executor.userClassPathFirst=true to the spark-submit call (see further down in this post):
$ spark-submit --master local[*] --class com.example.App --conf spark.executor.userClassPathFirst=true ./target/uber-tikaTest-1.19.jar

18/09/25 13:35:55 ERROR util.Utils: Exception encountered
java.lang.NullPointerException
	at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply$mcV$sp(ParallelCollectionRDD.scala:72)
	at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply(ParallelCollectionRDD.scala:70)
	at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply(ParallelCollectionRDD.scala:70)
	at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1307)
	at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

(The same NullPointerException is logged a second time immediately after.)

 

Below the following error message are files for:

  • build-and-run.sh script (calls spark-submit -- notes about options included)
  • sample app
  • pom.xml
  • mvn dependency tree output (which shows the "missing" commons-compress library is included within the uber-jar)

 

The error at runtime:

18/09/25 11:47:39 ERROR executor.Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStreamFactory.detect(Ljava/io/InputStream;)Ljava/lang/String;
	at org.apache.tika.parser.pkg.ZipContainerDetector.detectArchiveFormat(ZipContainerDetector.java:160)
	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:104)
	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
	at com.example.App$.tikaAutoDetectParser(App.scala:55)
	at com.example.App$$anonfun$1.apply(App.scala:69)
	at com.example.App$$anonfun$1.apply(App.scala:69)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1799)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1158)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1158)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2071)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2071)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
18/09/25 11:47:39 ERROR executor.Executor: Exception in task 5.0 in stage 0.0 (TID 5)
java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStreamFactory.detect(Ljava/io/InputStream;)Ljava/lang/String;
	(identical stack trace to task 1.0 above)
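A diagnostic that can narrow this down (not from the original post; a sketch that assumes `sc` is the SparkContext from the sample app below) is to ask the JVM which jar actually supplied `ArchiveStreamFactory`, on both the driver and the executors:

```scala
// Diagnostic sketch: print which jar ArchiveStreamFactory was loaded from.
// ArchiveStreamFactory.detect(InputStream) only exists in commons-compress >= 1.14,
// so if this prints a CDH-provided jar with an older version, that explains the
// NoSuchMethodError even though the uber-jar bundles 1.18.
import org.apache.commons.compress.archivers.ArchiveStreamFactory

def compressJar: String =
  classOf[ArchiveStreamFactory].getProtectionDomain.getCodeSource.getLocation.toString

println(s"driver loads commons-compress from: $compressJar")
sc.parallelize(1 to 8).map(_ => compressJar).distinct.collect()
  .foreach(p => println(s"executor loads commons-compress from: $p"))
```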

 

 

build-and-run.sh:

Notes:

  • I've tried adding the --conf userClassPathFirst flags to both the local-master and YARN invocations below
  • I've tried using the --jars flag to point at the uber-jar produced by the Maven build (pom.xml provided further down in the post)

 

#!/bin/bash
mvn package   # the shade plugin binds to the package phase; "mvn compile" alone will not (re)build the uber-jar

if true
then
spark-submit --master local[*] --class com.example.App ./target/uber-tikaTest-1.19.jar
fi

# tried using the userClassPathFirst flags for driver and executor with both spark-submit calls below
# --conf spark.driver.userClassPathFirst=true \
# --conf spark.executor.userClassPathFirst=true \

if false 
then
spark-submit --class com.example.App \
 --master yarn \
 --packages org.apache.commons:commons-compress:1.18 \
 --jars ./target/uber-tikaTest-1.19.jar \
 --num-executors 2 \
 --executor-memory 1024m \
 --executor-cores 2 \
 --driver-memory 2048m \
 --driver-cores 1 \
 ./target/uber-tikaTest-1.19.jar
fi
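A less invasive alternative to userClassPathFirst (not tried in the original post; the jar path below is illustrative and must exist on every node) is to prepend only the one conflicting jar to Spark's classpath via extraClassPath:

```shell
# Sketch: put a newer commons-compress ahead of the CDH-provided copy,
# instead of overriding Spark's entire classpath with userClassPathFirst.
spark-submit --class com.example.App \
 --master yarn \
 --conf spark.driver.extraClassPath=/opt/libs/commons-compress-1.18.jar \
 --conf spark.executor.extraClassPath=/opt/libs/commons-compress-1.18.jar \
 ./target/uber-tikaTest-1.19.jar
```

Since only commons-compress is prepended, Spark's own required classes still resolve from its distribution, which avoids the startup NullPointerException seen with userClassPathFirst.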

 

Sample App:

 

package com.example
////////// Tika Imports
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.AutoDetectParser
import org.apache.tika.sax.BodyContentHandler
////////// Java HTTP Imports 
import java.net.URL;
import java.net.HttpURLConnection
import scala.collection.JavaConverters._
import scala.collection.mutable._
////////// Spark Imports 
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel
import org.apache.spark.sql.{Row,SparkSession}


object App {
  case class InputStreamData(sourceURL: String, headerFields: Map[String, List[String]], inputStream: java.io.InputStream)

  def openUrlStream(sourceURL: String, apiKey: String): InputStreamData = {
    try {
      val url = new URL(sourceURL)
      val urlConnection = url.openConnection().asInstanceOf[HttpURLConnection]
      urlConnection.setInstanceFollowRedirects(true)
      val headerFields = urlConnection.getHeaderFields()
      val input = urlConnection.getInputStream()
      InputStreamData(sourceURL, headerFields.asScala.map(x => (x._1, x._2.asScala.toList)), input)
    } catch {
      case e: Exception =>
        println("**********************************************************************************************")
        println("PARSEURL: INVALID URL: " + sourceURL)
        println(e.toString())
        println("**********************************************************************************************")
        InputStreamData(sourceURL, Map("ERROR" -> List("ERROR")), null)
    }
  }

  def tikaAutoDetectParser(inputStream: java.io.InputStream): String = {
    val parser = new AutoDetectParser()
    val handler = new BodyContentHandler(-1)  // -1 disables the default write limit
    val metadata = new Metadata()
    parser.parse(inputStream, handler, metadata)
    handler.toString
  }

  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("tika-1.19-test")
    val sc = new SparkContext(sparkConf)
    val spark = SparkSession.builder.config(sparkConf).getOrCreate()
    println("HELLO!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
    val urls = List("http://www.pdf995.com/samples/pdf.pdf", "https://www.amd.com/en", "http://jeroen.github.io/images/testocr.png")

    val rdd = sc.parallelize(urls)
    val parsed = rdd.map(x => tikaAutoDetectParser(openUrlStream(x, "").inputStream))
    println(parsed.count)
  }
}

 

 

pom.xml (builds uber-jar):

 

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>tikaTest</artifactId>
  <version>1.19</version>
  <name>${project.artifactId}</name>
  <description>Testing tika 1.19 with CDH 6 and 5.x, Spark 2.x, Scala 2.11.x</description>
  <inceptionYear>2018</inceptionYear>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>


 <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

<profiles>
	<profile>
		<id>scala-2.11.12</id>
		<activation>
			<activeByDefault>true</activeByDefault>
		</activation>
		<properties>
			<scalaVersion>2.11.12</scalaVersion>
			<scalaBinaryVersion>2.11</scalaBinaryVersion> <!-- the binary version is 2.11, not the full 2.11.12 patch version -->
		</properties>
		<dependencies>
			<!-- ************************************************************************** -->
			<!-- GOOD DEPENDENCIES +++++++++++++++++++++++++++++++++++++ -->
			<!-- ************************************************************************** -->

			<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-compress -->
			<dependency>
   				<groupId>org.apache.commons</groupId>
   				<artifactId>commons-compress</artifactId>
   				<version>1.18</version>
			</dependency>
						
			<!-- *************** CDH flavored dependencies ***********************************************-->
			<!-- https://www.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html#versions -->
			<dependency>
   				<groupId>org.apache.spark</groupId>
   				<artifactId>spark-core_2.11</artifactId>
   				<version>2.2.0.cloudera3</version>
  				<!-- have tried scope provided / compile -->
   				<!--<scope>provided</scope>-->
			</dependency>
			<dependency>
    				<groupId>org.apache.spark</groupId>
    				<artifactId>spark-sql_2.11</artifactId>
    				<version>2.2.0.cloudera3</version>
    				<!-- have tried scope provided / compile -->
    				<!--<scope>provided</scope>-->
				</dependency>
												
				<!-- https://mvnrepository.com/artifact/org.apache.tika/tika-core -->
				<dependency>
				    <groupId>org.apache.tika</groupId>
    				<artifactId>tika-core</artifactId>
    				<version>1.19</version>
				</dependency>
				
				<!-- https://mvnrepository.com/artifact/org.apache.tika/tika-parsers -->
				<dependency>
					<groupId>org.apache.tika</groupId>
					<artifactId>tika-parsers</artifactId>
					<version>1.19</version>
				</dependency>
				
				<!-- https://mvnrepository.com/artifact/javax.ws.rs/javax.ws.rs-api -->
				<dependency>
    				<groupId>javax.ws.rs</groupId>
    				<artifactId>javax.ws.rs-api</artifactId>
    				<version>2.1.1</version>
    			</dependency>
        
			<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
				<dependency>
    				<groupId>org.scala-lang</groupId>
    				<artifactId>scala-library</artifactId>
    				<version>2.11.12</version>
				</dependency>
				
			
			<!-- **************************************************************************************************************************
			**************************** alternative dependencies that have been tried and yield same Tika error***************************
			*******************************************************************************************************************************-->
			<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
				<!--
				<dependency>
    				<groupId>org.apache.spark</groupId>
    				<artifactId>spark-core_2.11</artifactId>
    				<version>2.2.0</version>
				</dependency>
				-->
				
			<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
				<!--
				<dependency>
    				<groupId>org.apache.spark</groupId>
    				<artifactId>spark-sql_2.11</artifactId>
    				<version>2.2.0</version>
				</dependency>
				-->
			
			</dependencies>
		</profile>
	</profiles>

  
	<build>
		<sourceDirectory>src/main/scala</sourceDirectory>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.5.1</version>
				<configuration>
					<source>1.8</source>
					<target>1.8</target>
				</configuration>
			</plugin>
			<plugin>
				<groupId>net.alchim31.maven</groupId>
				<artifactId>scala-maven-plugin</artifactId>
				<version>3.2.2</version>
				<executions>
					<execution>
						<goals>
							<goal>compile</goal>
							<goal>testCompile</goal>
						</goals>
					</execution>
				</executions>
				<configuration>
					<args>
						<!-- work-around for https://issues.scala-lang.org/browse/SI-8358 -->
						<arg>-nobootcp</arg>
					</args>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-shade-plugin</artifactId>
				<version>3.1.1</version>
				<executions>
					<execution>
						<phase>package</phase>
						<goals>
							<goal>shade</goal>
						</goals>
					</execution>
				</executions>
				<configuration>
					<filters>
						<filter>
							<artifact>*:*</artifact>
							<excludes>
								<exclude>META-INF/*.SF</exclude>
								<exclude>META-INF/*.DSA</exclude>
								<exclude>META-INF/*.RSA</exclude>
							</excludes>
						</filter>
					</filters>
					<finalName>uber-${project.artifactId}-${project.version}</finalName>
				</configuration>
			</plugin>
		</plugins>
	</build>
</project>
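One approach this pom does not yet try (a sketch, not a confirmed fix) is class relocation: have the shade plugin rewrite Tika's references to commons-compress into a private package inside the uber-jar, so they can never collide with the older copy Hadoop/Spark put on the classpath. This would go inside the maven-shade-plugin `<configuration>` element above:

```xml
<!-- Sketch: relocate commons-compress inside the uber-jar so the CDH-provided
     (older) copy on Spark's classpath can no longer shadow it. -->
<relocations>
  <relocation>
    <pattern>org.apache.commons.compress</pattern>
    <shadedPattern>shaded.org.apache.commons.compress</shadedPattern>
  </relocation>
</relocations>
```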

 

mvn dependency tree:

Notes:

  • $ mvn dependency:tree -Ddetail=true | grep compress

 

[INFO] +- org.apache.commons:commons-compress:jar:1.18:compile
[INFO] | +- com.ning:compress-lzf:jar:1.0.3:compile

 

 

  • $ mvn dependency:tree -Ddetail=true | grep commons

 

[INFO] +- org.apache.commons:commons-compress:jar:1.18:compile
[INFO] | | | \- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] | | | +- commons-cli:commons-cli:jar:1.2:compile
[INFO] | | | +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | | | +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] | | | | +- commons-digester:commons-digester:jar:1.8:compile
[INFO] | | | | | \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] | | | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] | +- org.apache.commons:commons-lang3:jar:3.5:compile
[INFO] | +- org.apache.commons:commons-math3:jar:3.4.1:compile
[INFO] | +- commons-net:commons-net:jar:2.2:compile
[INFO] | +- org.apache.commons:commons-crypto:jar:1.0.0:compile
[INFO] | | +- org.codehaus.janino:commons-compiler:jar:3.0.8:compile
[INFO] | | \- commons-lang:commons-lang:jar:2.6:compile
[INFO] | +- commons-codec:commons-codec:jar:1.11:compile
[INFO] | | \- org.apache.commons:commons-collections4:jar:4.2:compile
[INFO] | +- org.apache.commons:commons-exec:jar:1.3:compile
[INFO] | +- commons-io:commons-io:jar:2.6:compile
[INFO] | +- org.apache.commons:commons-csv:jar:1.5:compile

 

 

  • Full listing:

 

$ mvn dependency:tree -Ddetail=true

[INFO] com.example:tikaTest:jar:1.19
[INFO] +- org.apache.commons:commons-compress:jar:1.18:compile
[INFO] +- org.apache.spark:spark-core_2.11:jar:2.2.0.cloudera3:compile
[INFO] | +- org.apache.avro:avro:jar:1.7.6-cdh5.13.3:compile
[INFO] | | +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
[INFO] | | +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
[INFO] | | \- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] | +- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6-cdh5.13.3:compile
[INFO] | | +- org.apache.avro:avro-ipc:jar:1.7.6-cdh5.13.3:compile
[INFO] | | | \- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] | | \- org.apache.avro:avro-ipc:jar:tests:1.7.6-cdh5.13.3:compile
[INFO] | +- com.twitter:chill_2.11:jar:0.8.0:compile
[INFO] | | \- com.esotericsoftware:kryo-shaded:jar:3.0.3:compile
[INFO] | | +- com.esotericsoftware:minlog:jar:1.3.0:compile
[INFO] | | \- org.objenesis:objenesis:jar:2.1:compile
[INFO] | +- com.twitter:chill-java:jar:0.8.0:compile
[INFO] | +- org.apache.xbean:xbean-asm5-shaded:jar:4.4:compile
[INFO] | +- org.apache.hadoop:hadoop-client:jar:2.6.0-cdh5.13.3:compile
[INFO] | | +- org.apache.hadoop:hadoop-common:jar:2.6.0-cdh5.13.3:compile
[INFO] | | | +- commons-cli:commons-cli:jar:1.2:compile
[INFO] | | | +- xmlenc:xmlenc:jar:0.52:compile
[INFO] | | | +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] | | | +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] | | | | +- commons-digester:commons-digester:jar:1.8:compile
[INFO] | | | | | \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] | | | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] | | | +- org.apache.hadoop:hadoop-auth:jar:2.6.0-cdh5.13.3:compile
[INFO] | | | | \- org.apache.directory.server:apacheds-kerberos-codec:jar:2.0.0-M15:compile
[INFO] | | | | +- org.apache.directory.server:apacheds-i18n:jar:2.0.0-M15:compile
[INFO] | | | | +- org.apache.directory.api:api-asn1-api:jar:1.0.0-M20:compile
[INFO] | | | | \- org.apache.directory.api:api-util:jar:1.0.0-M20:compile
[INFO] | | | +- org.apache.curator:curator-client:jar:2.7.1:compile
[INFO] | | | \- org.apache.htrace:htrace-core4:jar:4.0.1-incubating:compile
[INFO] | | +- org.apache.hadoop:hadoop-hdfs:jar:2.6.0-cdh5.13.3:compile
[INFO] | | | +- org.mortbay.jetty:jetty-util:jar:6.1.26.cloudera.4:compile
[INFO] | | | \- xerces:xercesImpl:jar:2.9.1:compile
[INFO] | | | \- xml-apis:xml-apis:jar:1.3.04:compile
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.6.0-cdh5.13.3:compile
[INFO] | | | +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.6.0-cdh5.13.3:compile
[INFO] | | | | +- org.apache.hadoop:hadoop-yarn-client:jar:2.6.0-cdh5.13.3:compile
[INFO] | | | | \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.6.0-cdh5.13.3:compile
[INFO] | | | \- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.6.0-cdh5.13.3:compile
[INFO] | | +- org.apache.hadoop:hadoop-yarn-api:jar:2.6.0-cdh5.13.3:compile
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.6.0-cdh5.13.3:compile
[INFO] | | | \- org.apache.hadoop:hadoop-yarn-common:jar:2.6.0-cdh5.13.3:compile
[INFO] | | | +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.8:compile
[INFO] | | | \- org.codehaus.jackson:jackson-xc:jar:1.8.8:compile
[INFO] | | +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.6.0-cdh5.13.3:compile
[INFO] | | +- org.apache.hadoop:hadoop-aws:jar:2.6.0-cdh5.13.3:compile
[INFO] | | | \- com.amazonaws:aws-java-sdk-bundle:jar:1.11.134:compile
[INFO] | | \- org.apache.hadoop:hadoop-annotations:jar:2.6.0-cdh5.13.3:compile
[INFO] | +- org.apache.spark:spark-launcher_2.11:jar:2.2.0.cloudera3:compile
[INFO] | +- org.apache.spark:spark-network-common_2.11:jar:2.2.0.cloudera3:compile
[INFO] | | \- org.fusesource.leveldbjni:leveldbjni-all:jar:1.8:compile
[INFO] | +- org.apache.spark:spark-network-shuffle_2.11:jar:2.2.0.cloudera3:compile
[INFO] | +- org.apache.spark:spark-unsafe_2.11:jar:2.2.0.cloudera3:compile
[INFO] | +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] | | +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile
[INFO] | | \- com.jamesmurty.utils:java-xmlbuilder:jar:0.4:compile
[INFO] | +- org.apache.curator:curator-recipes:jar:2.7.1:compile
[INFO] | | +- org.apache.curator:curator-framework:jar:2.7.1:compile
[INFO] | | +- org.apache.zookeeper:zookeeper:jar:3.4.6:compile
[INFO] | | \- com.google.guava:guava:jar:16.0.1:compile
[INFO] | +- javax.servlet:javax.servlet-api:jar:3.1.0:compile
[INFO] | +- org.apache.commons:commons-lang3:jar:3.5:compile
[INFO] | +- org.apache.commons:commons-math3:jar:3.4.1:compile
[INFO] | +- com.google.code.findbugs:jsr305:jar:1.3.9:compile
[INFO] | +- org.slf4j:slf4j-api:jar:1.7.5:compile
[INFO] | +- org.slf4j:jul-to-slf4j:jar:1.7.5:compile
[INFO] | +- org.slf4j:jcl-over-slf4j:jar:1.7.5:compile
[INFO] | +- log4j:log4j:jar:1.2.17:compile
[INFO] | +- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
[INFO] | +- com.ning:compress-lzf:jar:1.0.3:compile
[INFO] | +- org.xerial.snappy:snappy-java:jar:1.0.4.1:compile
[INFO] | +- net.jpountz.lz4:lz4:jar:1.3.0:compile
[INFO] | +- org.roaringbitmap:RoaringBitmap:jar:0.5.11:compile
[INFO] | +- commons-net:commons-net:jar:2.2:compile
[INFO] | +- org.json4s:json4s-jackson_2.11:jar:3.2.11:compile
[INFO] | | \- org.json4s:json4s-core_2.11:jar:3.2.11:compile
[INFO] | | +- org.json4s:json4s-ast_2.11:jar:3.2.11:compile
[INFO] | | \- org.scala-lang:scalap:jar:2.11.0:compile
[INFO] | | \- org.scala-lang:scala-compiler:jar:2.11.0:compile
[INFO] | | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.1:compile
[INFO] | | \- org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.1:compile
[INFO] | +- org.glassfish.jersey.core:jersey-client:jar:2.22.2:compile
[INFO] | | +- org.glassfish.hk2:hk2-api:jar:2.4.0-b34:compile
[INFO] | | | +- org.glassfish.hk2:hk2-utils:jar:2.4.0-b34:compile
[INFO] | | | \- org.glassfish.hk2.external:aopalliance-repackaged:jar:2.4.0-b34:compile
[INFO] | | +- org.glassfish.hk2.external:javax.inject:jar:2.4.0-b34:compile
[INFO] | | \- org.glassfish.hk2:hk2-locator:jar:2.4.0-b34:compile
[INFO] | | \- org.javassist:javassist:jar:3.18.1-GA:compile
[INFO] | +- org.glassfish.jersey.core:jersey-common:jar:2.22.2:compile
[INFO] | | +- javax.annotation:javax.annotation-api:jar:1.2:compile
[INFO] | | +- org.glassfish.jersey.bundles.repackaged:jersey-guava:jar:2.22.2:compile
[INFO] | | \- org.glassfish.hk2:osgi-resource-locator:jar:1.0.1:compile
[INFO] | +- org.glassfish.jersey.core:jersey-server:jar:2.22.2:compile
[INFO] | | +- org.glassfish.jersey.media:jersey-media-jaxb:jar:2.22.2:compile
[INFO] | | \- javax.validation:validation-api:jar:1.1.0.Final:compile
[INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet:jar:2.22.2:compile
[INFO] | +- org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.22.2:compile
[INFO] | +- io.netty:netty-all:jar:4.0.43.Final:compile
[INFO] | +- io.netty:netty:jar:3.9.9.Final:compile
[INFO] | +- com.clearspring.analytics:stream:jar:2.7.0:compile
[INFO] | +- io.dropwizard.metrics:metrics-core:jar:3.1.2:compile
[INFO] | +- io.dropwizard.metrics:metrics-jvm:jar:3.1.2:compile
[INFO] | +- io.dropwizard.metrics:metrics-json:jar:3.1.2:compile
[INFO] | +- io.dropwizard.metrics:metrics-graphite:jar:3.1.2:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-databind:jar:2.6.5:compile
[INFO] | +- com.fasterxml.jackson.module:jackson-module-scala_2.11:jar:2.6.5:compile
[INFO] | | +- org.scala-lang:scala-reflect:jar:2.11.7:compile
[INFO] | | \- com.fasterxml.jackson.module:jackson-module-paranamer:jar:2.6.5:compile
[INFO] | +- org.apache.ivy:ivy:jar:2.4.0:compile
[INFO] | +- oro:oro:jar:2.0.8:compile
[INFO] | +- net.razorvine:pyrolite:jar:4.13:compile
[INFO] | +- net.sf.py4j:py4j:jar:0.10.7:compile
[INFO] | +- org.apache.spark:spark-tags_2.11:jar:2.2.0.cloudera3:compile
[INFO] | +- org.apache.commons:commons-crypto:jar:1.0.0:compile
[INFO] | \- org.spark-project.spark:unused:jar:1.0.0:compile
[INFO] +- org.apache.spark:spark-sql_2.11:jar:2.2.0.cloudera3:compile
[INFO] | +- com.univocity:univocity-parsers:jar:2.2.1:compile
[INFO] | +- org.apache.spark:spark-sketch_2.11:jar:2.2.0.cloudera3:compile
[INFO] | +- org.apache.spark:spark-catalyst_2.11:jar:2.2.0.cloudera3:compile
[INFO] | | +- org.codehaus.janino:janino:jar:3.0.8:compile
[INFO] | | +- org.codehaus.janino:commons-compiler:jar:3.0.8:compile
[INFO] | | \- org.antlr:antlr4-runtime:jar:4.5.3:compile
[INFO] | +- com.twitter:parquet-column:jar:1.5.0-cdh5.13.3:compile
[INFO] | | +- com.twitter:parquet-common:jar:1.5.0-cdh5.13.3:compile
[INFO] | | \- com.twitter:parquet-encoding:jar:1.5.0-cdh5.13.3:compile
[INFO] | +- com.twitter:parquet-hadoop:jar:1.5.0-cdh5.13.3:compile
[INFO] | | +- com.twitter:parquet-format:jar:2.1.0-cdh5.13.3:compile
[INFO] | | \- com.twitter:parquet-jackson:jar:1.5.0-cdh5.13.3:compile
[INFO] | \- com.twitter:parquet-avro:jar:1.5.0-cdh5.13.3:compile
[INFO] | \- it.unimi.dsi:fastutil:jar:7.2.1:compile
[INFO] +- org.apache.tika:tika-core:jar:1.19:compile
[INFO] +- org.apache.tika:tika-parsers:jar:1.19:compile
[INFO] | +- javax.xml.bind:jaxb-api:jar:2.3.0:compile
[INFO] | +- com.sun.xml.bind:jaxb-core:jar:2.3.0:compile
[INFO] | +- com.sun.xml.bind:jaxb-impl:jar:2.3.0:compile
[INFO] | +- javax.activation:activation:jar:1.1.1:compile
[INFO] | +- org.gagravarr:vorbis-java-tika:jar:0.8:compile
[INFO] | +- com.healthmarketscience.jackcess:jackcess:jar:2.1.12:compile
[INFO] | | \- commons-lang:commons-lang:jar:2.6:compile
[INFO] | +- com.healthmarketscience.jackcess:jackcess-encrypt:jar:2.1.4:compile
[INFO] | +- org.tallison:jmatio:jar:1.4:compile
[INFO] | +- org.apache.james:apache-mime4j-core:jar:0.8.2:compile
[INFO] | +- org.apache.james:apache-mime4j-dom:jar:0.8.2:compile
[INFO] | +- org.tukaani:xz:jar:1.8:compile
[INFO] | +- com.epam:parso:jar:2.0.9:compile
[INFO] | +- org.brotli:dec:jar:0.1.2:compile
[INFO] | +- commons-codec:commons-codec:jar:1.11:compile
[INFO] | +- org.apache.pdfbox:pdfbox:jar:2.0.11:compile
[INFO] | | \- org.apache.pdfbox:fontbox:jar:2.0.11:compile
[INFO] | +- org.apache.pdfbox:pdfbox-tools:jar:2.0.11:compile
[INFO] | +- org.apache.pdfbox:jempbox:jar:1.8.15:compile
[INFO] | +- org.bouncycastle:bcmail-jdk15on:jar:1.60:compile
[INFO] | | \- org.bouncycastle:bcpkix-jdk15on:jar:1.60:compile
[INFO] | +- org.bouncycastle:bcprov-jdk15on:jar:1.60:compile
[INFO] | +- org.apache.poi:poi:jar:4.0.0:compile
[INFO] | | \- org.apache.commons:commons-collections4:jar:4.2:compile
[INFO] | +- org.apache.poi:poi-scratchpad:jar:4.0.0:compile
[INFO] | +- org.apache.poi:poi-ooxml:jar:4.0.0:compile
[INFO] | | +- org.apache.poi:poi-ooxml-schemas:jar:4.0.0:compile
[INFO] | | | \- org.apache.xmlbeans:xmlbeans:jar:3.0.1:compile
[INFO] | | \- com.github.virtuald:curvesapi:jar:1.04:compile
[INFO] | +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2.1:compile
[INFO] | +- org.ow2.asm:asm:jar:6.2:compile
[INFO] | +- com.googlecode.mp4parser:isoparser:jar:1.1.22:compile
[INFO] | +- com.drewnoakes:metadata-extractor:jar:2.11.0:compile
[INFO] | | \- com.adobe.xmp:xmpcore:jar:5.1.3:compile
[INFO] | +- de.l3s.boilerpipe:boilerpipe:jar:1.1.0:compile
[INFO] | +- com.rometools:rome:jar:1.5.1:compile
[INFO] | | \- com.rometools:rome-utils:jar:1.5.1:compile
[INFO] | +- org.gagravarr:vorbis-java-core:jar:0.8:compile
[INFO] | +- com.googlecode.juniversalchardet:juniversalchardet:jar:1.0.3:compile
[INFO] | +- org.codelibs:jhighlight:jar:1.0.3:compile
[INFO] | +- com.pff:java-libpst:jar:0.8.1:compile
[INFO] | +- com.github.junrar:junrar:jar:2.0.0:compile
[INFO] | +- org.apache.cxf:cxf-rt-rs-client:jar:3.2.6:compile
[INFO] | | +- org.apache.cxf:cxf-rt-transports-http:jar:3.2.6:compile
[INFO] | | +- org.apache.cxf:cxf-core:jar:3.2.6:compile
[INFO] | | | +- com.fasterxml.woodstox:woodstox-core:jar:5.1.0:compile
[INFO] | | | | \- org.codehaus.woodstox:stax2-api:jar:4.1:compile
[INFO] | | | \- org.apache.ws.xmlschema:xmlschema-core:jar:2.2.3:compile
[INFO] | | \- org.apache.cxf:cxf-rt-frontend-jaxrs:jar:3.2.6:compile
[INFO] | +- org.apache.commons:commons-exec:jar:1.3:compile
[INFO] | +- org.apache.opennlp:opennlp-tools:jar:1.9.0:compile
[INFO] | +- commons-io:commons-io:jar:2.6:compile
[INFO] | +- com.googlecode.json-simple:json-simple:jar:1.1.1:compile
[INFO] | +- com.github.openjson:openjson:jar:1.0.10:compile
[INFO] | +- com.google.code.gson:gson:jar:2.8.5:compile
[INFO] | +- edu.ucar:netcdf4:jar:4.5.5:compile
[INFO] | | \- net.jcip:jcip-annotations:jar:1.0:compile
[INFO] | +- edu.ucar:grib:jar:4.5.5:compile
[INFO] | | +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] | | \- org.itadaki:bzip2:jar:0.9.1:compile
[INFO] | +- net.java.dev.jna:jna:jar:4.3.0:compile
[INFO] | +- org.jsoup:jsoup:jar:1.11.3:compile
[INFO] | +- edu.ucar:cdm:jar:4.5.5:compile
[INFO] | | +- edu.ucar:udunits:jar:4.5.5:compile
[INFO] | | +- joda-time:joda-time:jar:2.2:compile
[INFO] | | +- org.quartz-scheduler:quartz:jar:2.2.0:compile
[INFO] | | | \- c3p0:c3p0:jar:0.9.1.1:compile
[INFO] | | +- net.sf.ehcache:ehcache-core:jar:2.6.2:compile
[INFO] | | \- com.beust:jcommander:jar:1.35:compile
[INFO] | +- edu.ucar:httpservices:jar:4.5.5:compile
[INFO] | +- org.apache.httpcomponents:httpclient:jar:4.5.6:compile
[INFO] | +- org.apache.httpcomponents:httpmime:jar:4.5.6:compile
[INFO] | +- org.apache.commons:commons-csv:jar:1.5:compile
[INFO] | +- org.apache.sis.core:sis-utility:jar:0.8:compile
[INFO] | | \- javax.measure:unit-api:jar:1.0:compile
[INFO] | +- org.apache.sis.storage:sis-netcdf:jar:0.8:compile
[INFO] | | +- org.apache.sis.storage:sis-storage:jar:0.8:compile
[INFO] | | | \- org.apache.sis.core:sis-feature:jar:0.8:compile
[INFO] | | \- org.apache.sis.core:sis-referencing:jar:0.8:compile
[INFO] | +- org.apache.sis.core:sis-metadata:jar:0.8:compile
[INFO] | +- org.opengis:geoapi:jar:3.0.1:compile
[INFO] | +- edu.usc.ir:sentiment-analysis-parser:jar:0.1:compile
[INFO] | +- org.apache.uima:uimafit-core:jar:2.2.0:compile
[INFO] | +- org.apache.uima:uimaj-core:jar:2.9.0:compile
[INFO] | +- org.jdom:jdom2:jar:2.0.6:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-core:jar:2.9.6:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.6:compile
[INFO] | +- org.apache.pdfbox:jbig2-imageio:jar:3.0.1:compile
[INFO] | \- com.github.jai-imageio:jai-imageio-core:jar:1.4.0:compile
[INFO] +- javax.ws.rs:javax.ws.rs-api:jar:2.1.1:compile
[INFO] \- org.scala-lang:scala-library:jar:2.11.12:compile
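To see whether any other artifact drags in an older commons-compress that Maven's nearest-wins resolution merely hides from the default tree, the verbose tree filtered to that artifact is useful (a sketch; run from the project root):

```shell
# Show every path that pulls in commons-compress, including versions Maven
# omitted as duplicates or conflicts in the default dependency:tree output.
mvn dependency:tree -Dverbose -Dincludes=org.apache.commons:commons-compress
```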


Explorer
Posts: 13
Registered: ‎09-30-2018

Re: Spark 2.x + Tika: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStr


If it occurred when you ran spark-submit, I think you could add "--jars <the jar's absolute path>", e.g. "--jars /a/b/c.jar", to your shell command, then try submitting again.

New Contributor
Posts: 3
Registered: ‎11-17-2016

Re: Spark 2.x + Tika: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStr

Hi,

 

Thanks for the reply. Unfortunately that is one of the things I've already tried (see original post), with both HDFS and local paths for --jars. From my testing, I suspect this is a bigger issue: a classpath conflict in which Spark's own required libraries take precedence over mine. In my original post I mention trying the --conf flags and posted a portion of the resulting errors. In short: I'm submitting an uber-jar, and I've verified that the dependency in question (commons-compress) is up to date inside that uber-jar. The NoSuchMethodError still occurs, so I tried the userClassPathFirst --conf flags; doing so causes Spark to crash on startup (see the error at the top of the original post), presumably because I'm then overriding the classes Spark itself requires.

 

I should've titled this thread something like: how to resolve a commons-compress dependency conflict with Spark.

 

I think this snippet from another post is telling:

 

Explorer
Posts: 13
Registered: ‎09-30-2018

Re: Spark 2.x + Tika: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStr

If you use CDH, you could check the Spark version in the Web UI. From what I can see, I think this is a version-mismatch issue.
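The version on the gateway host can also be checked from the shell (in addition to the Web UI):

```shell
# Prints the Spark version that spark-submit on this host will actually use.
spark-submit --version
```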
