@DeepikaPant, I wrote up a more detailed analysis of the issue and a workaround here: https://github.com/archivesunleashed/aut/issues/308
The workaround is to pass a JAR containing an appropriate version of commons-compress to spark-shell or spark-submit via the --driver-class-path argument.
I found a solution, but I don't understand why it works. Our project previously used Tika 1.12, and I ran into the NoSuchMethodError when we upgraded to Tika 1.19.1. When I compared the dependency trees of builds with these two versions of Tika to see how commons-compress was being included, the only structural difference I found was that the new version of Tika introduced a transitive dependency on org.apache.poi:poi-ooxml:

[INFO] | +- org.apache.poi:poi-ooxml:jar:4.0.0:compile
[INFO] | | +- (org.apache.poi:poi:jar:4.0.0:compile - omitted for duplicate)
[INFO] | | +- org.apache.poi:poi-ooxml-schemas:jar:4.0.0:compile
[INFO] | | | \- org.apache.xmlbeans:xmlbeans:jar:3.0.1:compile
[INFO] | | +- (org.apache.commons:commons-compress:jar:1.18:compile - omitted for conflict with 1.4.1)
[INFO] | | \- com.github.virtuald:curvesapi:jar:1.04:compile

(Our pom.xml specifies the dependency on commons-compress 1.18; the Hadoop 2.6.5 libraries depend on commons-compress 1.4.1.)

I don't see why poi-ooxml's dependency on commons-compress would prevent the inclusion of commons-compress 1.18, but that is what seems to be happening. When I exclude poi-ooxml from tika-parsers, calls to the Tika parser in spark-shell work as expected.
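For anyone who wants to try the same thing, the exclusion might look roughly like this in pom.xml. This is only a sketch: it assumes tika-parsers is declared as a direct dependency at version 1.19.1, and the surrounding layout is illustrative rather than copied from our actual build file.

```xml
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <!-- Assumed: the Tika version mentioned in this post -->
  <version>1.19.1</version>
  <exclusions>
    <!-- Drop the transitive poi-ooxml dependency pulled in by tika-parsers -->
    <exclusion>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Keep in mind that excluding poi-ooxml removes the classes Tika relies on for OOXML (Office) formats, so parsing of those file types will probably no longer work with this change.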