Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Latest HDP 2.6.5.0-292: DataFrame show() throws an error

Frequent Visitor

Hi,

I'm using the latest HDP, version 2.6.5.0-292, with Spark 2.3.0.

When I try to run show() on any DataFrame, it always throws this error:

scala> spark.read.csv("/user/a.txt").show()

java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
  at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
  at org.apache.spark.sql.execution.SparkPlan.org$apache$spark$sql$execution$SparkPlan$decodeUnsafeRows(SparkPlan.scala:274)
  at org.apache.spark.sql.execution.SparkPlan$anonfun$executeTake$1.apply(SparkPlan.scala:366)
  at org.apache.spark.sql.execution.SparkPlan$anonfun$executeTake$1.apply(SparkPlan.scala:366)
  at scala.collection.TraversableLike$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:366)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$collectFromPlan(Dataset.scala:3272)
  at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2484)
  at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2484)
  at org.apache.spark.sql.Dataset$anonfun$52.apply(Dataset.scala:3253)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2484)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2698)
  at org.apache.spark.sql.execution.datasources.csv.TextInputCSVDataSource$.infer(CSVDataSource.scala:148)
  at org.apache.spark.sql.execution.datasources.csv.CSVDataSource.inferSchema(CSVDataSource.scala:63)
  at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:57)
  at org.apache.spark.sql.execution.datasources.DataSource$anonfun$8.apply(DataSource.scala:202)
  at org.apache.spark.sql.execution.datasources.DataSource$anonfun$8.apply(DataSource.scala:202)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:201)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:392)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:596)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:473)


I've tried both pyspark and spark-shell on three freshly installed HDP 2.6.5.0-292 clusters.

The DataFrame write functions work fine; only show() throws the error.

Has anyone encountered the same issue? How can I fix this?

1 ACCEPTED SOLUTION


@dalin qin This type of error is caused by multiple versions of the same jar on the classpath. Could you run

lsof -P -p <pid> | grep lz4

This should show where the lz4 jar is being loaded from; most likely an incorrect version is being picked up. Note: <pid> is the Spark shell's process id.
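To illustrate what a conflict looks like in that lsof output, here is a mocked-up session; the jar paths and the /opt/extra-jars location are made up for illustration (on a real cluster you would find the spark-shell pid with `jps -l`, looking for org.apache.spark.repl.Main, and pipe `lsof -P -p "$PID"` into the filter instead of the here-doc):

```shell
# Keep only open files that look like lz4 jars.
filter_lz4() { grep -i 'lz4.*\.jar'; }

# Simulated lsof output: Spark's bundled lz4-java jar plus an extra,
# older lz4 jar that someone added to the classpath.
filter_lz4 <<'EOF'
java 1234 spark mem REG 8,1 370119 /usr/hdp/2.6.5.0-292/spark2/jars/lz4-java-1.4.0.jar
java 1234 spark mem REG 8,1 236880 /opt/extra-jars/lz4-1.2.0.jar
java 1234 spark mem REG 8,1  12345 /usr/hdp/2.6.5.0-292/spark2/jars/scala-library-2.11.8.jar
EOF
```

Seeing two different lz4 jars like this is the red flag: as far as I know, Spark 2.3 ships lz4-java 1.4, which added the LZ4BlockInputStream(InputStream, boolean) constructor, so an older lz4 jar shadowing it produces exactly the NoSuchMethodError above.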

HTH


2 REPLIES 2


Frequent Visitor

Thank you very much. That's my bad: I had added some extra jars to my classpath, which caused this error.
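For anyone who lands here with the same symptom, a quick way to spot a conflicting copy before even launching the shell is to scan the classpath string you are about to pass for the suspect library. The CP value below is illustrative, not taken from this cluster; substitute whatever you pass via --jars or spark.driver.extraClassPath:

```shell
# Illustrative classpath with a stray older lz4 jar mixed in.
CP="/usr/hdp/2.6.5.0-292/spark2/jars/lz4-java-1.4.0.jar:/opt/extra-jars/lz4-1.2.0.jar:/usr/hdp/2.6.5.0-292/spark2/jars/scala-library-2.11.8.jar"

# One entry per line, filtered to lz4; more than one hit means two
# versions of the library can end up on the JVM classpath.
echo "$CP" | tr ':' '\n' | grep -i lz4
```

If this prints more than one jar, remove the extra copy (or keep only the version matching Spark's bundled one) before starting spark-shell.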