Created 07-07-2018 04:27 AM
Hi ,
I'm using latest HDP ,version is 2.6.5.0-292. spark version is 2.3.0
when I'm trying to run show() from any DataFrame ,it always throw error :
scala> spark.read.csv("/user/a.txt").show()
java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122) at org.apache.spark.sql.execution.SparkPlan.org$apache$spark$sql$execution$SparkPlan$decodeUnsafeRows(SparkPlan.scala:274) at org.apache.spark.sql.execution.SparkPlan$anonfun$executeTake$1.apply(SparkPlan.scala:366) at org.apache.spark.sql.execution.SparkPlan$anonfun$executeTake$1.apply(SparkPlan.scala:366) at scala.collection.TraversableLike$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:366) at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$collectFromPlan(Dataset.scala:3272) at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2484) at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2484) at org.apache.spark.sql.Dataset$anonfun$52.apply(Dataset.scala:3253) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252) at org.apache.spark.sql.Dataset.head(Dataset.scala:2484) at org.apache.spark.sql.Dataset.take(Dataset.scala:2698) at org.apache.spark.sql.execution.datasources.csv.TextInputCSVDataSource$.infer(CSVDataSource.scala:148) at org.apache.spark.sql.execution.datasources.csv.CSVDataSource.inferSchema(CSVDataSource.scala:63) at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:57) at org.apache.spark.sql.execution.datasources.DataSource$anonfun$8.apply(DataSource.scala:202) at org.apache.spark.sql.execution.datasources.DataSource$anonfun$8.apply(DataSource.scala:202) at scala.Option.orElse(Option.scala:289) at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:201) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:392) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227) at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:596) at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:473)
I've tried both pyspark and spark-shell on 3 sets of newly installed hdp 2.6.5.0-292.
the DataFrame writing function works well ,only show() throws the error.
are there anyone encountered same issue as I had? how to fix this problem?
Created 07-07-2018 01:39 PM
@dalin qin this type of errors are due multiple versions of same jar in classpath. Could you run
lsof -P -p <pid> | grep lz4
this will hopefully show the places from where the lz4 jar is being used and probably the incorrect version is being picked. Note: pid is the spark shell pid
HTH
Created 07-07-2018 01:39 PM
@dalin qin this type of errors are due multiple versions of same jar in classpath. Could you run
lsof -P -p <pid> | grep lz4
this will hopefully show the places from where the lz4 jar is being used and probably the incorrect version is being picked. Note: pid is the spark shell pid
HTH
Created 07-07-2018 02:16 PM
thank you very much ,that' my bad ,I had added some other jars in my class path leading to this error.