Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Latest HDP 2.6.5.0-292: DataFrame show() throws an error

Frequent Visitor

Hi,

I'm using the latest HDP, version 2.6.5.0-292, with Spark 2.3.0.

When I try to run show() on any DataFrame, it always throws this error:

scala> spark.read.csv("/user/a.txt").show()

java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
  at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
  at org.apache.spark.sql.execution.SparkPlan.org$apache$spark$sql$execution$SparkPlan$decodeUnsafeRows(SparkPlan.scala:274)
  at org.apache.spark.sql.execution.SparkPlan$anonfun$executeTake$1.apply(SparkPlan.scala:366)
  at org.apache.spark.sql.execution.SparkPlan$anonfun$executeTake$1.apply(SparkPlan.scala:366)
  at scala.collection.TraversableLike$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:186)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:366)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$collectFromPlan(Dataset.scala:3272)
  at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2484)
  at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2484)
  at org.apache.spark.sql.Dataset$anonfun$52.apply(Dataset.scala:3253)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2484)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2698)
  at org.apache.spark.sql.execution.datasources.csv.TextInputCSVDataSource$.infer(CSVDataSource.scala:148)
  at org.apache.spark.sql.execution.datasources.csv.CSVDataSource.inferSchema(CSVDataSource.scala:63)
  at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:57)
  at org.apache.spark.sql.execution.datasources.DataSource$anonfun$8.apply(DataSource.scala:202)
  at org.apache.spark.sql.execution.datasources.DataSource$anonfun$8.apply(DataSource.scala:202)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:201)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:392)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:596)
  at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:473)


I've tried both pyspark and spark-shell on three freshly installed HDP 2.6.5.0-292 clusters.

The DataFrame write functions work fine; only show() throws the error.

Has anyone encountered the same issue? How can I fix this?

1 ACCEPTED SOLUTION


@dalin qin This type of error is caused by multiple versions of the same jar on the classpath. Could you run

lsof -P -p <pid> | grep lz4

This should show where the lz4 jar is being loaded from; most likely an incorrect version is being picked up. Note: <pid> is the Spark shell's process id.
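To illustrate what a conflict looks like in that lsof output, here is a mocked-up session; the jar paths and the /opt/extra-jars location are made up for illustration (on a real cluster you would find the spark-shell pid with `jps -l`, looking for org.apache.spark.repl.Main, and pipe `lsof -P -p "$PID"` into the filter instead of the here-doc):

```shell
# Keep only open files that look like lz4 jars.
filter_lz4() { grep -i 'lz4.*\.jar'; }

# Simulated lsof output: Spark's bundled lz4-java jar plus an extra,
# older lz4 jar that someone added to the classpath.
filter_lz4 <<'EOF'
java 1234 spark mem REG 8,1 370119 /usr/hdp/2.6.5.0-292/spark2/jars/lz4-java-1.4.0.jar
java 1234 spark mem REG 8,1 236880 /opt/extra-jars/lz4-1.2.0.jar
java 1234 spark mem REG 8,1  12345 /usr/hdp/2.6.5.0-292/spark2/jars/scala-library-2.11.8.jar
EOF
```

Seeing two different lz4 jars like this is the red flag: as far as I know, Spark 2.3 ships lz4-java 1.4, which added the LZ4BlockInputStream(InputStream, boolean) constructor, so an older lz4 jar shadowing it produces exactly the NoSuchMethodError above.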

HTH


2 REPLIES 2


Frequent Visitor

Thank you very much. That's my bad: I had added some extra jars to my classpath, which caused this error.
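For anyone who lands here with the same symptom, a quick way to spot a conflicting copy before even launching the shell is to scan the classpath string you are about to pass for the suspect library. The CP value below is illustrative, not taken from this cluster; substitute whatever you pass via --jars or spark.driver.extraClassPath:

```shell
# Illustrative classpath with a stray older lz4 jar mixed in.
CP="/usr/hdp/2.6.5.0-292/spark2/jars/lz4-java-1.4.0.jar:/opt/extra-jars/lz4-1.2.0.jar:/usr/hdp/2.6.5.0-292/spark2/jars/scala-library-2.11.8.jar"

# One entry per line, filtered to lz4; more than one hit means two
# versions of the library can end up on the JVM classpath.
echo "$CP" | tr ':' '\n' | grep -i lz4
```

If this prints more than one jar, remove the extra copy (or keep only the version matching Spark's bundled one) before starting spark-shell.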