Archives of Support Questions (Read Only)

This is an archived board for historical reference; information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

I get an IllegalArgumentException error when trying to read a file with Spark 1.4.1


Running a Spark command to read a file, I get an IllegalArgumentException. This is HDP 2.3.1 with Spark 1.4.1; the same error occurs with PySpark. The error appears to come from the SnappyCompressionCodec.

scala> var file = sc.textFile("hdfs://HdpTest:8020/user/weli/README.md")
java.lang.IllegalArgumentException
	at org.apache.spark.io.SnappyCompressionCodec.<init>(CompressionCodec.scala:152)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
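For context (an assumption based on how snappy-java generally behaves, not something stated in this thread): SnappyCompressionCodec fails in its constructor when snappy-java cannot extract and load its native library, and one common cause is a JVM temp directory mounted noexec. A quick check on the affected node:

```shell
# Assumption: Linux, with java.io.tmpdir defaulting to /tmp.
# snappy-java extracts a native .so into the temp dir and loads it;
# a noexec mount there makes that load fail.
if mount | grep -E '[[:space:]]/tmp[[:space:]]' | grep -q noexec; then
  echo "/tmp is mounted noexec"
else
  echo "/tmp allows exec (or is not a separate mount)"
fi
```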

1 ACCEPTED SOLUTION

Master Mentor

scala> var file = sc.textFile("hdfs://nsfed01.cloud.hortonworks.com:8020/tmp/expense.csv")
15/11/08 17:34:06 INFO MemoryStore: ensureFreeSpace(200320) called with curMem=0, maxMem=278019440
15/11/08 17:34:06 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 195.6 KB, free 264.9 MB)
15/11/08 17:34:06 INFO MemoryStore: ensureFreeSpace(18855) called with curMem=200320, maxMem=278019440
15/11/08 17:34:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.4 KB, free 264.9 MB)
15/11/08 17:34:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40023 (size: 18.4 KB, free: 265.1 MB)
15/11/08 17:34:06 INFO SparkContext: Created broadcast 0 from textFile at <console>:15
file: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:15

scala>

[root@nsfed01 ~]# rpm -qa | grep -i snappy
snappy-1.1.0-1.el6.x86_64
snappy-devel-1.1.0-1.el6.x86_64

The above is from HDP 2.3.2.

Not sure if it's related: https://issues.apache.org/jira/browse/SPARK-8946

@Scott Shaw
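The reply above checks the installed RPMs; a related sanity check (a sketch for a Linux node, not part of the original answer) is to ask the dynamic linker whether a system libsnappy is visible at all, since the answer implies the OS-level snappy packages matter here:

```shell
# List any libsnappy the dynamic linker knows about. If nothing is found,
# the fallback message is printed instead of failing the command.
ldconfig -p 2>/dev/null | grep -i snappy || echo "no libsnappy found in ldconfig cache"
```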


3 REPLIES


Rising Star

HDP-2.3.2.0-2950
Spark: 1.4.1.2.3

$ rpm -qa | grep -i snappy
snappy-devel-1.1.0-3.el7.x86_64
snappy-1.1.0-3.el7.x86_64

Hortonworks support suggested we apply the Spark 1.5.1 TP repo, but our Unix admin needs a tar.gz file to set up a local repo. Does anyone know the link to the tar file?

Rising Star

After setting org.xerial.snappy.tempdir to a newly created directory with rwx permissions, Spark now works fine.
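A minimal sketch of that fix (the directory path and permission bits are examples, not from the original post): create a dedicated extraction directory for snappy-java and point the org.xerial.snappy.tempdir system property at it when launching the driver.

```shell
# Example path; any filesystem that is writable by the Spark user and
# not mounted noexec will do.
SNAPPY_TMP="${TMPDIR:-/tmp}/snappy-extract"
mkdir -p "$SNAPPY_TMP"
chmod 700 "$SNAPPY_TMP"   # rwx for the user running the Spark driver

# Pass the property to the driver, e.g.:
#   spark-shell --driver-java-options "-Dorg.xerial.snappy.tempdir=$SNAPPY_TMP"
echo "created $SNAPPY_TMP"
```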