
I get an IllegalArgumentException error when trying to read a file with Spark 1.4.1


Running a Spark command to read a file throws an IllegalArgumentException. This is HDP 2.3.1 with Spark 1.4.1; the same error occurs with PySpark. The exception appears to originate in the SnappyCompressionCodec.

scala> var file = sc.textFile("hdfs://HdpTest:8020/user/weli/README.md")

java.lang.IllegalArgumentException

at org.apache.spark.io.SnappyCompressionCodec.<init>(CompressionCodec.scala:152)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
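For anyone hitting the same trace: SnappyCompressionCodec's constructor typically fails when snappy-java cannot extract and load its native library. One way to confirm the codec is the culprit (a hedged sketch, not confirmed as the fix for this cluster) is to bypass Snappy entirely with Spark's built-in LZF codec:

```shell
# Workaround sketch: avoid loading SnappyCompressionCodec altogether by
# switching Spark's internal block compression to the built-in LZF codec.
spark-shell --conf spark.io.compression.codec=lzf
```

If textFile works with lzf, the problem is the Snappy native library rather than HDFS or the file itself.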

1 ACCEPTED SOLUTION

Master Mentor

scala> var file = sc.textFile("hdfs://nsfed01.cloud.hortonworks.com:8020/tmp/expense.csv")

15/11/08 17:34:06 INFO MemoryStore: ensureFreeSpace(200320) called with curMem=0, maxMem=278019440

15/11/08 17:34:06 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 195.6 KB, free 264.9 MB)

15/11/08 17:34:06 INFO MemoryStore: ensureFreeSpace(18855) called with curMem=200320, maxMem=278019440

15/11/08 17:34:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.4 KB, free 264.9 MB)

15/11/08 17:34:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40023 (size: 18.4 KB, free: 265.1 MB)

15/11/08 17:34:06 INFO SparkContext: Created broadcast 0 from textFile at <console>:15

file: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:15

scala>

[root@nsfed01 ~]# rpm -qa | grep -i snappy

snappy-1.1.0-1.el6.x86_64

snappy-devel-1.1.0-1.el6.x86_64

The above is from HDP 2.3.2

Not sure if it's related: https://issues.apache.org/jira/browse/SPARK-8946

@Scott Shaw


3 REPLIES


Rising Star

HDP-2.3.2.0-2950

Spark: 1.4.1.2.3

$ rpm -qa | grep -i snappy

snappy-devel-1.1.0-3.el7.x86_64

snappy-1.1.0-3.el7.x86_64

Hortonworks support suggested that we apply the Spark 1.5.1 TP repo, but our Unix admin needs a tar.gz file to set up a local repo. Does anyone know the link to the tar file?

Rising Star

After setting org.xerial.snappy.tempdir to a newly created directory with rwx permissions, Spark works fine now.
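The fix above can be sketched as follows. The paths are illustrative (not from the original post); org.xerial.snappy.tempdir tells snappy-java where to extract its native library, which fails when the default temp directory is not writable or is mounted noexec:

```shell
# Sketch of the fix: give snappy-java a writable, executable temp dir.
# /var/tmp/snappy is an assumed path -- use any dir not mounted noexec.
SNAPPY_TMP=/var/tmp/snappy
mkdir -p "$SNAPPY_TMP" && chmod 755 "$SNAPPY_TMP"

# Pass the property to both the driver and executor JVMs:
spark-shell \
  --conf "spark.driver.extraJavaOptions=-Dorg.xerial.snappy.tempdir=$SNAPPY_TMP" \
  --conf "spark.executor.extraJavaOptions=-Dorg.xerial.snappy.tempdir=$SNAPPY_TMP"
```

Setting it on both driver and executors matters, since the codec is instantiated on every JVM that compresses shuffle or broadcast blocks.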