Created 11-09-2015 01:21 AM
Running a Spark command to read a file throws an IllegalArgumentException. This is HDP 2.3.1 and Spark 1.4.1. The same error occurs with PySpark. The error appears to come from the SnappyCompressionCodec.
scala> var file = sc.textFile("hdfs://HdpTest:8020/user/weli/README.md")
java.lang.IllegalArgumentException
at org.apache.spark.io.SnappyCompressionCodec.<init>(CompressionCodec.scala:152)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
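One thing worth checking before digging further (a hedged diagnostic sketch, not from the original thread): snappy-java extracts its native .so into the JVM temp dir (java.io.tmpdir, /tmp by default), and if that filesystem is mounted noexec or is not writable, the load fails and Spark surfaces it as an IllegalArgumentException from the SnappyCompressionCodec constructor.

```shell
# Check whether /tmp is a separate mount and, if so, whether it carries
# the noexec flag (which would prevent snappy-java from loading its
# extracted native library from there).
mount | grep -w /tmp || echo "/tmp is not a separate mount"
```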
Created 11-09-2015 01:39 AM
scala> var file = sc.textFile("hdfs://nsfed01.cloud.hortonworks.com:8020/tmp/expense.csv")
15/11/08 17:34:06 INFO MemoryStore: ensureFreeSpace(200320) called with curMem=0, maxMem=278019440
15/11/08 17:34:06 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 195.6 KB, free 264.9 MB)
15/11/08 17:34:06 INFO MemoryStore: ensureFreeSpace(18855) called with curMem=200320, maxMem=278019440
15/11/08 17:34:06 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.4 KB, free 264.9 MB)
15/11/08 17:34:06 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40023 (size: 18.4 KB, free: 265.1 MB)
15/11/08 17:34:06 INFO SparkContext: Created broadcast 0 from textFile at <console>:15
file: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:15
scala>
[root@nsfed01 ~]# rpm -qa | grep -i snappy
snappy-1.1.0-1.el6.x86_64
snappy-devel-1.1.0-1.el6.x86_64
The above is from HDP 2.3.2
Not sure if it's related: https://issues.apache.org/jira/browse/SPARK-8946
Created 12-14-2015 10:26 PM
HDP-2.3.2.0-2950
Spark: 1.4.1.2.3
$ rpm -qa |grep -i snappy
snappy-devel-1.1.0-3.el7.x86_64
snappy-1.1.0-3.el7.x86_64
Hortonworks support suggested that we apply the Spark 1.5.1 TP repo, but our Unix admin needs a tar.gz file to set up a local repo. Does anyone know the link to the tar file?
Created 01-03-2016 02:53 AM
After setting org.xerial.snappy.tempdir to a newly created directory with rwx permissions, Spark works fine now.
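For anyone hitting the same thing, the fix above can be sketched roughly as follows. The directory path is an example, not from the original post; org.xerial.snappy.tempdir is the snappy-java system property controlling where its native library is extracted, and the spark-defaults.conf lines are one of several ways to pass it to the driver and executor JVMs.

```shell
# Create a writable directory for snappy-java to extract its native
# library into (path is an example -- pick any location the Spark user
# can read, write, and execute from).
mkdir -p /tmp/snappy-tmp
chmod 755 /tmp/snappy-tmp

# Then point both JVMs at it, e.g. in spark-defaults.conf:
#   spark.driver.extraJavaOptions   -Dorg.xerial.snappy.tempdir=/tmp/snappy-tmp
#   spark.executor.extraJavaOptions -Dorg.xerial.snappy.tempdir=/tmp/snappy-tmp
```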