Created 11-19-2014 02:52 PM
Created 11-20-2014 01:31 AM
I would not mess with the installed JARs. In fact it's possible something has been changed inadvertently in this process.
My guess is that you are packaging Spark with your app, and it is not the same version? You do not package Spark code with a Spark app, as it is provided at runtime.
Created 11-20-2014 08:11 AM
In this situation, I'm not running any packaged app; this is being done through spark-shell. It was a quick test of loading a file and saving it to a directory. I enabled the Spark standalone role through Cloudera Manager, and that was pretty much it. This was all installed through Cloudera Manager. From there I logged on to a gateway box, as well as a worker node and the master node, went into spark-shell, loaded a blank file, and then saved the file to an output folder. I thought the problem might've been the empty file, so I put in some data, put the file into HDFS /tmp, and was still getting the error when trying to save output through spark-shell.
Essentially, after enabling the Spark standalone role through Cloudera Manager on 4 servers (1 master, 2 workers, 1 gateway), this is what I did:
$> vi /temp/test.txt
contents of test.txt:
1,abc,987,zyx
2,efg,654,wvu
$> sudo -u hdfs hadoop fs -put /temp/test.txt /tmp/
On the gateway node:
$> spark-shell --master spark://cloudera-1.testdomain.net:7077
scala> val source = sc.textFile("/tmp//test.txt")
scala> source.saveAsTextFile("/tmp/zzz_testsparkoutput")
And then I get the errors. Here's my spark-env.sh:
#!/usr/bin/env bash
##
# Generated by Cloudera Manager and should not be modified directly
##
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark
export STANDALONE_SPARK_MASTER_HOST=cloudera-1.testdomain.net
export SPARK_MASTER_PORT=7077
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop
### Path of Spark assembly jar in HDFS
export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-/user/spark/share/lib/spark-assembly.jar}
### Let's run everything with JVM runtime, instead of Scala
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib
export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}
if [ -n "$HADOOP_HOME" ]; then
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
And this is my spark-defaults.conf:
spark.eventLog.dir=hdfs://cloudera-2.testdomain.net:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.master=spark://cloudera-1.testdomain.net:7077
At a loss as to why this is happening.
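For what it's worth, a minimal check that separates the read from the write, using a fully qualified HDFS URI (the NameNode host/port here is the one from spark.eventLog.dir above; the output path is just an example):
scala> val source = sc.textFile("hdfs://cloudera-2.testdomain.net:8020/tmp/test.txt")
scala> source.count()   // read-only step; should print 2 for the two-line test file
scala> source.saveAsTextFile("hdfs://cloudera-2.testdomain.net:8020/tmp/zzz_testsparkoutput")
If the count succeeds but the save fails, the problem is on the write path rather than in resolving the file.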
Created 12-02-2014 01:35 PM
I guess no one is having this same issue as me?
We created a new gateway server for Spark and were still getting the issue. However, when I run it on a worker node, it seems to work. But when I run a quick Python script:
> cat test.py
from pyspark import SparkContext

# read the test file from HDFS and count its lines
sc = SparkContext()
print sc.textFile('/tmp/test.txt').count()
I still get the unread block data issue. I'm at a loss as to why this is happening. Nothing has changed on any of the nodes; this is all done through Cloudera Manager parcels, and the OS on all the nodes is identical.
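Since a version mismatch between the driver and the executors is a common cause of this kind of deserialization error, a rough way to compare them from spark-shell (assuming Spark 1.1+, where org.apache.spark.SPARK_VERSION is defined):
scala> sc.version   // version the driver is running
scala> sc.parallelize(1 to 4, 4).map(_ => org.apache.spark.SPARK_VERSION).distinct.collect()
       // each task reports the version constant baked into the jar its executor loaded;
       // more than one distinct value, or one that differs from sc.version, means mixed jars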
Created 12-17-2014 08:42 AM
Hi ansonabraham,
I am having this issue also. It only happens when I try to read/write files from HDFS.
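One way to confirm it really is HDFS-specific is to compare a local read against an HDFS read in spark-shell (the paths here are just examples, and the local file has to exist at the same path on every worker node):
scala> sc.textFile("file:///tmp/test.txt").count()   // local filesystem, bypasses HDFS
scala> sc.textFile("hdfs:///tmp/test.txt").count()   // HDFS, where the error shows up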
Have you found a solution?
Created 12-17-2014 09:10 AM
joliveirinha, I still have not resolved the issue. Though if you download Spark 1.1.1 and install it, that seems to work.