Member since: 11-19-2014
Posts: 4
Kudos Received: 0
Solutions: 0
12-17-2014
09:10 AM
joliveirinha, I still have not resolved the issue. However, if you download Spark 1.1.1 and install it manually, that seems to work.
12-02-2014
01:35 PM
I guess no one else is having this same issue? We created a new gateway server for Spark and still got the error. However, when I run it on a worker node, it seems to work. But when I run a quick Python script:

    $ cat test.py
    from pyspark import SparkConf, SparkContext
    sc = SparkContext()
    print sc.textFile('/tmp/test.txt').count()

I still get the "unread block data" error. I'm at a loss as to why this is happening. Nothing has changed on any of the nodes. Everything was installed through Cloudera Manager parcels, and the OS on all the nodes is identical.
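For what it's worth, "unread block data" during task deserialization is often a symptom of mismatched Spark/Hadoop jars between the node running the driver and the executors. As a hedged, stdlib-only sketch (the parcel path is an assumption; adjust to the CDH parcel actually in use), this fingerprints every jar under a lib directory so the listings can be diffed across the gateway, master, and worker nodes:

```python
import hashlib
import os

def jar_fingerprints(lib_dir):
    """Return {jar_name: md5_hex} for every .jar found under lib_dir."""
    fingerprints = {}
    for root, _dirs, files in os.walk(lib_dir):
        for name in files:
            if name.endswith(".jar"):
                path = os.path.join(root, name)
                with open(path, "rb") as fh:
                    fingerprints[name] = hashlib.md5(fh.read()).hexdigest()
    return fingerprints

if __name__ == "__main__":
    # Hypothetical parcel path -- adjust to the CDH parcel in use.
    prints = jar_fingerprints("/opt/cloudera/parcels/CDH/lib/spark/lib")
    for name in sorted(prints):
        print("%s  %s" % (name, prints[name]))
```

Running this on each node and diffing the output is a stricter check than comparing file names or sizes, since two jars can share a name but differ in contents.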
11-20-2014
08:11 AM
In this situation I'm not running any packaged application; this is all through spark-shell. It was a quick test of loading a file and saving it to a directory. I enabled the Spark Standalone role through Cloudera Manager, and that was pretty much it; everything was installed through Cloudera Manager. From there I logged on to the gateway box, a worker node, and the master node, went into spark-shell, loaded a blank file, and saved it to an output folder. I thought the empty file might be the problem, so I put some data in it, copied it into /tmp in HDFS, and still got the error when trying to save output through spark-shell. Essentially, after enabling the Spark Standalone role through Cloudera Manager on 4 servers (1 master, 2 workers, 1 gateway), this is what I did:

    $ vi /temp/test.txt

contents of test.txt:

    1,abc,987,zyx
    2,efg,654,wvu

then:

    $ sudo -u hdfs hadoop fs -put /temp/test.txt /tmp/

On the gateway node:

    $ spark-shell --master spark://cloudera-1.testdomain.net:7077
    scala> val source = sc.textFile("/tmp/test.txt")
    scala> source.saveAsTextFile("/tmp/zzz_testsparkoutput")

and then I get the errors.
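As a point of reference for what a successful run should leave behind: saveAsTextFile writes a directory, not a single file — one part-NNNNN file per partition plus a _SUCCESS marker when the job completes. A hedged, stdlib-only sketch (no Spark required; the round-robin split is illustrative only, since Spark actually partitions by input splits) that mimics that output layout:

```python
import os
import tempfile

def save_as_text_file(lines, out_dir, num_partitions=2):
    """Mimic the on-disk layout of Spark's saveAsTextFile:
    part-NNNNN files plus a _SUCCESS marker."""
    os.makedirs(out_dir)
    for i in range(num_partitions):
        # Round-robin split across partitions (illustrative only).
        part = lines[i::num_partitions]
        with open(os.path.join(out_dir, "part-%05d" % i), "w") as fh:
            for line in part:
                fh.write(line + "\n")
    # Empty marker file written once all partitions succeed.
    open(os.path.join(out_dir, "_SUCCESS"), "w").close()

if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "zzz_testsparkoutput")
    save_as_text_file(["1,abc,987,zyx", "2,efg,654,wvu"], out)
    print(sorted(os.listdir(out)))  # -> ['_SUCCESS', 'part-00000', 'part-00001']
```

So if /tmp/zzz_testsparkoutput in HDFS contains part files but no _SUCCESS marker, the job died partway through the save rather than before it started writing.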
Here's my spark-env.sh:

    #!/usr/bin/env bash
    ##
    # Generated by Cloudera Manager and should not be modified directly
    ##
    export SPARK_HOME=/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark
    export STANDALONE_SPARK_MASTER_HOST=cloudera-1.testdomain.net
    export SPARK_MASTER_PORT=7077
    export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop

    ### Path of Spark assembly jar in HDFS
    export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-/user/spark/share/lib/spark-assembly.jar}

    ### Let's run everything with JVM runtime, instead of Scala
    export SPARK_LAUNCH_WITH_SCALA=0
    export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
    export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib
    export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
    export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}

    if [ -n "$HADOOP_HOME" ]; then
      export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:${HADOOP_HOME}/lib/native
    fi

    export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}

and this is my spark-defaults.conf:

    spark.eventLog.dir=hdfs://cloudera-2.testdomain.net:8020/user/spark/applicationHistory
    spark.eventLog.enabled=true
    spark.master=spark://cloudera-1.testdomain.net:7077

I'm at a loss as to why this is happening.
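One quick sanity check on the configs above: the Cloudera-Manager-generated spark-defaults.conf is plain key=value text, so the effective master URL can be parsed programmatically and compared against the --master flag passed to spark-shell — a mismatch would mean the driver and executors resolve different clusters. A minimal sketch (the sample text here is taken from the conf above; reading the real file path is left as an assumption):

```python
def parse_spark_defaults(text):
    """Parse key=value lines from a Cloudera-Manager-style spark-defaults.conf,
    skipping blank lines and comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf

if __name__ == "__main__":
    sample = """\
spark.eventLog.enabled=true
spark.master=spark://cloudera-1.testdomain.net:7077
"""
    conf = parse_spark_defaults(sample)
    # Should match the URL given to spark-shell --master.
    print(conf["spark.master"])  # -> spark://cloudera-1.testdomain.net:7077
```

Note that upstream Spark also accepts whitespace-separated "key value" lines in spark-defaults.conf; this sketch only handles the key=value form shown above.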
11-19-2014
02:52 PM
I'm essentially loading a file and saving the output to another location:

    val source = sc.textFile("/tmp/testfile.txt")
    source.saveAsTextFile("/tmp/testsparkoutput")

When I do so, I hit this error:

    14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net):
    java.lang.IllegalStateException: unread block data
        java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
        java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
        org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I can't figure out what the issue is. I'm running CDH 5.2 with Spark 1.1. The file I'm loading is only 7 MB. I thought it was a jar mismatch, but I compared them and they're all identical — and since they were all installed through CDH parcels, I'm not sure how there could be a version mismatch between the nodes and the master anyway. It's 1 master node with 2 worker nodes, running standalone, not through YARN. Just in case, I copied the jars from the master to the 2 worker nodes, and I still get the same issue. The weird thing is, the first time I installed and tested it, it worked, but now it doesn't. The Spark role was installed through Cloudera Manager. Any help here would be greatly appreciated.