
Spark memory issue


I have a 3-node cluster and am trying to run a Spark job on it.

I am running the following command to launch the class file:

java -cp .:spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:spark-csv_2.10-1.4.0.jar:commons-csv-1.1.jar SparkMainV4 "spark://xyz.abc.com:7077" "WD" "spark.executor.memory;6g,spark.shuffle.consolidateFile;false,spark.driver.memory;5g,spark.akka.frameSize;2047,spark.locality.wait;600,spark.network.timeout;600,spark.sql.shuffle.partitions;500"

but I am getting this error:

ERROR TaskSchedulerImpl: Lost executor 1 on xyz.abc.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: ResultStage 67 (saveAsTextFile at package.scala:179) has failed the maximum allowable number of times: 4. Most recent failure reason: org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 36
at org.apache.spark.MapOutputTracker$anonfun$org$apache$spark$MapOutputTracker$convertMapStatuses$2.apply(MapOutputTracker.scala:542)
at org.apache.spark.MapOutputTracker$anonfun$org$apache$spark$MapOutputTracker$convertMapStatuses$2.apply(MapOutputTracker.scala:538)
at scala.collection.TraversableLike$WithFilter$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$convertMapStatuses(MapOutputTracker.scala:538)

5 REPLIES

Expert Contributor

This happens during long pauses in long-running jobs over a large data set. As the logs show, an executor fails during a shuffle step and never reports its output; in the reduce step that output cannot be found where expected, and rather than rerunning the failed execution, Spark aborts the job. Try reducing the parallelism to executors x cores.
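Something along these lines is what I mean (untested sketch against the Spark 1.6 API; the executor and core counts and the HDFS paths are placeholders, not values from your cluster):

import org.apache.spark.{SparkConf, SparkContext}

object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    val executors = 3        // placeholder: executors in the cluster
    val coresPerExecutor = 4 // placeholder: cores per executor
    val targetPartitions = executors * coresPerExecutor

    val conf = new SparkConf()
      .setAppName("ParallelismSketch")
      // cap default RDD parallelism and SQL shuffle partitions at executors x cores
      .set("spark.default.parallelism", targetPartitions.toString)
      .set("spark.sql.shuffle.partitions", targetPartitions.toString)

    val sc = new SparkContext(conf)
    // repartition before the shuffle-heavy stage so no single task holds too much data
    sc.textFile("hdfs:///path/to/input")
      .repartition(targetPartitions)
      .saveAsTextFile("hdfs:///path/to/output")
    sc.stop()
  }
}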


We are not setting parallelism explicitly; could you please tell me where I can reduce the cores?

And this same code was working fine yesterday.

Expert Contributor

Does your Spark job fail?

These messages can be caused by Spark dynamic allocation, possibly the release of an executor.

Maybe resources are not free on YARN and the containers time out.

Is there any other error message in the log?
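If dynamic allocation turns out to be the cause, the knobs involved look roughly like this (illustrative values only, not recommendations; check which of these are actually set in your environment):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // if executors are being released by dynamic allocation, pin them instead...
  .set("spark.dynamicAllocation.enabled", "false")
  // ...or keep idle executors (and their shuffle output) around longer
  // .set("spark.dynamicAllocation.executorIdleTimeout", "300s")
  // give slow or busy containers more time before they are treated as lost
  .set("spark.network.timeout", "600s")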


Yes, the Spark job failed. We are trying to coalesce the output files, but we keep getting the error above.
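The coalesce step is roughly along these lines (simplified sketch in the Spark 1.6 API; the paths and the partition count are placeholders, not our actual job):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("CoalesceSketch"))

// coalesce(1) funnels every partition through a single task and can exhaust that
// executor's memory; a small-but-plural target keeps the final write parallel
sc.textFile("hdfs:///path/to/input")
  .coalesce(12) // placeholder target
  .saveAsTextFile("hdfs:///path/to/output")

sc.stop()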


There is no error like a timeout, but I increased the RAM to 64 GB and it works now.
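For anyone hitting the same thing, the change amounts to giving Spark more memory to work with, e.g. (the split below is just an example, not the exact values we ended up with):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "24g") // example: more heap per executor
  .set("spark.driver.memory", "8g")    // example: more heap for the driver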