Member since: 07-20-2014
Posts: 39
Kudos Received: 4
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2699 | 08-10-2015 05:07 PM
 | 3669 | 02-25-2015 03:10 AM
10-11-2017 03:49 PM
Hi Rabk,
The log4j WARN message you provided shows a task that is failing with a FetchFailedException because a shuffle file (shuffle_0_2_0.index) can't be found. It does not show what the job ultimately fails with or what happens during the job run. But let's assume that the job fails once a stage has failed 4 times because of the fetch failures.
One possible cause of the FetchFailedException is that you are running out of space on the NodeManager local-dirs (where shuffle files are stored). Check the NodeManager logs (on datanode 2), for the timeframe when the job ran, for bad disk/local-dirs messages. When this happens, YARN sends a SIGTERM to the containers/executors, as you have observed.
Are you able to run the query on just a fraction of the current data, or does it succeed if no other jobs are running on the cluster?
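If it helps with going through the NodeManager logs, here is a minimal Python sketch that filters a log for lines mentioning bad local-dirs or log-dirs. The keyword pattern and the helper names (`find_bad_dir_lines`, `scan_log`) are my own assumptions; the exact wording of the health-check messages differs between Hadoop versions, so adjust the pattern to what your logs actually contain.

```python
import re

# Assumed keywords: a line must mention local-dirs or log-dirs together
# with the word "bad". Adjust this to the wording your Hadoop version uses.
BAD_DIR_PATTERN = re.compile(
    r"(local-dirs|log-dirs).*\bbad\b|\bbad\b.*(local-dirs|log-dirs)",
    re.IGNORECASE,
)


def find_bad_dir_lines(lines):
    """Return the lines that mention bad local-dirs or log-dirs."""
    return [line.rstrip("\n") for line in lines if BAD_DIR_PATTERN.search(line)]


def scan_log(path):
    """Scan one NodeManager log file; the path is supplied by the caller."""
    with open(path, errors="replace") as handle:
        return find_bad_dir_lines(handle)
```

Call `scan_log` with the path of the NodeManager log on the affected host and compare the timestamps of any matches with the time the job ran.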
08-10-2015 05:07 PM
Hi,
As described in the sort-based shuffle design doc (https://issues.apache.org/jira/secure/attachment/12655884/Sort-basedshuffledesign.pdf), each map task should generate one shuffle data file and one index file.
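To make the file count concrete, here is a small Python sketch that lists the files you would expect for a given number of map tasks. The naming pattern `shuffle_<shuffleId>_<mapId>_0` with `.data` and `.index` suffixes is an assumption based on the file names that show up in executor logs, and the function names are just for illustration.

```python
def shuffle_files_for_map_task(shuffle_id, map_id):
    """Names of the one data file and one index file for a single map task."""
    # Assumed naming pattern; the trailing 0 is taken from observed file names.
    base = f"shuffle_{shuffle_id}_{map_id}_0"
    return [f"{base}.data", f"{base}.index"]


def expected_shuffle_files(shuffle_id, num_map_tasks):
    """All shuffle files expected for one shuffle: two per map task."""
    files = []
    for map_id in range(num_map_tasks):
        files.extend(shuffle_files_for_map_task(shuffle_id, map_id))
    return files
```

So a shuffle with N map tasks should leave 2 * N files, independent of the number of reducers.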
Regarding your second question, the property to specify the buffer for shuffle data is "spark.shuffle.memoryFraction". This is discussed in more detail in the following Cloudera blog:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
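As a rough worked example of what the fraction means, the Python sketch below estimates the shuffle buffer for one executor. It assumes the legacy (pre-unified) memory model, where the budget is roughly heap * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction with defaults of 0.2 and 0.8. The function name is hypothetical, and the heap the JVM actually reports is usually a bit smaller than the configured executor memory, so treat the result as an estimate.

```python
def shuffle_memory_bytes(heap_bytes, memory_fraction=0.2, safety_fraction=0.8):
    """Approximate shuffle buffer size for one executor, in bytes.

    Assumes the legacy memory model; the defaults correspond to
    spark.shuffle.memoryFraction and spark.shuffle.safetyFraction.
    """
    return round(heap_bytes * memory_fraction * safety_fraction)


# Example: an executor with a 4 GiB heap and default settings.
four_gib = 4 * 1024 ** 3
estimate_mib = shuffle_memory_bytes(four_gib) / 1024 ** 2
print(f"approx. shuffle buffer: {estimate_mib:.0f} MiB")
```

The budget is shared by all tasks running concurrently in the executor, so more cores per executor means less shuffle memory per task.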
Regards,
Bjorn
02-25-2015 03:10 AM
Hi,
The stack trace reported here is identical to the one in MAPREDUCE-5799. It's a classpath issue that can be resolved by adding the following property to your client configurations:
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native</value>
</property>