05-20-2015 11:11 AM
Can you browse /user/history/done_intermediate on HDFS to see whether the files are being copied there? The MapReduce AM writes the files to that directory, and the JHS reads them later. If the files aren't present, the MapReduce AM is most likely not copying them. Can we make sure mapreduce.jobhistory.intermediate-done-dir is set to the same value for the AM and the JHS? If it is not set, it defaults to ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate, so we should also check that yarn.app.mapreduce.am.staging-dir is the same for the AM and the JHS.
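To make that comparison concrete, here is a minimal sketch (not an official tool) that works out the effective intermediate done dir from the configuration XML each side sees. The helper names are made up for illustration, the fallback staging dir is the stock Apache Hadoop default and may differ in your distribution, and the sketch does not expand ${...} references inside values.

```python
import xml.etree.ElementTree as ET

# Assumption: stock Apache Hadoop default for yarn.app.mapreduce.am.staging-dir.
# Distributions often override it (the /user/history path above suggests /user).
DEFAULT_STAGING_DIR = "/tmp/hadoop-yarn/staging"


def load_props(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def effective_intermediate_done_dir(props):
    """Return the explicit setting, else fall back to <staging-dir>/history/done_intermediate."""
    explicit = props.get("mapreduce.jobhistory.intermediate-done-dir")
    if explicit:
        return explicit
    staging = props.get("yarn.app.mapreduce.am.staging-dir", DEFAULT_STAGING_DIR)
    return staging.rstrip("/") + "/history/done_intermediate"
```

Run load_props on the mapred-site.xml contents used by the job (AM side) and by the JHS host, then compare the two effective_intermediate_done_dir results. If they differ, the AM is writing history files somewhere the JHS never looks.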
05-20-2015 10:54 AM
Which user are the MR and Spark containers run as? The YARN processes all run as user "yarn", and I wonder if a user-name mismatch is causing issues. Also, which user are you running the profiling tools as?
05-20-2015 10:42 AM
On the worker nodes, the number of cores determines the number of YARN containers (MapReduce or Spark) that can run on that node. One could use the amount of memory on the node and the number of disks to pick the number of cores. I haven't looked at the latest recommendations, but I believe 2 cores per disk is reasonable. The memory-to-cores ratio should depend on the workload itself, specifically the average container size.
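As a rough illustration of that arithmetic, here is a small sketch. The function name and parameters are hypothetical, it assumes one core per container, and the 2-cores-per-disk default is only the rule of thumb mentioned above, not a current recommendation.

```python
def suggest_node_resources(num_disks, node_memory_mb, avg_container_mb, cores_per_disk=2):
    """Estimate cores and concurrent containers for one worker node.

    Assumption: each container uses one core, so the node runs at most as many
    containers as it has cores, and no more than its memory can hold.
    """
    cores = num_disks * cores_per_disk
    containers_by_memory = node_memory_mb // avg_container_mb
    return {
        "cores": cores,
        "containers_by_memory": containers_by_memory,
        "max_containers": min(cores, containers_by_memory),
    }
```

For example, a node with 12 disks, 64 GB available to containers, and an average container size of 2 GB gives 24 cores by the disk rule, while memory alone would allow 32 containers, so cores are the limit. If the two numbers are far apart, the average container size is the knob to revisit.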