About KarthikKambatla

KarthikKambatla · ‎05-20-2015

Can you browse /user/history/done_intermediate on HDFS to check whether the files were copied? The MapReduce AM copies the files there, and the JHS reads them later. If the files aren't present, the MapReduce AM is likely not copying them. Can we make sure mapreduce.jobhistory.intermediate-done-dir is set to the same value for the AM and the JHS? If it isn't set, it is derived from yarn.app.mapreduce.am.staging-dir, so we should check that this value is also the same for the AM and the JHS.
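One way to make that comparison concrete is a small script that reads the configuration each side sees and works out the effective directory. This is only a sketch: the helper names are made up for illustration, the fallback values are what I understand the upstream Apache Hadoop defaults to be (a distribution may ship different ones), and it does not expand ${...} variables inside an explicitly set value.

```python
import xml.etree.ElementTree as ET

DONE_DIR_KEY = "mapreduce.jobhistory.intermediate-done-dir"
STAGING_KEY = "yarn.app.mapreduce.am.staging-dir"
# Assumed upstream default for the staging dir; your distribution may differ.
DEFAULT_STAGING_DIR = "/tmp/hadoop-yarn/staging"


def load_props(xml_text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def intermediate_done_dir(props):
    """Return the effective intermediate done dir for one configuration."""
    explicit = props.get(DONE_DIR_KEY)
    if explicit:
        return explicit
    # Not set: fall back to <staging-dir>/history/done_intermediate.
    staging = props.get(STAGING_KEY, DEFAULT_STAGING_DIR)
    return staging.rstrip("/") + "/history/done_intermediate"


def same_done_dir(am_xml, jhs_xml):
    """True if the AM and the JHS resolve to the same directory."""
    am_dir = intermediate_done_dir(load_props(am_xml))
    jhs_dir = intermediate_done_dir(load_props(jhs_xml))
    return am_dir == jhs_dir
```

Feed it the contents of the mapred-site.xml used by the job client/AM and the one used by the JHS; if same_done_dir returns False, the two are looking at different directories.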

KarthikKambatla · ‎05-20-2015

What user are the MR and Spark containers running as? The YARN processes all run as user "yarn", and I wonder if a user-name mismatch is causing issues. Also, as what user are you running the profiling tools?

KarthikKambatla · ‎05-20-2015

On the worker nodes, the number of cores determines the number of YARN containers (MapReduce or Spark) that can run on that node. One could consider the amount of memory on the node and the number of disks when picking the number of cores. I haven't looked at the latest recommendations, but I believe 2 cores per disk is reasonable. The memory-to-cores ratio should depend on the workload itself, specifically the average container size.
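As a rough illustration of that arithmetic, here is a back-of-the-envelope sketch. The function name is hypothetical, it assumes one container per core, and it ignores memory reserved for the OS and the Hadoop daemons, so treat the result as a starting point rather than a recommendation.

```python
def size_worker(disks, avg_container_gb, cores_per_disk=2):
    """Suggest (cores, container memory in GB) for a worker node.

    Assumes one container per core and the 2-cores-per-disk rule of thumb.
    """
    cores = disks * cores_per_disk
    # Enough memory to run one average-sized container on every core.
    container_memory_gb = cores * avg_container_gb
    return cores, container_memory_gb


# Example: 12 data disks, containers averaging 4 GB each.
print(size_worker(12, 4))  # (24, 96)
```

If the workload's average container is larger or smaller, only avg_container_gb changes; the core count still follows from the number of disks.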

Member Since	‎07-08-2013 11:15 AM
Last Visited	‎08-28-2016 12:31 PM
Posts	8

Cloudera Community

Re: Java Profiling of long running MapReduce conta...

Re: [YARN] JHS M/R Job not Found (Not Found: job_x...

Re: Java Profiling of long running MapReduce conta...

Re: CPU Configuration (cores/speed) for Master and...