I have a single box with CDH 5.3.4 installed and I'm trying to run a test MapReduce job to confirm that things are set up correctly.
    $ sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.4.jar pi 2 2
    Number of Maps  = 2
    Samples per Map = 2
    Wrote input for Map #0
    Wrote input for Map #1
    Starting Job
    16/11/16 21:56:08 INFO client.RMProxy: Connecting to ResourceManager at qa01-ost-wakefield-c-hb01.td.local/192.168.104.216:8032
    16/11/16 21:56:09 INFO input.FileInputFormat: Total input paths to process : 2
    16/11/16 21:56:09 INFO mapreduce.JobSubmitter: number of splits:2
    16/11/16 21:56:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1479336097366_0002
    16/11/16 21:56:09 INFO impl.YarnClientImpl: Submitted application application_1479336097366_0002
    16/11/16 21:56:09 INFO mapreduce.Job: The url to track the job: http://qa01-ost-wakefield-c-hb01.td.local:8088/proxy/application_1479336097366_0002/
    16/11/16 21:56:09 INFO mapreduce.Job: Running job: job_1479336097366_0002
However, this job just sits in the PREP state forever.
    $ mapred job -list
    16/11/16 22:11:45 INFO client.RMProxy: Connecting to ResourceManager at qa01-ost-wakefield-c-hb01.td.local/192.168.104.216:8032
    Total jobs:1
    JobId                   State  StartTime      UserName  Queue      Priority  UsedContainers  RsvdContainers  UsedMem  RsvdMem  NeededMem  AM info
    job_1479336097366_0002  PREP   1479351369780  hdfs      root.hdfs  NORMAL    0               0               0M       0M       0M         http://qa01-ost-wakefield-c-hb01.td.local:8088/proxy/application_1479336097366_0002/
I'm assuming there is some issue with the amount of memory, or some configuration setting, that is blocking this job from running, but I can't pin down where the problem is. Can anyone provide some tips on how to debug and resolve this?
If you are starting with a cluster now, I would strongly recommend using a CDH release much later than CDH 5.3. The later releases (CDH 5.8 or CDH 5.9) are far more stable than what you are running now. Even if you stick with CDH 5.3, at least use the latest maintenance release.
Back to your question: there are multiple things that can keep a job from starting. The first place to check is the RM web UI, to see what state the cluster and the scheduler are in. After that, it depends on what the RM UI shows you...
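On a single-node box, one common cause of a job stuck in PREP is that no NodeManager is registered with the RM, or the NodeManager advertises less memory than the scheduler requires for a single container, so the ApplicationMaster can never be placed (the 0 used/reserved containers in your `mapred job -list` output is consistent with this). The RM web UI on port 8088, or `yarn node -list`, will show how many nodes are active and how much capacity they offer. As a rough sketch only, the relevant yarn-site.xml properties look like this — the values below are illustrative assumptions for a small test box, not recommendations:

```xml
<!-- yarn-site.xml: illustrative values for a small single-node test box -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>  <!-- total memory the NodeManager offers to YARN -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>   <!-- smallest container the RM will grant -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>  <!-- must cover the AM and task container requests -->
</property>
```

If `yarn.nodemanager.resource.memory-mb` is smaller than the ApplicationMaster's request (`yarn.app.mapreduce.am.resource.mb`, which defaults to 1536 MB in Hadoop 2.x), no node can ever satisfy the allocation and the job sits in PREP indefinitely.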