CDH 5.0.1 Amazon AWS - MRV2 jobs not starting

Hello All,

 

I installed CDH 5.0.1 on Amazon AWS on a four-node cluster (one name node and three data nodes), and everything looks fine in the Cloudera Manager console (I am also using Kerberos within AWS).

 

But when I run the test below, or a Pig script, the jobs get stuck and never start. I can move files in and out of HDFS without any problem.

 

I may need to open up some ports, but I need to know which port numbers are required to make this work. Please help.

 

hadoop fs -mkdir /user/hdfs/input/

hadoop fs -put /etc/hadoop/conf/*.xml input

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'

 

14/06/02 10:17:22 INFO client.RMProxy: Connecting to ResourceManager at awsdve1ahdpnm1.ops.tiaa-cref.org/10.22.10.113:8032
14/06/02 10:17:23 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 20 for hdfs on 10.22.10.113:8020
14/06/02 10:17:23 INFO security.TokenCache: Got dt for hdfs://awsdve1ahdpnm1.ops.tiaa-cref.org:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 10.22.10.113:8020, Ident: (HDFS_DELEGATION_TOKEN token 20 for hdfs)
14/06/02 10:17:23 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
14/06/02 10:17:23 INFO input.FileInputFormat: Total input paths to process : 4
14/06/02 10:17:23 INFO mapreduce.JobSubmitter: number of splits:4
14/06/02 10:17:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401383446374_0007
14/06/02 10:17:24 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 10.22.10.113:8020, Ident: (HDFS_DELEGATION_TOKEN token 20 for hdfs)
14/06/02 10:17:24 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/06/02 10:17:24 INFO impl.YarnClientImpl: Submitted application application_1401383446374_0007
14/06/02 10:17:24 INFO mapreduce.Job: The url to track the job: http://awsdve1ahdpnm1.ops.tiaa-cref.org:8088/proxy/application_1401383446374_0007/
14/06/02 10:17:24 INFO mapreduce.Job: Running job: job_1401383446374_0007

1 ACCEPTED SOLUTION

Hello All,

 

This issue was resolved by working with Kevin Odel from Cloudera. Summary as follows:

 

1.) First-time install on Amazon AWS, done using packages

2.) Configured Kerberos

3.) All the services started fine

4.) We can move files in and out of HDFS with no issues

5.) YARN status shows as good in Cloudera Manager

6.) When you run the example job or Pig scripts, they get stuck at 0% with a status of scheduled or submitted

 

Troubleshooting:

 

1.) Go to the YARN Resource Manager Web UI

           - click on Nodes

               - you will not see any nodes listed (basically, no Node Manager has registered with the cluster)

               - this is the reason your jobs are stuck at 0%

2.) Go to each Node Manager host and look at /var/log/hadoop-yarn/*

 

You will see the error below:

 

2014-05-28 12:36:12,911 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-05-28 12:36:12,911 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-05-28 16:29:50,916 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-05-28 16:29:50,916 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-05-29 13:09:54,340 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-05-29 13:09:54,340 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-06-02 13:43:10,736 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-06-02 13:43:10,736 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-06-02 13:58:15,344 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-06-02 13:58:15,344 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
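For anyone hitting the same messages, here is a minimal sketch of how the log check could be scripted. The function names bad_local_dirs and check_local_dir are made up for illustration, and it assumes that several bad directories would appear comma-separated on one line (the logs above only show a single directory). The second function only tests whether the directory exists and is readable, writable, and searchable for the user running it, which may or may not be the actual cause on a given node.

```shell
# Print each distinct directory that a NodeManager log reports as bad.
# Arguments: one or more log files, e.g. /var/log/hadoop-yarn/*
bad_local_dirs() {
  # Matches lines such as:
  #   ... 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
  # Assumption: multiple directories are separated by commas.
  sed -n 's/.*local-dirs turned bad: \([^;]*\);.*/\1/p' "$@" \
    | tr ',' '\n' \
    | sort -u
}

# Report whether a directory looks usable for the current user.
# Returns 0 when it exists and is readable, writable, and searchable.
check_local_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "$dir: missing or not a directory"
    return 1
  fi
  if [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
    echo "$dir: not fully accessible (needs read, write, and execute)"
    return 1
  fi
  echo "$dir: ok"
}
```

To be meaningful, check_local_dir should be run as the account the Node Manager runs as (commonly yarn on a package install, but verify that on your cluster), for example on each path printed by bad_local_dirs /var/log/hadoop-yarn/*.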

 

To fix it:

 

Go to CM - YARN and modify the Node Manager local directories setting, yarn.nodemanager.local-dirs (make sure all the nodes are in the same group, or do this for each group of nodes)

Stop the YARN cluster

Deploy the configuration (with the new yarn.nodemanager.local-dirs)

Start the YARN cluster

Now go back to the YARN Resource Manager Web UI - you should see all the nodes

      - if a node is missing, it means the Node Manager was not able to create its local directory

      - go to the node that is not visible in the Resource Manager and check the log files in /var/log/hadoop-yarn/*

      - fix it manually or from CM

      - redeploy the configuration from CM

      - start the YARN services

 

Once you have made sure all the nodes are visible under Nodes in the YARN Resource Manager, you can submit your jobs and they will complete.
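The same node check can be done from a shell instead of the Web UI. Below is a rough sketch that counts the node rows in the output of yarn node -list, which by default lists the running nodes. The function name count_nodes is hypothetical, and the parsing assumes each node row starts with a host:port node ID, so adjust it if your output looks different.

```shell
# Read `yarn node -list` output on stdin and print how many node rows it has.
# A node row is assumed to be a line whose first field looks like host:port.
count_nodes() {
  awk '$1 ~ /^[^:]+:[0-9]+$/ { n++ } END { print n + 0 }'
}
```

On a Kerberos cluster you need a valid ticket (kinit) first. Then yarn node -list 2>/dev/null | count_nodes should print the number of Node Managers you expect; 0 corresponds to the empty Nodes page described in the troubleshooting steps.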

 

Thanks,

Ram

 
