Created on 06-02-2014 07:27 AM - edited 09-16-2022 01:59 AM
Hello All,
I installed CDH 5.0.1 on Amazon AWS on a four-node cluster (one name node and three data nodes), and everything looks good from the Cloudera Manager console (I am also using Kerberos within AWS).
But when I run the test below, as well as Pig scripts, the jobs get stuck and never start. I can move files in and out of HDFS without any problem.
I may need to open up some ports, but I need to know which port numbers to make this work - please help.
hadoop fs -mkdir /user/hdfs/input/
hadoop fs -put /etc/hadoop/conf/*.xml input
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
14/06/02 10:17:22 INFO client.RMProxy: Connecting to ResourceManager at awsdve1ahdpnm1.ops.tiaa-cref.org/10.22.10.113:8032
14/06/02 10:17:23 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 20 for hdfs on 10.22.10.113:8020
14/06/02 10:17:23 INFO security.TokenCache: Got dt for hdfs://awsdve1ahdpnm1.ops.tiaa-cref.org:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 10.22.10.113:8020, Ident: (HDFS_DELEGATION_TOKEN token 20 for hdfs)
14/06/02 10:17:23 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/06/02 10:17:23 INFO input.FileInputFormat: Total input paths to process : 4
14/06/02 10:17:23 INFO mapreduce.JobSubmitter: number of splits:4
14/06/02 10:17:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1401383446374_0007
14/06/02 10:17:24 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 10.22.10.113:8020, Ident: (HDFS_DELEGATION_TOKEN token 20 for hdfs)
14/06/02 10:17:24 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/06/02 10:17:24 INFO impl.YarnClientImpl: Submitted application application_1401383446374_0007
14/06/02 10:17:24 INFO mapreduce.Job: The url to track the job: http://awsdve1ahdpnm1.ops.tiaa-cref.org:8088/proxy/application_1401383446374_0007/
14/06/02 10:17:24 INFO mapreduce.Job: Running job: job_1401383446374_0007
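On the port question: one quick way to rule out AWS security-group issues is to probe the relevant ports from a worker node. A minimal sketch, assuming the stock CDH 5 defaults (8020 NameNode IPC, 8030-8033 ResourceManager, 8088 RM web UI); the hostname is taken from the log above, and your cluster's ports may differ if they were customized:

```shell
# Probe a TCP port via bash's /dev/tcp; returns 0 on a successful connect.
check_port() {
  timeout 2 bash -c "echo > /dev/tcp/$1/$2" 2>/dev/null
}

# Host from the log output above; ports are assumed CDH 5 defaults.
host=awsdve1ahdpnm1.ops.tiaa-cref.org
for port in 8020 8030 8031 8032 8033 8088; do
  if check_port "$host" "$port"; then
    echo "port $port open"
  else
    echo "port $port blocked"
  fi
done
```

Any port reported blocked from a node that should reach it is a candidate for a security-group rule.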
Created 06-02-2014 01:29 PM
Hello All,
This issue has been resolved - working with Kevin Odel from Cloudera. Summary as follows:
1.) First-time install on Amazon AWS - installed using packages
2.) Configured Kerberos
3.) All the services started up fine
4.) We can move files in and out of HDFS - no issues
5.) YARN status shows as good in Cloudera Manager
6.) When you run the examples or Pig scripts, they get stuck at 0% with the status going from scheduled to submitted
Troubleshooting:
1.) Go to the YARN ResourceManager Web UI
- click on Nodes
- no nodes are listed (i.e., there are no NodeManagers registered with the cluster)
- that is why your jobs are stuck at 0%
2.) Go to each NodeManager host and look at /var/log/hadoop-yarn/*
You will see errors like the ones below:
2014-05-28 12:36:12,911 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-05-28 12:36:12,911 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-05-28 16:29:50,916 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-05-28 16:29:50,916 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-05-29 13:09:54,340 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-05-29 13:09:54,340 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-06-02 13:43:10,736 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-06-02 13:43:10,736 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-06-02 13:58:15,344 INFO org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Disk(s) failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
2014-06-02 13:58:15,344 ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs turned bad: /dfs/dn/yarn/nm;
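The "local-dirs turned bad" message comes from the NodeManager's periodic disk health check: each configured local-dir must exist (or be creatable) and be readable, writable, and searchable by the yarn user, with enough free space. A rough local approximation of those checks, useful for finding the bad directory by hand (the path is the one from the log; the free-space check is omitted here):

```shell
# Approximate the NodeManager's per-directory health check: the directory
# must be creatable and have rwx permission for the user running the
# check (normally the yarn user).
check_local_dir() {
  d=$1
  mkdir -p "$d" 2>/dev/null || { echo "$d: cannot create"; return 1; }
  [ -r "$d" ] || { echo "$d: not readable"; return 1; }
  [ -w "$d" ] || { echo "$d: not writable"; return 1; }
  [ -x "$d" ] || { echo "$d: not searchable"; return 1; }
  echo "$d: ok"
}

# Path from the log above; run this as the yarn user on each NodeManager.
check_local_dir /dfs/dn/yarn/nm || true
```

If this reports a failure, either fix the ownership/permissions of that path or point YARN at a different local directory, as described in the fix below.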
To fix it:
- in CM, go to YARN and modify yarn.nodemanager.local-dirs (make sure all the nodes are in the same group, or do it for each group of nodes)
- stop the YARN cluster
- deploy the configuration (with the new yarn.nodemanager.local-dirs)
- start the YARN cluster
Now go back to the YARN ResourceManager Web UI - you should see all the nodes.
- if a node is missing, it means it was not able to create its local directory
- go to the node that is not visible in the ResourceManager and check the log files under /var/log/hadoop-yarn/*
- fix it manually or from CM
- redeploy the configuration from CM
- start the YARN services
Once all the nodes are visible under ResourceManager - Nodes in YARN, you can submit your jobs and they will complete.
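The registration can also be checked from the command line with `yarn node -list`, which prints one line per live NodeManager. A small sketch that counts the RUNNING nodes; the sample line in the comment shows the assumed shape of the output:

```shell
# Count NodeManagers in RUNNING state from "yarn node -list" output.
# Typical node lines look like:
#   node1.example.com:8041   RUNNING   node1.example.com:8042   0
count_running() {
  grep -cw RUNNING || true   # "|| true": grep exits 1 when count is 0
}

# On a healthy cluster you would run:
#   yarn node -list | count_running
# For this four-node cluster the count should be 3 (one per data node).
```

If the count is lower than expected, go back to the per-node log check above.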
Thanks,
Ram