Created on 05-15-2015 03:52 PM - edited 09-16-2022 02:29 AM
This may be a complete noob question, but we're shifting from CDH4 MR1 to CDH5 MR2. I had no problem navigating the menus in the previous version to find the stdout and stderr output from individual mappers and/or reducers, but now I can't find them anywhere: not through the interface, not on the YARN nodes' disks, and not on HDFS.
Could someone point me in the right direction?
Created 05-15-2015 04:27 PM
Hi,
If you click on the "Applications" link near the top of the YARN service page, you'll be taken to a page with information about YARN jobs. Clicking on a job ID link on that page will display a summary page for the job. From there, you can drill down into individual map and reduce tasks, and view the associated logs.
Regards,
Mark
Created 05-18-2015 12:59 PM
I think that the "drilling down into the individual map/reduce tasks" is where this falls apart for me.
When I click on the task (e.g. application_1431658373269_0170) it shows me a list of application masters.
From there I can click on the node id (e.g. hadoopslave0011p1mdw1.sendgrid.net:8042) This takes me to a page where I can see all of the containers currently running, and see logs for the node itself, which isn't what I need.
I can also click on "logs" for the application master. This takes me to a page that says
Error getting logs for container_e23_1431658373269_0170_01_000001
which suggests to me that the cluster is misconfigured in some way and isn't even producing them. Can you recommend a next step?
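(Side note for anyone debugging the same error: when log aggregation is working, the aggregated container logs can also be pulled from the command line with the `yarn logs` tool once the application has finished. A minimal sketch, using the application and container IDs from above; the `-appOwner` value is a hypothetical placeholder.)

```shell
# Fetch all aggregated container logs (stdout, stderr, syslog) for the app.
# Only works after the application has finished and aggregation completed.
yarn logs -applicationId application_1431658373269_0170

# If the job ran as a different user, name the owner explicitly:
yarn logs -applicationId application_1431658373269_0170 -appOwner someuser
```

If this command also fails to find the logs, that points at the aggregation directory itself rather than the web UI.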
Created 05-18-2015 03:30 PM
Hi -
What version of Cloudera Manager are you running? Cloudera Manager 5.0, unfortunately, had a bug that prevented the YARN Job History Server from picking up changes to the configured value for the YARN remote application log directory. This was fixed in CM 5.1.
If you are running CM 5.1+, the problem could have to do with the ownership or permissions of the remote application log directory (the HDFS directory where the logs are stored). By default, this is "/tmp/logs", but it can be configured to a different value in CM (YARN > Configuration > search for "remote app log dir").
This HDFS directory (and its subdirectories) needs to be readable by group hadoop. Please check the ownership/permissions using "hdfs dfs -ls -R /tmp" from the command line (replacing "/tmp" with the value of YARN's remote app log dir, if it has been set to a different value). The group ownership should be "hadoop," and the permissions should allow group read access. If this is not the case, you can use "hdfs dfs -chown -R mapred:hadoop /tmp/logs" (again substituting your configured directory) to set the ownership recursively.
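The checks above can be sketched as a short shell session. (A sketch, not verbatim from the thread: `/tmp/logs` is the default remote app log dir, and the `sudo -u hdfs` prefix assumes a typical install where `hdfs` is the HDFS superuser.)

```shell
# Assumed default; substitute your configured "Remote App Log Dir" if different.
LOG_DIR="/tmp/logs"

# Inspect ownership and permissions of the aggregated-log tree.
hdfs dfs -ls -R "$LOG_DIR"

# If the group ownership is wrong, fix it recursively (as the HDFS superuser):
sudo -u hdfs hdfs dfs -chown -R mapred:hadoop "$LOG_DIR"

# Make sure the group can read (capital X adds execute on directories only):
sudo -u hdfs hdfs dfs -chmod -R g+rX "$LOG_DIR"
```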
Regards,
Mark
Created 05-26-2015 06:43 PM
Thank you, mfox. My problem was that the basic install set all of HDFS's groups to "superuser" instead of "hadoop". Changing it to "hadoop" allowed mapreduce to write its history logs to the correct location.
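For anyone landing here with the same symptom, a quick way to confirm the group change took effect (a sketch; `/tmp/logs` is the default remote app log dir and the listing shown is illustrative):

```shell
# The log directory itself should now be group-owned by "hadoop":
hdfs dfs -ls -d /tmp/logs
# e.g. drwxrwxrwt  - mapred hadoop  0 2015-05-26 /tmp/logs
```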