Where does MR2 output stderr to?
- Labels: Apache YARN, HDFS
Created on ‎05-15-2015 03:52 PM - edited ‎09-16-2022 02:29 AM
This may be a complete noob question, but we're shifting from CDH4 MR1 to CDH5 MR2. I had no problem navigating the menus in the previous version to find the stdout and stderr output from individual mappers and/or reducers, but now I can't find them anywhere: not through the interface, not on the YARN nodes' disks, and not in HDFS.
Could someone point me in the right direction?
Created ‎05-15-2015 04:27 PM
Hi,
If you click on the "Applications" link near the top of the YARN service page, you'll be taken to a page with information about YARN jobs. Clicking on a job ID link on that page will display a summary page for the job. From there, you can drill down into individual map and reduce tasks, and view the associated logs.
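If the UI route is awkward, the same aggregated logs can usually be retrieved from the command line with the `yarn logs` tool (this assumes log aggregation is enabled and the application has finished):

```shell
# Print stdout/stderr/syslog from every container of the application.
# Substitute an application ID taken from the Applications page.
yarn logs -applicationId application_1431658373269_0170
```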
Regards,
Mark
Created ‎05-18-2015 12:59 PM
I think that the "drilling down into the individual map/reduce tasks" is where this falls apart for me.
When I click on the task (e.g. application_1431658373269_0170) it shows me a list of application masters.
From there I can click on the node id (e.g. hadoopslave0011p1mdw1.sendgrid.net:8042) This takes me to a page where I can see all of the containers currently running, and see logs for the node itself, which isn't what I need.
I can also click on "logs" for the application master. This takes me to a page that says:
Error getting logs for container_e23_1431658373269_0170_01_000001
which tells me that the cluster is misconfigured in some way and isn't even producing them. Can you recommend a next step?
Created ‎05-18-2015 03:30 PM
Hi -
What version of Cloudera Manager are you running? Cloudera Manager 5.0, unfortunately, had a bug that prevented the YARN Job History Server from picking up changes to the configured value for the YARN remote application log directory. This was fixed in CM 5.1.
If you are running CM 5.1+, the problem could have to do with the ownership or permissions of the remote application log directory (the HDFS directory where the logs are stored). By default, this is "/tmp/logs", but it can be configured to a different value in CM (YARN > Configuration > search for "remote app log dir").
This HDFS directory (and its subdirectories) needs to be readable by group hadoop. Please check the ownership/permissions using "hdfs dfs -ls -R /tmp" from the command line (replacing "/tmp" with the value of YARN's remote app log dir, if it has been set to a different value). The group ownership should be "hadoop," and the permissions should allow group read access. If this is not the case, you can use "hdfs dfs -chown -R mapred:hadoop /tmp/logs" (again substituting your configured directory) to set the ownership recursively.
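As a concrete sketch (assuming the default "/tmp/logs" location; substitute your configured remote app log dir):

```shell
# Inspect ownership and permissions of the remote application log dir.
hdfs dfs -ls -R /tmp/logs

# If the group is wrong, set ownership to mapred:hadoop recursively...
hdfs dfs -chown -R mapred:hadoop /tmp/logs
# ...and make sure the group can read files and traverse directories.
hdfs dfs -chmod -R g+rX /tmp/logs
```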
Regards,
Mark
Created ‎05-26-2015 06:43 PM
Thank you, mfox. My problem was that the basic install set all of HDFS's groups to "superuser" instead of "hadoop". Changing it to "hadoop" allowed mapreduce to write its history logs to the correct location.
