Member since: 06-03-2014 · Posts: 62 · Kudos Received: 3 · Solutions: 6
09-15-2014
08:39 AM
We are running YARN on CDH 5.1 with 14 nodes, each with 6 GB of memory. I understand this is not a lot of memory, but it is all we could put together. Most jobs complete without error, but a few of the larger MapReduce jobs fail with a Java heap out-of-memory error. The jobs fail on a reduce task that either sorts or groups data. We recently upgraded from CDH 4.7 to CDH 5.1, and ALL of these jobs succeeded on MapReduce v1. Looking in the logs, I see that the application has retried a few times before failing. Can you see anything wrong with the way the resources are configured?

Java Heap Size of NodeManager in Bytes: 1 GB
yarn.nodemanager.resource.memory-mb: 6 GB
yarn.scheduler.minimum-allocation-mb: 1 GB
yarn.scheduler.maximum-allocation-mb: 6 GB
yarn.app.mapreduce.am.resource.mb: 1.5 GB
yarn.nodemanager.container-manager.thread-count: 20
yarn.resourcemanager.resource-tracker.client.thread-count: 20
mapreduce.map.memory.mb: 1.5 GB
mapreduce.reduce.memory.mb: 3 GB
mapreduce.map.java.opts: "-Djava.net.preferIPv4Stack=true -Xmx1228m"
mapreduce.reduce.java.opts: "-Djava.net.preferIPv4Stack=true -Xmx2457m"
mapreduce.task.io.sort.factor: 5
mapreduce.task.io.sort.mb: 512 MB
mapreduce.job.reduces: 2
mapreduce.reduce.shuffle.parallelcopies: 4

One thing that might help: YARN runs 4 containers per node. Can this be reduced?
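For context, here is a minimal sketch of the relationships usually checked when reduce tasks hit Java heap limits. The values simply mirror the settings quoted above and are assumptions for illustration, not a verified fix: each task's -Xmx should stay around 75-80% of its container size, mapreduce.task.io.sort.mb has to fit comfortably inside the map heap, and the number of concurrent containers per node falls out of yarn.nodemanager.resource.memory-mb divided by the per-task container sizes, so raising the task container sizes (or the scheduler minimum allocation) is one way to run fewer than 4 containers per node.

<!-- Illustrative mapred-site.xml fragment; values mirror the post above and
     are assumptions, not a recommendation. -->
<property>
  <!-- Reducer container size; the -Xmx below must fit inside it with headroom. -->
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Djava.net.preferIPv4Stack=true -Xmx2457m</value>
</property>
<property>
  <!-- 512 MB is a large share of a 1228 MB map heap; a smaller sort buffer
       (or a larger map container and heap) relieves pressure during sorting. -->
  <name>mapreduce.task.io.sort.mb</name>
  <value>384</value>
</property>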
08-20-2014
09:29 AM
1 Kudo
Romain, thank you so much for your help, and for sticking with me through this problem. I have resolved the issue; there were actually two problems. After the upgrade to CDH 5, I had to stop Oozie and install the ShareLib. Then, in YARN, I had to adjust the resources. The Java heap size had been set to 50 MB even though 8 GB of memory is available on the node (I set the heap to 1 GB on the nodes and the ResourceManager). I don't know why the CDH upgrade would default to such a low number; this made YARN completely unusable, and it explains why jobs would hang forever: there were not enough resources available. The logs did, however, indicate this problem. I have one last question: how much memory do you give to the Java heap on the ResourceManager (under Java Heap Size of ResourceManager in Bytes) when the nodes are given 1 GB? I gave it 1 GB to resolve the problem, but I'm not sure whether that is enough. And what about the container sizes? Thanks, Kevin
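One distinction that may help with the last question, offered as a general observation rather than a sizing recommendation: the Java heap settings for the NodeManager and ResourceManager only size those daemon JVMs (1 GB each is a common starting point at this scale), while the memory that is actually handed out to containers comes from a separate property, sketched below with an assumed value. Per-job container sizes are then requested through mapreduce.map.memory.mb and mapreduce.reduce.memory.mb.

<!-- Illustrative yarn-site.xml fragment: memory each NodeManager offers to
     containers. This is independent of the daemon's own Java heap size;
     6144 MB is an assumed value for a node with 8 GB of RAM. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>6144</value>
</property>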
08-19-2014
02:34 PM
Romain, I applied the change from step #5 in the document http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/, but unfortunately it did not help, even though it looks very similar to my problem. I tried to narrow down the problem I'm having with running Pig scripts through Hue and YARN. Here is what I do:

1. Create a Pig script in Hue:
offers = LOAD '/tmp/datafile.txt' USING PigStorage AS (name:CHARARRAY);
The script succeeds.

2. However, when I add a dump to the script, like this:
offers = LOAD '/tmp/datafile.txt' USING PigStorage AS (name:CHARARRAY);
dump offers;
the script never moves past 0% and repeats "Heart beat" over and over again. The job displays in Oozie but never goes anywhere (it is stuck on RUNNING).

This same script worked in CDH 4.7 using MRv1. I can't find much in the logs to help identify a problem; it just never finishes. Here is an excerpt from the job's log:

2014-08-19 14:31:01,128 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://servernode05:50030/jobdetails.jsp?jobid=job_1408403413938_0014
2014-08-19 14:31:01,227 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
Heart beat
Heart beat
Heart beat
Heart beat
08-19-2014
01:59 PM
We run 12 CDH 5.1 nodes managed by CM 5.0.2. We recently upgraded from CDH 4.7 to CDH 5.1, and since the upgrade we have not been able to run a Pig script using YARN/MapReduce. We run the following services: Flume, HDFS 2.3.0-cdh5.1.0, HBase 0.98.1-cdh5.1.0, Hive, Hue 3.6.0, Impala, Oozie, Solr, Spark, YARN (with MRv2), and ZooKeeper.
08-19-2014
11:23 AM
This might help: in the Hue server logs, I see the following error:

[19/Aug/2014 11:12:45 -0700] api ERROR An error happen while watching the demo running: Could not find job job_1408403413938_0008.
[19/Aug/2014 11:12:45 -0700] connectionpool DEBUG "GET /ws/v1/history/mapreduce/jobs/job_1408403413938_0008 HTTP/1.1" 404 None
[19/Aug/2014 11:12:45 -0700] connectionpool DEBUG Setting read timeout to None
08-19-2014
10:05 AM
Thanks for your help, Romain. The ShareLib is the one used for YARN: oozie-sharelib-yarn.tar.gz. I've enclosed the configuration of the job from Oozie, and it looks like it is using YARN. The job starts but never finishes; instead it repeats "Heart beat" over and over. I see an entry in the log that refers to port 50030, which is why it looks like it is using MRv1, but I can see the job in YARN's ResourceManager: it is RUNNING and never finishes until killed.

hue-id-w: 59
jobTracker: servername05:8032
mapreduce.job.user.name: admin
nameNode: hdfs://namenode02:8020
oozie.use.system.libpath: true
oozie.wf.application.path: hdfs://namenode02:8020/user/hue/oozie/workspaces/_admin_-oozie-59-1408466201.2
user.name: admin
08-19-2014
09:41 AM
I stopped Oozie and re-installed the Oozie ShareLib, and I validated that Hue is set up correctly (as indicated in the gethue link). However, Pig jobs are still not completing, and they are still being sent to MapReduce at 50030 even though YARN is referenced in the log. Why is the job being sent to MapReduce and not YARN?

2014-08-19 09:38:02,182 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://servername:50030/jobdetails.jsp?jobid=job_1408403413938_0006
2014-08-19 09:38:02,264 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
Heart beat

The job then repeats "Heart beat" over and over until killed.
08-18-2014
02:33 PM
I think I know the problem, but I don't know why it happens. After upgrading from MRv1 to YARN, why is YARN still trying to assign jobs to the old MapReduce JobTracker on port 50030? Didn't YARN replace MRv1? When I look at the YARN configuration at http://yarnresourcemanager:8088/conf, I see the following entries:

<property>
  <name>mapreduce.jobtracker.http.address</name>
  <value>0.0.0.0:50030</value>
  <source>mapred-default.xml</source>
</property>
<property>
  <name>mapreduce.tasktracker.http.address</name>
  <value>0.0.0.0:50060</value>
  <source>mapred-default.xml</source>
</property>

Why are these here? I thought YARN replaced MapReduce. This explains why my Pig script tries to use MRv1. How can I fix this?
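For what it's worth, those two entries are only the defaults inherited from mapred-default.xml; every YARN configuration dump shows them, and by themselves they do not mean jobs are being sent to a JobTracker. The property that actually selects the execution engine is mapreduce.framework.name. A minimal sketch of what a YARN-enabled client configuration (mapred-site.xml) is expected to contain, assuming Cloudera Manager has deployed the YARN gateway:

<property>
  <!-- Selects the execution engine for MapReduce jobs: "yarn" means MRv2,
       while "classic" or "local" would use the old JobTracker or local mode. -->
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

If that property already reads yarn, the 50030 URL printed by Pig is likely cosmetic (it is built from the jobtracker.http.address default) rather than evidence that the job actually went to MRv1.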
08-15-2014
02:48 PM
Thank you, Romain, that was helpful. I see something odd: in the log file the job is unassigned, and I see an entry pointing to a JobTracker at port 50030. However, I upgraded to YARN during the CDH upgrade from 4.7 to 5, and the MapReduce TaskTrackers and JobTracker were removed during that upgrade. For example, in the log I see this line:

2014-08-14 16:24:35,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://hadoopnode05:50030/jobdetails.jsp?jobid=job_1408018429315_0002

hadoopnode05 is running the ResourceManager, but not a JobTracker; the ResourceManager runs on port 8032. From the Oozie log I see that the job is unassigned, but it should be a MapReduce application. Doesn't YARN take care of this?

User: someone
Name: PigLatin:script.pig
Application Type: MAPREDUCE
Application Tags: oozie-f8b1c706728ef633ff0dad6c4aed48a
State: ACCEPTED
FinalStatus: UNDEFINED
Started: 14-Aug-2014 16:24:35
Elapsed: 22hrs, 17mins, 59sec
Tracking URL: UNASSIGNED

Do you have any suggestions? Kevin
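An application that stays in ACCEPTED with an UNASSIGNED tracking URL generally means the ResourceManager took the submission but never found room to launch the MapReduce ApplicationMaster container, so the job cannot even start reporting progress. A hedged sketch of the two capacity settings usually compared in that situation, with assumed values for illustration only:

<!-- Illustrative fragment: the ApplicationMaster container request
     (set in mapred-site.xml) must fit within the capacity each NodeManager
     advertises (set in yarn-site.xml), or applications sit in ACCEPTED. -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>6144</value>
</property>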
08-14-2014
04:37 PM
Thanks for your help. The rest of the YARN ResourceManager output after the warning is as follows:

2014-08-14 16:24:35,564 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1408018429315_0002 from user: admin, in queue: default, currently num of applications: 2
2014-08-14 16:24:35,565 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1408018429315_0002 State change from SUBMITTED to ACCEPTED
2014-08-14 16:24:35,565 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1408018429315_0002_000001
2014-08-14 16:24:35,566 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1408018429315_0002_000001 State change from NEW to SUBMITTED
2014-08-14 16:24:35,568 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1408018429315_0002_000001 to scheduler from user: admin
2014-08-14 16:24:35,568 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1408018429315_0002_000001 State change from SUBMITTED to SCHEDULED

The log then repeats the following text a thousand times and the job never completes:

2014-08-14 16:24:35,848 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001 which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:35,859 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001/tasks which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:35,870 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001/tasks which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:35,881 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001/tasks/task_1408018429315_0001_m_000000/attempts which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:35,898 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001 which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:35,916 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001 which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:35,928 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001/tasks/task_1408018429315_0001_m_000000 which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:35,940 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001/tasks/task_1408018429315_0001_m_000000/attempts/attempt_1408018429315_0001_m_000000_0 which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:35,950 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001/jobattempts which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:36,847 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001 which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:36,859 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001/tasks which is the app master GUI of application_1408018429315_0001 owned by admin
2014-08-14 16:24:36,870 INFO org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet: dr.who is accessing unchecked http://servername09:59181/ws/v1/mapreduce/jobs/job_1408018429315_0001/tasks which is the app master GUI of application_1408018429315_0001 owned by admin

What other logs can I look at? The Pig script is very simple:

offers = LOAD '/staging/file_20140727/20140727.txt' USING PigStorage AS (fileid:CHARARRAY, offerPrice:CHARARRAY, upc:CHARARRAY, productName:CHARARRAY, productDescription:CHARARRAY);
stores = LOAD '/staging/file_20140727/20140727_stores.txt' USING PigStorage AS (storeNumber:CHARARRAY, address:CHARARRAY, city:CHARARRAY, state:CHARARRAY, zip:CHARARRAY, country:CHARARRAY);
stores2 = FILTER stores BY (storeNumber == '1') OR (storeNumber == '100');
store_offers = CROSS stores2, offers;
dump store_offers;