Member since: 04-10-2014
Posts: 2
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 45113 | 04-11-2014 10:31 AM
04-11-2014 10:31 AM
2 Kudos
While researching how to tune YARN memory to fit the cluster size, I found the following settings to work for my configuration:

yarn.nodemanager.resource.memory-mb  = 20 GB
yarn.scheduler.minimum-allocation-mb = 4 GB
yarn.scheduler.maximum-allocation-mb = 20 GB
mapreduce.map.memory.mb              = 4 GB
mapreduce.reduce.memory.mb           = 8 GB
mapreduce.map.java.opts              = 3.2 GB
mapreduce.reduce.java.opts           = 6.4 GB
yarn.app.mapreduce.am.resource.mb    = 8 GB
yarn.app.mapreduce.am.command-opts   = 6.4 GB

That allowed my particular Hive query to execute on our 10-node cluster with 30 GB of physical RAM per node.
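For a single Hive session, the job-level portion of those settings can be applied with `set` commands before running the query; the `yarn.nodemanager.*` and `yarn.scheduler.*` values are cluster-wide and belong in the YARN service configuration (yarn-site.xml) rather than in a session. A minimal sketch follows; the MB figures and `-Xmx` heap sizes are my own approximations of the GB values above, not numbers from the original post.

```sql
-- Minimal per-session sketch (MB and -Xmx values are approximate conversions of the
-- GB figures above; JVM heaps are kept below their container sizes so the JVMs
-- fit inside their YARN allocations).
SET mapreduce.map.memory.mb=4096;
SET mapreduce.map.java.opts=-Xmx3276m;
SET mapreduce.reduce.memory.mb=8192;
SET mapreduce.reduce.java.opts=-Xmx6553m;
SET yarn.app.mapreduce.am.resource.mb=8192;
SET yarn.app.mapreduce.am.command-opts=-Xmx6553m;
```

The NodeManager capacity and scheduler minimum/maximum allocations still have to be set once at the cluster level for these per-job sizes to be schedulable.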
04-10-2014 01:01 PM
I am doing some testing on a 10-node cluster (30 GB of memory each) using CDH5. I uploaded about 400 GB of weather data, spread across roughly 500 files and totaling about 4 billion lines, into HDFS. I'm trying to use Hive against this data just to get a record count.

CREATE EXTERNAL TABLE weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1 (
  STATION_ID STRING,
  WX_DATE STRING,
  HIGH_TMP_F DOUBLE,
  LOW_TMP_F DOUBLE,
  TMP_F DOUBLE,
  REL_HUM_PCT DOUBLE,
  WIND_SPEED_MPH DOUBLE,
  HIGHEST_WIND_GUST_MPH DOUBLE,
  SOLAR_RAD_AVG_GLOBAL_WM2 DOUBLE,
  WATER_EQUIV_INCH DOUBLE,
  STATION_ID2 STRING,
  ID INT,
  DIST DOUBLE,
  INV_DIST_WGHT DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/bdanalytics/weather';

hive> describe weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1;
OK
station_id                string    None
wx_date                   string    None
high_tmp_f                double    None
low_tmp_f                 double    None
tmp_f                     double    None
rel_hum_pct               double    None
wind_speed_mph            double    None
highest_wind_gust_mph     double    None
solar_rad_avg_global_wm2  double    None
water_equiv_inch          double    None
station_id2               string    None
id                        int       None
dist                      double    None
inv_dist_wght             double    None
Time taken: 0.201 seconds, Fetched: 14 row(s)
hive>
hive> select count(id) from weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_1397147177898_0001, Tracking URL = http://hadoopg1:8088/proxy/application_1397147177898_0001/
Kill Command = /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop job -kill job_1397147177898_0001
Hadoop job information for Stage-1: number of mappers: 1548; number of reducers: 1
2014-04-10 16:43:50,346 Stage-1 map = 0%, reduce = 0%
2014-04-10 16:44:04,361 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 49.42 sec
2014-04-10 16:44:05,519 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 125.63 sec
2014-04-10 16:44:06,580 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 144.91 sec
2014-04-10 16:44:07,673 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 179.89 sec
2014-04-10 16:44:08,733 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 213.98 sec
2014-04-10 16:44:09,791 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 245.45 sec
2014-04-10 16:44:10,852 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 240.17 sec
2014-04-10 16:44:11,905 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 241.82 sec
2014-04-10 16:44:13,009 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 241.82 sec
2014-04-10 16:44:14,091 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 250.38 sec
2014-04-10 16:44:15,151 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 255.41 sec
2014-04-10 16:44:16,235 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 277.35 sec
2014-04-10 16:44:17,370 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 301.2 sec
2014-04-10 16:44:18,411 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 314.53 sec
2014-04-10 16:44:19,472 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 329.22 sec
2014-04-10 16:44:20,545 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 356.46 sec
2014-04-10 16:44:21,604 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 387.29 sec
2014-04-10 16:44:22,705 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 397.92 sec
2014-04-10 16:44:23,752 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 399.31 sec
2014-04-10 16:44:24,806 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 407.96 sec
2014-04-10 16:44:25,861 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 411.52 sec
2014-04-10 16:44:26,934 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 411.52 sec
2014-04-10 16:44:28,000 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 415.28 sec
2014-04-10 16:44:29,085 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 433.5 sec
2014-04-10 16:44:30,166 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 434.77 sec
2014-04-10 16:44:31,291 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 443.1 sec
2014-04-10 16:44:32,371 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 462.61 sec
2014-04-10 16:44:33,430 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 487.98 sec
2014-04-10 16:44:34,502 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 497.86 sec
2014-04-10 16:44:35,549 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 502.88 sec
2014-04-10 16:44:36,634 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 510.7 sec
2014-04-10 16:44:37,670 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 507.2 sec
2014-04-10 16:44:38,706 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 507.2 sec
MapReduce Total cumulative CPU time: 8 minutes 27 seconds 200 msec
Ended Job = job_1397147177898_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1397147177898_0001_m_000016 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000005 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000033 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000025 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000068 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000002 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000034 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000097 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000089 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000127 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000107 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000098 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000030 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000118 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000109 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000125 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000087 (and more) from job job_1397147177898_0001
Task with the most failures(4):
-----
Task ID: task_1397147177898_0001_m_000030
URL: http://hadoopg1:8088/taskdetails.jsp?jobid=job_1397147177898_0001&tipid=task_1397147177898_0001_m_000030
-----
Diagnostic Messages for this Task:
Error: Java heap space
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1548  Reduce: 1   Cumulative CPU: 507.2 sec   HDFS Read: 39564410523  HDFS Write: 0  FAIL
Total MapReduce CPU Time Spent: 8 minutes 27 seconds 200 msec
hive>

I'm looking for advice on specific tuning parameters for this size of data and what is commonly needed to let this query run. I did some Googling and tried a number of parameters, but nothing has changed the error or how far the job gets before it blows up.
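For context, the "Error: Java heap space" diagnostic above comes from the map-task JVMs, so one low-effort check is to print the memory settings the session is actually handing to the job before rerunning the count. A small sketch using the standard MRv2 property names (in the Hive CLI, `SET <property>;` with no value echoes the current setting); this is an illustration, not something from the original post:

```sql
-- Show the effective per-task memory settings the next MapReduce job would run with.
SET mapreduce.map.memory.mb;
SET mapreduce.map.java.opts;
SET mapreduce.reduce.memory.mb;
SET mapreduce.reduce.java.opts;

-- Rerun the failing query once the container sizes and -Xmx heaps look consistent.
SELECT COUNT(id) FROM weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1;
```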