Created on 04-10-2014 01:01 PM
I am doing some testing on a 10-node cluster (30 GB of memory each) running CDH 5. I uploaded about 400 GB of weather data into HDFS, spread across roughly 500 files and totaling about 4 billion lines. I'm trying to run Hive against this data just to get a record count. The table is defined as follows:
CREATE EXTERNAL TABLE weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1
( STATION_ID STRING, WX_DATE STRING,
HIGH_TMP_F DOUBLE, LOW_TMP_F DOUBLE,
TMP_F DOUBLE, REL_HUM_PCT DOUBLE,
WIND_SPEED_MPH DOUBLE, HIGHEST_WIND_GUST_MPH DOUBLE,
SOLAR_RAD_AVG_GLOBAL_WM2 DOUBLE, WATER_EQUIV_INCH DOUBLE,
STATION_ID2 STRING, ID INT, DIST double, inv_dist_wght double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/bdanalytics/weather';
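Before kicking off a full scan, a quick sanity check (a minimal sketch, assuming the files already sit under the LOCATION above) is to sample a few rows and confirm the total input size from the Hive CLI:
-- sample a few rows to confirm the delimiter and column mapping look right
SELECT * FROM weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1 LIMIT 5;
-- check the total size of the files backing the external table (dfs runs FsShell from within the Hive CLI)
dfs -du -s -h /user/bdanalytics/weather;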
hive> describe weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1;
OK
station_id string None
wx_date string None
high_tmp_f double None
low_tmp_f double None
tmp_f double None
rel_hum_pct double None
wind_speed_mph double None
highest_wind_gust_mph double None
solar_rad_avg_global_wm2 double None
water_equiv_inch double None
station_id2 string None
id int None
dist double None
inv_dist_wght double None
Time taken: 0.201 seconds, Fetched: 14 row(s)
hive>
hive> select count(id) from weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_1397147177898_0001, Tracking URL = http://hadoopg1:8088/proxy/application_1397147177898_0001/
Kill Command = /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop job -kill job_1397147177898_0001
Hadoop job information for Stage-1: number of mappers: 1548; number of reducers: 1
2014-04-10 16:43:50,346 Stage-1 map = 0%, reduce = 0%
2014-04-10 16:44:04,361 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 49.42 sec
2014-04-10 16:44:05,519 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 125.63 sec
2014-04-10 16:44:06,580 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 144.91 sec
2014-04-10 16:44:07,673 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 179.89 sec
2014-04-10 16:44:08,733 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 213.98 sec
2014-04-10 16:44:09,791 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 245.45 sec
2014-04-10 16:44:10,852 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 240.17 sec
2014-04-10 16:44:11,905 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 241.82 sec
2014-04-10 16:44:13,009 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 241.82 sec
2014-04-10 16:44:14,091 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 250.38 sec
2014-04-10 16:44:15,151 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 255.41 sec
2014-04-10 16:44:16,235 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 277.35 sec
2014-04-10 16:44:17,370 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 301.2 sec
2014-04-10 16:44:18,411 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 314.53 sec
2014-04-10 16:44:19,472 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 329.22 sec
2014-04-10 16:44:20,545 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 356.46 sec
2014-04-10 16:44:21,604 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 387.29 sec
2014-04-10 16:44:22,705 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 397.92 sec
2014-04-10 16:44:23,752 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 399.31 sec
2014-04-10 16:44:24,806 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 407.96 sec
2014-04-10 16:44:25,861 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 411.52 sec
2014-04-10 16:44:26,934 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 411.52 sec
2014-04-10 16:44:28,000 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 415.28 sec
2014-04-10 16:44:29,085 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 433.5 sec
2014-04-10 16:44:30,166 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 434.77 sec
2014-04-10 16:44:31,291 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 443.1 sec
2014-04-10 16:44:32,371 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 462.61 sec
2014-04-10 16:44:33,430 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 487.98 sec
2014-04-10 16:44:34,502 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 497.86 sec
2014-04-10 16:44:35,549 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 502.88 sec
2014-04-10 16:44:36,634 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 510.7 sec
2014-04-10 16:44:37,670 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 507.2 sec
2014-04-10 16:44:38,706 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 507.2 sec
MapReduce Total cumulative CPU time: 8 minutes 27 seconds 200 msec
Ended Job = job_1397147177898_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1397147177898_0001_m_000016 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000005 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000033 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000025 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000068 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000002 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000034 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000097 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000089 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000127 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000107 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000098 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000030 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000118 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000109 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000125 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000087 (and more) from job job_1397147177898_0001
Task with the most failures(4):
-----
Task ID:
task_1397147177898_0001_m_000030
URL:
-----
Diagnostic Messages for this Task:
Error: Java heap space
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1548 Reduce: 1 Cumulative CPU: 507.2 sec HDFS Read: 39564410523 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 8 minutes 27 seconds 200 msec
hive>
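The per-task diagnostics above only show the last few lines of the failing container. The full container logs, which usually include the actual OutOfMemoryError stack trace, can be pulled with yarn logs, assuming YARN log aggregation is enabled on the cluster:
# dump the aggregated container logs for the failed application to a file for inspection
yarn logs -applicationId application_1397147177898_0001 > application_0001_logs.txt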
I'm looking for advice on specific tuning parameters for working with this volume of data, and on what is commonly needed to let a query like this run. I did some Googling and tried a number of parameters, but so far nothing has changed the error or how far the job gets before blowing up.
Created 04-11-2014 10:31 AM
After researching how to tune YARN memory to fit the cluster size, I found the following settings to work for my configuration:
yarn.nodemanager.resource.memory-mb = 20GB
yarn.scheduler.minimum-allocation-mb = 4GB
yarn.scheduler.maximum-allocation-mb = 20GB
mapreduce.map.memory.mb = 4GB
mapreduce.reduce.memory.mb = 8GB
mapreduce.map.java.opts = 3.2GB
mapreduce.reduce.java.opts = 6.4GB
yarn.app.mapreduce.am.resource.mb = 8GB
yarn.app.mapreduce.am.command-opts = 6.4GB
That allowed my particular Hive query to execute on our 10-node cluster with 30 GB of physical RAM per node.
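The job-level properties above can also be tried per session from the Hive CLI before changing cluster defaults. A minimal sketch, assuming the same sizes as above (the MB figures are just the GB values converted, with the JVM heap set to roughly 80% of the container size):
-- container sizes for map and reduce tasks (4 GB and 8 GB)
SET mapreduce.map.memory.mb=4096;
SET mapreduce.reduce.memory.mb=8192;
-- JVM heap inside each container, roughly 80% of the container size
SET mapreduce.map.java.opts=-Xmx3276m;
SET mapreduce.reduce.java.opts=-Xmx6553m;
-- MapReduce ApplicationMaster container size and heap
SET yarn.app.mapreduce.am.resource.mb=8192;
SET yarn.app.mapreduce.am.command-opts=-Xmx6553m;
The yarn.nodemanager.resource.memory-mb and yarn.scheduler.* values are cluster-side settings and have to be changed in yarn-site.xml (or Cloudera Manager), followed by a restart of the YARN services. With 20 GB handed to YARN per node and 4 GB map containers, each 30 GB node runs at most five concurrent map tasks, which is the trade-off these numbers make.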