
Hive Java heap error running query (exit code 143)

New Contributor

I am doing some testing on a 10-node cluster (30 GB of memory each) running CDH 5. I uploaded about 400 GB of weather data into HDFS, spread across roughly 500 files and totaling about 4 billion lines. I'm trying to run Hive against this data and just get a record count.

 

CREATE EXTERNAL TABLE weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1 (
    STATION_ID                STRING,
    WX_DATE                   STRING,
    HIGH_TMP_F                DOUBLE,
    LOW_TMP_F                 DOUBLE,
    TMP_F                     DOUBLE,
    REL_HUM_PCT               DOUBLE,
    WIND_SPEED_MPH            DOUBLE,
    HIGHEST_WIND_GUST_MPH     DOUBLE,
    SOLAR_RAD_AVG_GLOBAL_WM2  DOUBLE,
    WATER_EQUIV_INCH          DOUBLE,
    STATION_ID2               STRING,
    ID                        INT,
    DIST                      DOUBLE,
    INV_DIST_WGHT             DOUBLE
)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/bdanalytics/weather';

 

hive> describe weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1;
OK
station_id              string                  None
wx_date                 string                  None
high_tmp_f              double                  None
low_tmp_f               double                  None
tmp_f                   double                  None
rel_hum_pct             double                  None
wind_speed_mph          double                  None
highest_wind_gust_mph   double                  None
solar_rad_avg_global_wm2        double                  None
water_equiv_inch        double                  None
station_id2             string                  None
id                      int                     None
dist                    double                  None
inv_dist_wght           double                  None
Time taken: 0.201 seconds, Fetched: 14 row(s)
hive>

 

hive> select count(id) from weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_1397147177898_0001, Tracking URL = http://hadoopg1:8088/proxy/application_1397147177898_0001/
Kill Command = /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop job  -kill job_1397147177898_0001
Hadoop job information for Stage-1: number of mappers: 1548; number of reducers: 1
2014-04-10 16:43:50,346 Stage-1 map = 0%,  reduce = 0%
2014-04-10 16:44:04,361 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 49.42 sec
2014-04-10 16:44:05,519 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 125.63 sec
2014-04-10 16:44:06,580 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 144.91 sec
2014-04-10 16:44:07,673 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 179.89 sec
2014-04-10 16:44:08,733 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 213.98 sec
2014-04-10 16:44:09,791 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 245.45 sec
2014-04-10 16:44:10,852 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 240.17 sec
2014-04-10 16:44:11,905 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 241.82 sec
2014-04-10 16:44:13,009 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 241.82 sec
2014-04-10 16:44:14,091 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 250.38 sec
2014-04-10 16:44:15,151 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 255.41 sec
2014-04-10 16:44:16,235 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 277.35 sec
2014-04-10 16:44:17,370 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 301.2 sec
2014-04-10 16:44:18,411 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 314.53 sec
2014-04-10 16:44:19,472 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 329.22 sec
2014-04-10 16:44:20,545 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 356.46 sec
2014-04-10 16:44:21,604 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 387.29 sec
2014-04-10 16:44:22,705 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 397.92 sec
2014-04-10 16:44:23,752 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 399.31 sec
2014-04-10 16:44:24,806 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 407.96 sec
2014-04-10 16:44:25,861 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 411.52 sec
2014-04-10 16:44:26,934 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 411.52 sec
2014-04-10 16:44:28,000 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 415.28 sec
2014-04-10 16:44:29,085 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 433.5 sec
2014-04-10 16:44:30,166 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 434.77 sec
2014-04-10 16:44:31,291 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 443.1 sec
2014-04-10 16:44:32,371 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 462.61 sec
2014-04-10 16:44:33,430 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 487.98 sec
2014-04-10 16:44:34,502 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 497.86 sec
2014-04-10 16:44:35,549 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 502.88 sec
2014-04-10 16:44:36,634 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 510.7 sec
2014-04-10 16:44:37,670 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 507.2 sec
2014-04-10 16:44:38,706 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 507.2 sec
MapReduce Total cumulative CPU time: 8 minutes 27 seconds 200 msec
Ended Job = job_1397147177898_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1397147177898_0001_m_000016 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000005 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000033 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000025 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000068 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000002 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000034 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000097 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000089 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000127 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000107 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000098 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000030 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000118 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000109 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000125 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000087 (and more) from job job_1397147177898_0001

Task with the most failures(4):
-----
Task ID:
  task_1397147177898_0001_m_000030

URL:
  http://hadoopg1:8088/taskdetails.jsp?jobid=job_1397147177898_0001&tipid=task_1397147177898_0001_m_00...
-----
Diagnostic Messages for this Task:
Error: Java heap space
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1548  Reduce: 1   Cumulative CPU: 507.2 sec   HDFS Read: 39564410523 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 8 minutes 27 seconds 200 msec
hive>

 

 

I'm looking for advice on specific tuning parameters for this volume of data, and on what is commonly needed to let a query like this run. I did some Googling and tried a number of parameters, but so far nothing has changed the error or how far the job gets before blowing up.
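For anyone digging into the same failure: the actual stack traces behind the "Java heap space" line live in the per-container task logs, which (assuming YARN log aggregation is enabled) can be pulled for the whole application from a shell with something like:

yarn logs -applicationId application_1397147177898_0001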

 

1 ACCEPTED SOLUTION

New Contributor

After researching how to tune YARN memory to fit the cluster size, I found the following settings work for my configuration. (The diagnostic means a map task's JVM ran out of heap; YARN then killed the container, and exit code 143 is just the SIGTERM from that kill.)

 

yarn.nodemanager.resource.memory-mb  =  20GB
yarn.scheduler.minimum-allocation-mb =   4GB
yarn.scheduler.maximum-allocation-mb =  20GB
mapreduce.map.memory.mb              =   4GB
mapreduce.reduce.memory.mb           =   8GB
mapreduce.map.java.opts              = 3.2GB
mapreduce.reduce.java.opts           = 6.4GB
yarn.app.mapreduce.am.resource.mb    =   8GB
yarn.app.mapreduce.am.command-opts   = 6.4GB

 

That allowed my particular Hive query to execute on our 10 node cluster with 30GB physical RAM each.
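A note on mapping these sizes onto raw property values, since the units differ: the *.memory.mb properties take plain megabyte values, while mapreduce.*.java.opts and yarn.app.mapreduce.am.command-opts are JVM flag strings whose heap (-Xmx) is conventionally set to roughly 80% of the matching container size, which appears to be where the 3.2GB and 6.4GB figures above come from. The per-job properties can also be overridden for a single session from the Hive CLI; a sketch under those assumptions, with illustrative megabyte figures:

-- Per-job memory overrides from the Hive CLI (illustrative values).
-- Note: yarn.nodemanager.resource.memory-mb and the yarn.scheduler.*
-- properties are cluster-side settings (yarn-site.xml / Cloudera Manager)
-- and cannot be set per session.
set mapreduce.map.memory.mb=4096;
set mapreduce.map.java.opts=-Xmx3277m;
set mapreduce.reduce.memory.mb=8192;
set mapreduce.reduce.java.opts=-Xmx6554m;
set yarn.app.mapreduce.am.resource.mb=8192;
set yarn.app.mapreduce.am.command-opts=-Xmx6554m;

select count(id) from weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1;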

 

 
