
Hive Java heap error running query (exit code 143)

I am doing some testing on a 10-node cluster (30 GB of memory per node) running CDH 5. I uploaded about 400 GB of weather data — roughly 4 billion lines spread across about 500 files — into HDFS. Now I'm trying to run Hive against it to get a simple record count.

 

CREATE EXTERNAL TABLE weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1 (
  STATION_ID                STRING,
  WX_DATE                   STRING,
  HIGH_TMP_F                DOUBLE,
  LOW_TMP_F                 DOUBLE,
  TMP_F                     DOUBLE,
  REL_HUM_PCT               DOUBLE,
  WIND_SPEED_MPH            DOUBLE,
  HIGHEST_WIND_GUST_MPH     DOUBLE,
  SOLAR_RAD_AVG_GLOBAL_WM2  DOUBLE,
  WATER_EQUIV_INCH          DOUBLE,
  STATION_ID2               STRING,
  ID                        INT,
  DIST                      DOUBLE,
  INV_DIST_WGHT             DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/bdanalytics/weather';

 

hive> describe weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1;
OK
station_id              string                  None
wx_date                 string                  None
high_tmp_f              double                  None
low_tmp_f               double                  None
tmp_f                   double                  None
rel_hum_pct             double                  None
wind_speed_mph          double                  None
highest_wind_gust_mph   double                  None
solar_rad_avg_global_wm2        double                  None
water_equiv_inch        double                  None
station_id2             string                  None
id                      int                     None
dist                    double                  None
inv_dist_wght           double                  None
Time taken: 0.201 seconds, Fetched: 14 row(s)
hive>

 

hive> select count(id) from weather.FP_MPE_GRID_SUB_IDW_RESULT_STEP1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_1397147177898_0001, Tracking URL = http://hadoopg1:8088/proxy/application_1397147177898_0001/
Kill Command = /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hadoop/bin/hadoop job  -kill job_1397147177898_0001
Hadoop job information for Stage-1: number of mappers: 1548; number of reducers: 1

2014-04-10 16:43:50,346 Stage-1 map = 0%,  reduce = 0%
2014-04-10 16:44:04,361 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 49.42 sec
2014-04-10 16:44:05,519 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 125.63 sec
2014-04-10 16:44:06,580 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 144.91 sec
2014-04-10 16:44:07,673 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 179.89 sec
2014-04-10 16:44:08,733 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 213.98 sec
2014-04-10 16:44:09,791 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 245.45 sec
2014-04-10 16:44:10,852 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 240.17 sec
2014-04-10 16:44:11,905 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 241.82 sec
2014-04-10 16:44:13,009 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 241.82 sec
2014-04-10 16:44:14,091 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 250.38 sec
2014-04-10 16:44:15,151 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 255.41 sec
2014-04-10 16:44:16,235 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 277.35 sec
2014-04-10 16:44:17,370 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 301.2 sec
2014-04-10 16:44:18,411 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 314.53 sec
2014-04-10 16:44:19,472 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 329.22 sec
2014-04-10 16:44:20,545 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 356.46 sec
2014-04-10 16:44:21,604 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 387.29 sec
2014-04-10 16:44:22,705 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 397.92 sec
2014-04-10 16:44:23,752 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 399.31 sec
2014-04-10 16:44:24,806 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 407.96 sec
2014-04-10 16:44:25,861 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 411.52 sec
2014-04-10 16:44:26,934 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 411.52 sec
2014-04-10 16:44:28,000 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 415.28 sec
2014-04-10 16:44:29,085 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 433.5 sec
2014-04-10 16:44:30,166 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 434.77 sec
2014-04-10 16:44:31,291 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 443.1 sec
2014-04-10 16:44:32,371 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 462.61 sec
2014-04-10 16:44:33,430 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 487.98 sec
2014-04-10 16:44:34,502 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 497.86 sec
2014-04-10 16:44:35,549 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 502.88 sec
2014-04-10 16:44:36,634 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 510.7 sec
2014-04-10 16:44:37,670 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 507.2 sec
2014-04-10 16:44:38,706 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 507.2 sec

MapReduce Total cumulative CPU time: 8 minutes 27 seconds 200 msec
Ended Job = job_1397147177898_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1397147177898_0001_m_000016 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000005 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000033 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000025 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000068 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000002 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000034 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000097 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000089 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000127 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000107 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000098 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000030 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000118 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000109 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000125 (and more) from job job_1397147177898_0001
Examining task ID: task_1397147177898_0001_m_000087 (and more) from job job_1397147177898_0001

Task with the most failures(4):
-----
Task ID:
  task_1397147177898_0001_m_000030

URL:
  http://hadoopg1:8088/taskdetails.jsp?jobid=job_1397147177898_0001&tipid=task_1397147177898_0001_m_000030
-----
Diagnostic Messages for this Task:
Error: Java heap space
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1548  Reduce: 1   Cumulative CPU: 507.2 sec   HDFS Read: 39564410523 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 8 minutes 27 seconds 200 msec
hive>

 

 

I'm looking for advice on specific tuning parameters for this volume of data — whatever is commonly needed to let a query like this complete. I've done some Googling and tried a number of parameters, but so far nothing has changed the error or how far the job gets before it blows up.

 

Re: Hive Java heap error running query (exit code 143)

After researching how to tune YARN memory to fit the cluster size, I found the following settings work for my configuration:

 

yarn.nodemanager.resource.memory-mb  =  20GB
yarn.scheduler.minimum-allocation-mb =   4GB
yarn.scheduler.maximum-allocation-mb =  20GB
mapreduce.map.memory.mb              =   4GB
mapreduce.reduce.memory.mb           =   8GB
mapreduce.map.java.opts              = 3.2GB
mapreduce.reduce.java.opts           = 6.4GB
yarn.app.mapreduce.am.resource.mb    =   8GB
yarn.app.mapreduce.am.command-opts   = 6.4GB
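For what it's worth, the java.opts figures above are exactly 80% of their corresponding container sizes (mapreduce.*.memory.mb and the AM resource), which matches the common guideline of leaving roughly 20% of each container for non-heap JVM overhead. A quick sanity check of that ratio (plain Python, nothing cluster-specific):

```python
# Container sizes (MB) paired with the heap sizes listed above (GB),
# to confirm they all follow the heap = 0.8 * container rule of thumb.
settings = {
    "map":    {"container_mb": 4096, "heap_gb": 3.2},
    "reduce": {"container_mb": 8192, "heap_gb": 6.4},
    "am":     {"container_mb": 8192, "heap_gb": 6.4},
}

for name, s in settings.items():
    heap_mb = s["heap_gb"] * 1024            # e.g. 3.2 GB -> 3276.8 MB
    ratio = heap_mb / s["container_mb"]
    print(f"{name}: heap/container = {ratio:.2f}")
    assert abs(ratio - 0.8) < 0.01           # all three are the 80% guideline
```

If you size containers differently for your own cluster, keeping the heap at about 80% of the container should preserve the same headroom.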

 

Those settings allowed my Hive query to run to completion on our 10-node cluster with 30 GB of physical RAM per node.
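If you can't change the cluster-wide YARN configuration right away, the MapReduce-side settings can also be tried per session from the Hive shell. This is just a sketch — the -Xmx values are my conversion of the GB figures above into the megabyte/JVM-flag forms these properties actually take, and the yarn.* properties still require a cluster-level change:

```sql
-- Session-level equivalents (container sizes in MB; java.opts take JVM flags)
SET mapreduce.map.memory.mb=4096;
SET mapreduce.reduce.memory.mb=8192;
SET mapreduce.map.java.opts=-Xmx3276m;
SET mapreduce.reduce.java.opts=-Xmx6553m;
```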

 

 
