
Hive query: a few map tasks stuck - need help


Explorer

Hi All,

 

When I run a Hive query that cross joins one big table with one small table, a few map tasks run very, very slowly. I'm not sure why this is happening. Here is an excerpt from the task log:

 

2015-10-25 21:48:40,760 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2015-10-25 21:48:40,760 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 408023442; bufvoid = 510027376
2015-10-25 21:48:40,760 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 246302; length = 1677721
2015-10-25 21:48:41,153 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
2015-10-25 21:48:41,780 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2015-10-25 21:48:45,013 INFO org.apache.hadoop.hive.ql.exec.mr.ExecMapper: ExecMapper: processing 10000 rows: used memory = 670643136
2015-10-25 21:48:51,742 INFO org.apache.hadoop.hive.ql.exec.mr.ExecMapper: ExecMapper: processing 100000 rows: used memory = 987411896
2015-10-25 21:48:54,389 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://nameservice1/data/warehouse/weblogs.db/webtrends/log_date=2015-10-01/000009_0_copy_1
2015-10-25 21:49:02,632 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2015-10-25 21:49:02,632 INFO org.apache.hadoop.mapred.MapTask: bufstart = 408023442; bufend = 306020529; bufvoid = 510027376
2015-10-25 21:49:02,632 INFO org.apache.hadoop.mapred.MapTask: kvstart = 246302; kvend = 482528; length = 1677721
2015-10-25 21:49:03,389 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1

I see a lot of these "Finished spill" messages:

INFO org.apache.hadoop.mapred.MapTask: Finished spill
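The numbers in these log lines can be decoded. The sketch below assumes the job runs on the MRv1 MapTask (suggested by the "buffer full=" and "length =" log format) with io.sort.mb = 512 and io.sort.record.percent = 0.05. Neither setting is stated in the thread, but under those assumptions the computed buffer sizes match the logged bufvoid and length exactly. The helper names are illustrative only.

```python
# Sketch: reproduce the MRv1 map-side sort buffer layout from the log above.
# Assumptions (not stated in the thread): MRv1 MapTask, io.sort.mb = 512,
# io.sort.record.percent = 0.05 (the MRv1 default).

RECSIZE = 16  # assumed MRv1 accounting cost per record: 4 ints (offset + 3 indices)


def mrv1_sort_buffer(sort_mb: int, record_percent: float = 0.05) -> tuple[int, int]:
    """Return (data buffer bytes, max records) for an MRv1 map output buffer."""
    max_mem = sort_mb << 20                          # io.sort.mb in bytes
    record_capacity = int(max_mem * record_percent)  # share reserved for accounting
    record_capacity -= record_capacity % RECSIZE     # round down to whole records
    data_bytes = max_mem - record_capacity           # logged as "bufvoid"
    max_records = record_capacity // RECSIZE         # logged as "length"
    return data_bytes, max_records


def spilled_bytes(bufstart: int, bufend: int, bufvoid: int) -> int:
    """Bytes in one spill; the data buffer is circular, so bufend may wrap."""
    if bufend >= bufstart:
        return bufend - bufstart
    return (bufvoid - bufstart) + bufend


bufvoid, length = mrv1_sort_buffer(512)
print(bufvoid, length)  # 510027376 1677721, matching the log

# Spill 0 and spill 1: (bufstart, bufend, kvstart, kvend) copied from the log.
for start, end, kv0, kv1 in [(0, 408023442, 0, 246302),
                             (408023442, 306020529, 246302, 482528)]:
    size = spilled_bytes(start, end, bufvoid)
    records = kv1 - kv0
    print(f"{size} bytes, {records} records, ~{size / records:.0f} bytes/record")
```

Each spill is roughly 389 MiB of records averaging about 1.6 to 1.7 KB each. Assuming the default io.sort.spill.percent of 0.80, the data buffer's soft limit is about 408 million bytes, while the record limit would be about 1.34 million records, so the byte buffer fills long before the record array does. That is consistent with "buffer full= true" in the log.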

Can someone help with this?

 

Thanks,

Venu


Re: Hive query: a few map tasks stuck - need help

Cloudera Employee

Hello Venu,

 

The spill messages in your log snippet indicate that Hive's MapReduce task is sorting on disk: the in-memory buffer allocated for sorting map output fills up, so its contents are spilled to disk. There are a couple of things you can tune:

 

1. Increase the container memory allocated to Map tasks (remember to increase the heap size of the map task too!)

2. Increase the sort buffer size (mapreduce.task.io.sort.mb)
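Both options can be applied per query with Hive SET statements. The helper below is a hypothetical sketch that emits them and checks that the sort buffer fits inside the map heap, since the buffer is allocated on that heap. The 0.8 heap-to-container ratio is a common rule of thumb rather than a Hadoop requirement, and the example sizes are placeholders. If the cluster runs MRv1, which the log format suggests, the older property name io.sort.mb may be the one to set instead.

```python
# Hypothetical sizing helper for the two tuning options above.
# Assumption: heap = 80% of the container, leaving headroom for non-heap JVM memory.

def map_memory_settings(container_mb: int, sort_mb: int,
                        heap_ratio: float = 0.8) -> list[str]:
    heap_mb = int(container_mb * heap_ratio)
    if not 0 < sort_mb <= 2047:
        # MapTask rejects sort buffer sizes that do not fit in 11 bits (above 2047 MB).
        raise ValueError("mapreduce.task.io.sort.mb must be between 1 and 2047")
    if sort_mb >= heap_mb:
        # The sort buffer lives on the map task's heap, so it must be smaller than
        # the heap (in practice well below it, since Hive's operators also use heap).
        raise ValueError(f"sort buffer {sort_mb} MB does not fit in a {heap_mb} MB heap")
    return [
        f"SET mapreduce.map.memory.mb={container_mb};",  # option 1: container size
        f"SET mapreduce.map.java.opts=-Xmx{heap_mb}m;",  # option 1: matching heap
        f"SET mapreduce.task.io.sort.mb={sort_mb};",     # option 2: sort buffer
    ]


for stmt in map_memory_settings(container_mb=3072, sort_mb=1024):
    print(stmt)
```

Raising only the sort buffer (option 2) without checking the heap can trade spills for out-of-memory failures, which is why the helper ties the two options together.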

 

Hope this helps.


Re: Hive query: a few map tasks stuck - need help

Explorer

Thank you, I will try the second option.


(In reply to @schuberth.)


Re: Hive query: a few map tasks stuck - need help

Explorer

Thank you, I will try the second option.

(In reply to @venu123, who was quoting @schuberth.)