Created 10-25-2015 10:51 PM
Hi All,
When I run a Hive query that cross joins one big table with one small table, a few map tasks run very slowly, and I'm not sure why this is happening.
2015-10-25 21:48:40,760 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2015-10-25 21:48:40,760 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 408023442; bufvoid = 510027376
2015-10-25 21:48:40,760 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 246302; length = 1677721
2015-10-25 21:48:41,153 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
2015-10-25 21:48:41,780 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2015-10-25 21:48:45,013 INFO org.apache.hadoop.hive.ql.exec.mr.ExecMapper: ExecMapper: processing 10000 rows: used memory = 670643136
2015-10-25 21:48:51,742 INFO org.apache.hadoop.hive.ql.exec.mr.ExecMapper: ExecMapper: processing 100000 rows: used memory = 987411896
2015-10-25 21:48:54,389 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file hdfs://nameservice1/data/warehouse/weblogs.db/webtrends/log_date=2015-10-01/000009_0_copy_1
2015-10-25 21:49:02,632 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
2015-10-25 21:49:02,632 INFO org.apache.hadoop.mapred.MapTask: bufstart = 408023442; bufend = 306020529; bufvoid = 510027376
2015-10-25 21:49:02,632 INFO org.apache.hadoop.mapred.MapTask: kvstart = 246302; kvend = 482528; length = 1677721
2015-10-25 21:49:03,389 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
I see a lot of these "Finished spill" messages:
INFO org.apache.hadoop.mapred.MapTask: Finished spill
Can someone help with this?
Thanks,
Venu
Created 11-04-2015 02:36 PM
Hello Venu,
The spill messages in the log snippet indicate that Hive's MapReduce map tasks are spilling to disk while sorting their output, because the buffer allocated for sorting is full. There are a couple of things that you can tune:
1. Increase the container memory allocated to Map tasks (remember to increase the heap size of the map task too!)
2. Increase the sort buffer size (mapreduce.task.io.sort.mb)
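As a rough sketch, both options can be set per session from the Hive prompt before running the query. The numbers below are illustrative assumptions, not recommendations for your cluster; pick values that fit your node sizes, and keep the sort buffer comfortably smaller than the map task heap, since the buffer is allocated inside it.

```sql
-- Illustrative values only (assumptions); adjust to your cluster.

-- Option 1: give each map task a larger container (in MB) ...
SET mapreduce.map.memory.mb=4096;
-- ... and raise the map JVM heap to match, leaving headroom below the container size.
SET mapreduce.map.java.opts=-Xmx3276m;

-- Option 2: enlarge the in-memory sort buffer (in MB) so fewer spills are needed.
SET mapreduce.task.io.sort.mb=1024;
```

Running `SET mapreduce.task.io.sort.mb;` with no value prints the current setting, which is a quick way to confirm the change took effect in your session.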
Hope this helps.
Created 11-06-2015 03:43 PM
Thank you, I will try the second option.
Created 11-06-2015 03:44 PM
Thank you.
I will try the second option.