Support Questions

Find answers, ask questions, and share your expertise

Hi team , hive job taking so much time compare to last run(since 20hour it is running)same job last run it was completed on 30min ..Please help me to exit from this issue

avatar
Expert Contributor

job-status.pngHi team,

I am using hive with tez in HDP and i am running hive query,it was completed 99.27% and it is stuck in reducer phase, attached the screenshot.

hive-log.png

job-status.pn

mem-space.png

same job was done in 30min in last run, please suggest why it is happen and solution for this issue?

i have below memory space in linux :swap

total used free shared buffers cached

Mem: 258041 254721 3320 0 3449 237158

-/+ buffers/cache: 14114 243927 Swap: 49151 660 48491

Please suggest

5 REPLIES 5

avatar
Rising Star

Hi @rama , First of all, I would suggest you to kill such jobs after 2-2.5 Hours, especially when your job finishes in half n hour on a normal day.

1 probable cause could be any other job is utilizing 90+% CPU, hence slowing down your job process.

If you can provide me your entire query, I may be able to provide you few set parameters which will help running the query faster.

Cheers, Sagar

avatar
Expert Contributor

Hi@Sagar Morakhia

currently no other job are running except this job,actually it was stuck at 97.22% and when i kill the job and rerun again it was stuck same position , last reducer was running forever.

up to table creation and insert the values into drome_master_5 it was good but while insert value into .drome_master_6 it was stuck.

below is step where the query stuck

insert into table dropme_master_6 select * from dropme_master_5 where dropme_master_5.consumer_sequence_id not in ( select consumer_sequence_id from dropme_master_6);

and while executing the this step it shows below warning message

Warning: Map Join MAPJOIN[30][bigTable=dropme_master_5] in task 'Map 4' is a cross product

i was split query and execute it like below and it gave result very fast

select count(consumer_sequence_id) from dropme_master_6;

result:10352059

select count(*) from dropme_master_5;

OK

21287539

i have added below parameter to the job but no use

set hive.execution.engine=mr;

set hive.vectorized.execution.enabled = true;

set hive.vectorized.execution.reduce.enabled = true;

set hive.tez.container.size=10240;

set hive.tez.java.opts=-Xmx9216m;

set mapreduce.map.memory.mb=8192;

set mapreduce.map.java.opts=-Xmx7372m;

set mapreduce.reduce.memory.mb=9216;

set mapreduce.reduce.java.opts=-Xmx8294m;

set yarn.scheduler.minimum-allocation-mb=1024;

set yarn.scheduler.maximun-allocation-mb=11264;

set hive.cbo.enable=true;

cluster-status.png

job-status.png

avatar
Rising Star

Hi @rama ,

Please change your query to this:

insert into table dropme_master_6 
select * from dropme_master_5 a
left outer join dropme_master_6 b 
on a.consumer_sequence_id = b.consumer_sequence_id
where b.consumer_sequence_id is null;

I am pretty confident this will improve your performance.

Please let me know if it works and give me a thumps up. 🙂

Regards,

Sagar Morakhia

avatar
Rising Star

Hi @rama

The below query will make your query run faster.

insert into table dropme_master_6 select * from dropme_master_5 a left outer join dropme_master_6 b on a.consumer_sequence_id = b.consumer_sequence_id where b.consumer_sequence_id is null;

avatar
Expert Contributor

Thanks you @Sagar Morakhia

I have try above query but no luck still it is running since 4 hours