Support Questions

g_rao9704 · ‎06-08-2017

I am using hive with tez in HDP and i am running hive query,it was completed 99.27% and it is stuck in reducer phase, attached the screenshot.

hive-log.png

job-status.pn

mem-space.png

same job was done in 30min in last run, please suggest why it is happen and solution for this issue?

i have below memory space in linux :swap

total used free shared buffers cached

Mem: 258041 254721 3320 0 3449 237158

-/+ buffers/cache: 14114 243927 Swap: 49151 660 48491

Please suggest

sagar_girish · ‎06-10-2017

Hi @rama , First of all, I would suggest you to kill such jobs after 2-2.5 Hours, especially when your job finishes in half n hour on a normal day.

1 probable cause could be any other job is utilizing 90+% CPU, hence slowing down your job process.

If you can provide me your entire query, I may be able to provide you few set parameters which will help running the query faster.

Cheers, Sagar

g_rao9704 · ‎06-10-2017

Hi@Sagar Morakhia

currently no other job are running except this job,actually it was stuck at 97.22% and when i kill the job and rerun again it was stuck same position , last reducer was running forever.

up to table creation and insert the values into drome_master_5 it was good but while insert value into .drome_master_6 it was stuck.

below is step where the query stuck

insert into table dropme_master_6 select * from dropme_master_5 where dropme_master_5.consumer_sequence_id not in ( select consumer_sequence_id from dropme_master_6);

and while executing the this step it shows below warning message

Warning: Map Join MAPJOIN[30][bigTable=dropme_master_5] in task 'Map 4' is a cross product

i was split query and execute it like below and it gave result very fast

select count(consumer_sequence_id) from dropme_master_6;

result:10352059

select count(*) from dropme_master_5;

OK

21287539

i have added below parameter to the job but no use

set hive.execution.engine=mr;

set hive.vectorized.execution.enabled = true;

set hive.vectorized.execution.reduce.enabled = true;

set hive.tez.container.size=10240;

set hive.tez.java.opts=-Xmx9216m;

set mapreduce.map.memory.mb=8192;

set mapreduce.map.java.opts=-Xmx7372m;

set mapreduce.reduce.memory.mb=9216;

set mapreduce.reduce.java.opts=-Xmx8294m;

set yarn.scheduler.minimum-allocation-mb=1024;

set yarn.scheduler.maximun-allocation-mb=11264;

set hive.cbo.enable=true;

cluster-status.png

job-status.png

sagar_girish · ‎06-12-2017

Hi @rama ,

Please change your query to this:

insert into table dropme_master_6 
select * from dropme_master_5 a
left outer join dropme_master_6 b 
on a.consumer_sequence_id = b.consumer_sequence_id
where b.consumer_sequence_id is null;

I am pretty confident this will improve your performance.

Please let me know if it works and give me a thumps up. 🙂

Regards,

Sagar Morakhia

sagar_girish · ‎06-12-2017

Hi @rama

The below query will make your query run faster.

insert into table dropme_master_6 select * from dropme_master_5 a left outer join dropme_master_6 b on a.consumer_sequence_id = b.consumer_sequence_id where b.consumer_sequence_id is null;

g_rao9704 · ‎06-13-2017

Thanks you @Sagar Morakhia

I have try above query but no luck still it is running since 4 hours

Cloudera Community

Support Questions

Hi team , hive job taking so much time compare to last run(since 20hour it is running)same job last run it was completed on 30min ..Please help me to exit from this issue