New Contributor
Posts: 2
Registered: ‎05-12-2018

Hive on Spark inner join never stops, but Spark SQL executes it successfully

Hi all,

I'm using CDH 5.13.1.

I have two Parquet tables: p_t_customer (3 billion rows) and p_t_contract_product (4 billion rows).

I run this simple join query:

 

select count(*) from p_t_customer c inner join p_t_contract_product p on c.customer_id = p.insured_2;

But when I run it with Hive on Spark, it does not finish even after hours, whereas Spark SQL completes it successfully in 15 minutes.

 

Here are the Hive on Spark screenshots:

[screenshots: 1.png, 2.png, 3.png, 4.png]

New Contributor
Posts: 2
Registered: ‎05-12-2018

Re: Hive on Spark inner join never stops, but Spark SQL executes it successfully

It finally began to fail...

[screenshot: 5.png]

Why does this happen? Is my Hive configuration wrong? I manually disabled dynamic allocation to avoid timeouts.

[screenshot: 6.png]
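For reference, disabling dynamic allocation for a Hive on Spark session is typically done with settings like the following (a sketch; the executor count shown is an illustrative assumption, not taken from the post):

```sql
-- Illustrative Hive session settings (values are assumptions):
set hive.execution.engine=spark;
set spark.dynamicAllocation.enabled=false;
-- With dynamic allocation off, a fixed executor count must be chosen:
set spark.executor.instances=20;
```

Note that with a fixed executor count, an undersized setting can make a large shuffle-heavy join crawl, which is one reason the reply below recommends keeping dynamic allocation enabled.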

Cloudera Employee
Posts: 60
Registered: ‎11-20-2015

Re: Hive on Spark inner join never stops, but Spark SQL executes it successfully

Hello,

 

You should ideally be running with dynamic allocation enabled.

 

Are you able to try the query again with a 16GB executor heap to test if this is some sort of memory inefficiency?  If so, what are the results?

 

 

set spark.executor.memory=16g;

 

https://www.cloudera.com/documentation/enterprise/latest/topics/admin_hos_tuning.html
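When raising the executor heap on YARN, the container memory overhead is often adjusted alongside it, since the YARN container must fit both. A sketch of such a test session (the overhead and core values here are assumptions for illustration, not part of the original advice):

```sql
-- Sketch of a memory test for the failing join (values beyond executor.memory are assumptions):
set spark.executor.memory=16g;
-- Off-heap overhead per executor container, in MB; tuned together with the heap:
set spark.yarn.executor.memoryOverhead=2048;
set spark.executor.cores=4;

select count(*) from p_t_customer c inner join p_t_contract_product p on c.customer_id = p.insured_2;
```

If the query succeeds with the larger heap, that points to memory pressure (e.g. shuffle spill or GC thrashing) rather than a planner difference between Hive on Spark and Spark SQL.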

 

Thanks.
