
Hive on Spark inner join never stops, but Spark SQL executes successfully

New Contributor

Hi all,

I'm using CDH 5.13.1.

I have two Parquet tables: p_t_customer (3 billion rows) and p_t_contract_product (4 billion rows).

I run this simple join SQL:

 

select count(*) from p_t_customer c inner join p_t_contract_product p on c.customer_id = p.insured_2;

When I use Hive on Spark, the query does not finish even after several hours, but when I use Spark SQL it completes successfully in 15 minutes.

 

Here are some Hive on Spark screenshots:

[screenshots: 1.png, 2.png, 3.png, 4.png]

2 REPLIES

New Contributor

It finally began to fail...

[screenshot: 5.png]

Why does this happen? Is my Hive configuration wrong? I manually disabled dynamic allocation to avoid timeouts.

[screenshot: 6.png]
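For reference, disabling dynamic allocation for a Hive on Spark session is typically done with settings like the following (the executor count shown is a hypothetical value, not taken from this thread):

```sql
-- Run on the Spark engine instead of MapReduce
set hive.execution.engine=spark;
-- Turn off dynamic executor allocation
set spark.dynamicAllocation.enabled=false;
-- With dynamic allocation off, a fixed executor count must be supplied (hypothetical value)
set spark.executor.instances=40;
```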

Contributor

Hello,

 

You should ideally be running with dynamic allocation enabled.

 

Are you able to try the query again with a 16 GB executor heap, to test whether this is some sort of memory inefficiency? If so, what are the results?

 

 

set spark.executor.memory = 16g;

 

https://www.cloudera.com/documentation/enterprise/latest/topics/admin_hos_tuning.html
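As a sketch, the setting above can be applied at the session level before re-running the query; the overhead value below is an assumption for illustration, not a recommendation from the docs linked above:

```sql
-- 16 GB JVM heap per executor, as suggested above
set spark.executor.memory=16g;
-- Off-heap allowance for the executor on YARN (value is an assumption; tune per cluster)
set spark.yarn.executor.memoryOverhead=2048;
```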

 

Thanks.