
Hive on Spark inner join never stops, but Spark SQL executes successfully

New Contributor

Hi all,

I'm using CDH 5.13.1.

I have two Parquet tables: p_t_customer (3 billion rows) and p_t_contract_product (4 billion rows).

I run this simple join query:

 

select count(*) from p_t_customer c inner join p_t_contract_product p on c.customer_id = p.insured_2;

When I run it with Hive on Spark, it does not finish even after hours, but when I use Spark SQL it executes successfully in 15 minutes.

 

Here are Hive on Spark screenshots:

(screenshots attached: 1.png, 2.png, 3.png, 4.png)


Re: Hive on Spark inner join never stops, but Spark SQL executes successfully

New Contributor

It finally began to fail...

(screenshot attached: 5.png)

Why does this happen? Is my Hive configuration wrong? I manually disabled dynamic allocation to avoid timeouts.
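For reference, disabling dynamic allocation for a Hive on Spark session is typically done with settings like the following (a sketch; the fixed executor count of 20 is an illustrative value, not one taken from this thread):

```sql
-- Turn off Spark dynamic allocation for this Hive session
set spark.dynamicAllocation.enabled=false;
-- With dynamic allocation off, a fixed executor count must be chosen
-- (20 is an illustrative value, not a recommendation)
set spark.executor.instances=20;
```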

(screenshot attached: 6.png)


Re: Hive on Spark inner join never stops, but Spark SQL executes successfully

Contributor

Hello,

 

You should ideally be running with dynamic allocation enabled.

 

Are you able to try the query again with a 16 GB executor heap to test whether this is some sort of memory inefficiency? If so, what are the results?

 

 

set spark.executor.memory=16g;
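Beyond the executor heap itself, related executor memory settings are often tuned together when testing a larger heap; a sketch with illustrative values (these are assumptions, not values from this thread):

```sql
-- Executor heap size (the setting suggested above)
set spark.executor.memory=16g;
-- Off-heap overhead reserved in the YARN container, in MB (illustrative value)
set spark.yarn.executor.memoryOverhead=2048;
-- Cores per executor (illustrative value)
set spark.executor.cores=4;
```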

 

https://www.cloudera.com/documentation/enterprise/latest/topics/admin_hos_tuning.html

 

Thanks.