Hive Query not ending

Hi.

I have a Hive query running on Spark that never completes when it aggregates more than X records on a key column of a table stored as Parquet.

I tried it with a few datasets:

1,317,474 rows -> 1400 seconds (OVER (PARTITION BY key), number of keys: 10,000)

2,627,466 rows -> 1460 seconds (OVER (PARTITION BY key), number of keys: 20,000)

14,548,806 rows -> never finishes (OVER (PARTITION BY key), number of keys: 30,000)
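Dividing each row count by the number of keys gives the average partition size the window function has to handle per key. A small Python sketch of that arithmetic, assuming "#10000" and so on means the number of distinct `key` values (the post does not state this explicitly); `rows_per_key` is a hypothetical helper name:

```python
# Figures copied from the timings above.
# ASSUMPTION: "#10000" etc. is read as the number of distinct `key` values.
datasets = [
    # (rows, distinct keys, elapsed seconds or None if the query never finished)
    (1_317_474, 10_000, 1400),
    (2_627_466, 20_000, 1460),
    (14_548_806, 30_000, None),
]


def rows_per_key(rows: int, keys: int) -> float:
    """Average number of rows the window function processes per key."""
    return rows / keys


for rows, keys, seconds in datasets:
    avg = rows_per_key(rows, keys)
    # Throughput is only meaningful for the runs that actually finished.
    rate = f"{rows / seconds:,.0f} rows/s" if seconds else "did not finish"
    print(f"{rows:>12,} rows {keys:>7,} keys {avg:8.1f} rows/key  {rate}")
```

On these figures the third dataset averages roughly 485 rows per key, against roughly 131 for the first two. An average says nothing about how evenly rows are spread across keys, though, so per-key counts on the real table would be needed before drawing conclusions.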

e.g.: SELECT SUM(col1) OVER (PARTITION BY key ORDER BY num_col2) FROM table1;
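For reference, this windowed expression computes a running total of `col1` within each `key`, ordered by `num_col2`. The Python sketch below emulates that semantics in memory on a tiny made-up dataset; it illustrates what the query returns, not how Hive on Spark executes it. It assumes no explicit window frame is given, in which case Hive defaults to `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, so rows that tie on `num_col2` share the same cumulative sum. The function name `windowed_running_sum` and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def windowed_running_sum(rows):
    """Emulate SUM(col1) OVER (PARTITION BY key ORDER BY num_col2).

    `rows` is a list of (key, num_col2, col1) tuples. Returns the windowed
    sum for each input row, in the original input order.
    ASSUMPTION: default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
    so rows with equal num_col2 (peers) get the same value.
    """
    # PARTITION BY key: collect (order value, input index, col1) per key.
    partitions = defaultdict(list)
    for idx, (key, order_val, value) in enumerate(rows):
        partitions[key].append((order_val, idx, value))

    result = [None] * len(rows)
    for part in partitions.values():
        part.sort(key=lambda t: t[0])  # ORDER BY num_col2 inside the partition
        total = 0
        # Group peers (equal num_col2) so each group is added in one step.
        for _, peers in groupby(part, key=lambda t: t[0]):
            peers = list(peers)
            total += sum(value for _, _, value in peers)
            for _, idx, _ in peers:
                result[idx] = total
    return result


# Tiny hypothetical table: (key, num_col2, col1)
sample = [("a", 1, 10), ("a", 2, 5), ("b", 1, 7), ("a", 2, 1), ("b", 3, 2)]
print(windowed_running_sum(sample))  # [10, 16, 7, 16, 9]
```

Note that every row of a given key has to be sorted and accumulated together, which is why the size of each key's partition matters for this kind of query.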
 
The executor logs stop showing any new output after about 1 hour, and the YARN application keeps running indefinitely.

How can I tell whether the query is still running properly?