
Hive Query not ending






I have a Hive query running on Spark that never completes when aggregating more than X records on a key column in a table stored as Parquet.


I tried it with a few datasets:

 1 317 474 rows -> 1400 seconds (over() Partition by key #10000 )

 2 627 466 rows -> 1460 seconds (over() Partition by key #20000 )

14 548 806 rows -> never finishes (over() Partition by key #30000 )


e.g.: SELECT SUM(col1) OVER (PARTITION BY key ORDER BY num_col2) FROM table1;
The executor logs show nothing new after about 1 hour, and the YARN application keeps running indefinitely.
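
To make clear what the window query above is meant to compute, here is a minimal Python sketch of a running sum of col1 within each key, ordered by num_col2. It is only an illustration on a tiny in-memory sample, not how Hive or Spark executes the query. The function name and sample rows are hypothetical, and it assumes the default window frame that applies when ORDER BY is given without an explicit frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), so rows with equal num_col2 get the same sum.

```python
from collections import defaultdict
from itertools import groupby


def running_sum_by_key(rows):
    """Emulate SUM(col1) OVER (PARTITION BY key ORDER BY num_col2).

    rows: iterable of (key, num_col2, col1) tuples.
    Returns a list of (key, num_col2, col1, running_sum) tuples.
    """
    # PARTITION BY key: collect all rows of a key together.
    partitions = defaultdict(list)
    for key, num_col2, col1 in rows:
        partitions[key].append((num_col2, col1))

    result = []
    for key, part in partitions.items():
        # ORDER BY num_col2 inside the partition.
        part.sort(key=lambda r: r[0])
        total = 0
        # Assumed RANGE frame: rows with the same num_col2 are peers
        # and all receive the sum that includes every peer.
        for order_value, peers in groupby(part, key=lambda r: r[0]):
            peers = list(peers)
            total += sum(col1 for _, col1 in peers)
            for _, col1 in peers:
                result.append((key, order_value, col1, total))
    return result


# Hypothetical sample data: (key, num_col2, col1)
sample = [("a", 1, 10), ("a", 2, 5), ("b", 1, 7), ("a", 2, 1)]
for row in running_sum_by_key(sample):
    print(row)
```

Note that every row of a given key has to be brought together and sorted before any sum can be produced for that key.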


How can I be sure that the query is still making progress?