Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

HIVE - View with 'filters and row_num function' causing over utilization of resources and performance degradation

HIVE - View with 'filters and row_num function' causing over utilization of resources and performance degradation

We have a partitioned table of 50+ columns and 30+ million records. The task is to filter 200 latest records out of it based on the conditions matching on 3 columns. We have created a view and applied row_number function PARTITION BY those 3 columns ORDER BY load_date. Now while querying this, it is causing performance degradation and over-utilization of resources. Is there anything that I can try from query optimization end to improve the performance?

QUERY

SELECT * FROM view_name

WHERE data_as_of_date=20191213 and rule_row_num<=200

and rule_id='some_id' and run_id='some_run_id';

 

VIEW creation:

CREATE VIEW view_name as SELECT *, ROW_NUMBER() OVER (PARTITION BY data_as_of_date, rule_id, run_id ORDER BY load_date DESC) as rule_row_num from table_name;

Don't have an account?
Coming from Hortonworks? Activate your account here