
Optimize query


Is there a better way to write this query, considering it runs over millions of rows using Spark?

 

SELECT *
FROM (
  SELECT *, row_number() OVER (PARTITION BY tran_id ORDER BY load_dt DESC) AS RN
  FROM MySourceTable
  WHERE CAST(tradeDtae AS TIMESTAMP) BETWEEN add_months(current_timestamp(), -64) AND current_timestamp()
    AND sys_id = 'TRADING'
) temp
WHERE temp.RN = 1;

 

MySourceTable is partitioned by tradeDtae, which is stored as an int converted from a timestamp.
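
One variant I am considering, as a rough sketch only: compare the partition column against int bounds instead of casting the column itself, since wrapping a partition column in CAST may prevent or weaken partition pruning. This assumes tradeDtae holds epoch seconds, which is how Spark interprets an int cast to TIMESTAMP; if the int uses another encoding (for example yyyyMMdd), the bound expressions would need to change accordingly.

```sql
-- Sketch under an assumption: tradeDtae is an int holding seconds since the epoch.
-- The bounds are computed once as ints, so the partition column is compared as-is.
SELECT *
FROM (
  SELECT *,
         row_number() OVER (PARTITION BY tran_id ORDER BY load_dt DESC) AS RN
  FROM MySourceTable
  WHERE tradeDtae BETWEEN CAST(unix_timestamp(add_months(current_timestamp(), -64)) AS INT)
                      AND CAST(unix_timestamp(current_timestamp()) AS INT)
    AND sys_id = 'TRADING'
) temp
WHERE temp.RN = 1;
```

Checking the physical plan with EXPLAIN for both versions should show whether the partition filter is actually applied to tradeDtae.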

 

 
