How important is it to analyze ORC Hive tables?

All,

If the Hive tables are stored as ORC with Snappy compression, how important is it to analyze the tables and columns for performance?

Also, if a table is in ORC, do we still need other performance techniques such as sort-merge join (keeping the data sorted on the join keys), the CBO, and others?

How do they apply to ORC files?

Accepted solution:

Hi @Abhijeet Rajput, it is recommended to analyze all tables, ORC included, on a regular basis for performance. Statistics are more valuable on larger tables than on smaller ones. Sorting is not necessary; in fact, sorting is not allowed on ACID tables. As of HDP 2.5, Hive uses both a rule-based optimizer and a cost-based optimizer (CBO) built on Apache Calcite. Enabling the CBO makes the best use of the statistics.
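
As a rough sketch of what "analyze the tables and enable the CBO" means in practice: in HiveQL you run ANALYZE TABLE ... COMPUTE STATISTICS for table-level stats and ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS for column-level stats, and turn on the CBO-related session properties. The small Python helper below only generates such a script as text. The table names are hypothetical, and the particular set of properties (hive.cbo.enable, hive.compute.query.using.stats, hive.stats.fetch.column.stats) is an assumption about a typical setup, so check the defaults for your Hive/HDP version.

```python
from typing import List, Mapping, Sequence

# Assumed session settings so the Calcite-based CBO can use collected stats.
# Verify names/defaults against your Hive version before relying on them.
CBO_SETTINGS = {
    "hive.cbo.enable": "true",
    "hive.compute.query.using.stats": "true",
    "hive.stats.fetch.column.stats": "true",
}


def analyze_statements(table: str, partition_cols: Sequence[str] = ()) -> List[str]:
    """Return HiveQL that gathers table-level and column-level statistics.

    If partition columns are given without values, e.g. PARTITION(ds),
    Hive analyzes every partition of the table.
    """
    target = table
    if partition_cols:
        target += " PARTITION(" + ", ".join(partition_cols) + ")"
    return [
        f"ANALYZE TABLE {target} COMPUTE STATISTICS;",
        f"ANALYZE TABLE {target} COMPUTE STATISTICS FOR COLUMNS;",
    ]


def build_script(tables: Mapping[str, Sequence[str]]) -> str:
    """Build a script: CBO settings first, then ANALYZE statements per table.

    `tables` maps a (hypothetical) table name to its partition columns;
    use an empty sequence for unpartitioned tables.
    """
    lines = [f"SET {key}={value};" for key, value in CBO_SETTINGS.items()]
    for table, partition_cols in tables.items():
        lines.extend(analyze_statements(table, partition_cols))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical tables: one partitioned by ds, one unpartitioned.
    print(build_script({"sales_db.orders": ["ds"], "sales_db.customers": []}))
```

The generated script could then be run with beeline or the Hive CLI, for example on a schedule after large loads, since stale statistics can mislead the CBO.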

Also, you may want to take a look at LLAP, which is a Technical Preview (TP) in HDP 2.5 and is planned to be generally available (GA) in HDP 2.6.

Hope this helps.