How important is it to analyze ORC Hive tables?
- Labels: Apache Hive
Created 03-27-2017 07:11 PM
All,
If Hive tables are created as ORC with Snappy compression, how important is it to analyze the tables and columns for performance?
Also, if a table is in ORC, do we still need to take care of other performance techniques such as sort-merge join (keeping data sorted on the join keys), the CBO, and others?
How do these apply to ORC files?
Created 03-27-2017 07:24 PM
Hi @Abhijeet Rajput, it is recommended to analyze all tables, ORC included, on a regular basis for performance. Statistics are more valuable on larger tables than on smaller ones. Sorting is not necessary and, in fact, sorting is not allowed on ACID tables. As of HDP 2.5, Hive uses both a rule-based optimizer and a cost-based optimizer (CBO) powered by Apache Calcite. Enabling the CBO makes the best use of the statistics.
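As a rough sketch of what that looks like in HiveQL: the table name `sales`, the partition column `ds`, and the partition value below are hypothetical placeholders, and the right settings for your cluster may differ, so treat this as a starting point rather than a recommended configuration.

```sql
-- Hypothetical ORC table `sales`, partitioned by `ds` (names are assumptions).

-- Gather basic table/partition statistics (row count, data size, and so on).
ANALYZE TABLE sales PARTITION (ds='2017-03-27') COMPUTE STATISTICS;

-- Gather column-level statistics, which the cost-based optimizer relies on.
ANALYZE TABLE sales PARTITION (ds='2017-03-27') COMPUTE STATISTICS FOR COLUMNS;

-- Session settings that turn on the CBO and let it read the statistics.
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;

-- Inspect the stored statistics for the partition that was analyzed.
DESCRIBE FORMATTED sales PARTITION (ds='2017-03-27');
```

Re-running the `ANALYZE` statements after large loads keeps the statistics in line with the data, which is the "regular basis" part of the advice above.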
Also, you may want to take a look at LLAP, which is in Technical Preview (TP) in HDP 2.5 and will be GA in 2.6.
Hope this helps.
