When do you not use ORC tables?
Labels: Apache Hadoop
Created on 02-06-2018 04:50 PM - edited 09-16-2022 05:49 AM
Hi all,
I have some large tables in our Hadoop cluster that are stored in text format, and I would like to convert them all to ORC. Is there anything I need to worry about if all the tables are ORC? In what circumstances would you not use ORC?
Thanks.
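For reference, the usual conversion path is a CREATE TABLE ... AS SELECT (CTAS) into an ORC table. A minimal sketch, assuming hypothetical table names sales_text and sales_orc:

```sql
-- Minimal sketch: convert a text-format table to ORC via CTAS.
-- Table names (sales_text, sales_orc) are hypothetical.
CREATE TABLE sales_orc
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'SNAPPY')  -- ZLIB is the default codec
AS
SELECT * FROM sales_text;

-- Once the data is verified, the old table can be swapped out:
-- DROP TABLE sales_text;
-- ALTER TABLE sales_orc RENAME TO sales_text;
```

Note that a plain CTAS does not carry over partitioning; for partitioned tables, the usual route is an explicit CREATE TABLE ... STORED AS ORC followed by an INSERT ... SELECT with dynamic partitioning.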
Created 02-06-2018 06:26 PM
Hi @PJ, the honest truth is that there is no good reason not to use ORC format. You can use another format like Parquet, but in Hive it won't give you ACID transactions, the LLAP cache, or the same level of performance. I would compare the decision to not using indexes in a relational system, or not gathering statistics. ORC is simply best practice for high-performance data warehousing in Hive.
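As a concrete illustration of the ACID point: Hive supports full transactional tables only on ORC. A minimal sketch, with a hypothetical table name, assuming the metastore is already configured for transactions (hive.txn.manager set to DbTxnManager and hive.support.concurrency=true):

```sql
-- Sketch: an ACID table, which Hive supports only on ORC storage.
-- Table name is hypothetical; assumes transactions are enabled.
CREATE TABLE events_acid (
  id      BIGINT,
  payload STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS  -- required for ACID on Hive 1.x/2.x, optional in Hive 3
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Row-level updates and deletes work only on such tables:
UPDATE events_acid SET payload = 'corrected' WHERE id = 42;
DELETE FROM events_acid WHERE id = 99;
```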
Keep in mind that LLAP can also cache raw text files. That may be an option if a strict SLA prevents you from incurring the delay of converting the text files to ORC.
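If you go that route, the LLAP IO layer has to be told to cache non-ORC inputs by encoding them into its in-memory columnar format. A sketch of the relevant session settings (property names as of Hive 2.x; worth verifying against your distribution's docs):

```sql
-- Sketch: allow LLAP to cache text tables. Property names as of
-- Hive 2.x; verify against your version before relying on them.
SET hive.llap.io.enabled=true;          -- enable the LLAP IO layer
SET hive.llap.io.encode.enabled=true;   -- cache text/other non-ORC formats
```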
