When do you not use ORC tables?

Hi all,

I have some large tables in our Hadoop cluster that are stored in text format, and I would like to convert them all to ORC. Is there anything I need to worry about if all tables are ORC? In what circumstances would you not use ORC?

Thanks.

1 ACCEPTED SOLUTION

Hi @PJ, honestly, there is no good reason not to use ORC. You can use another format such as Parquet, but it won't provide ACID, the LLAP cache, or the same level of performance. Choosing not to use ORC is comparable to not using indexes in a relational system or never running statistics. ORC is simply best practice for high-performance data warehousing in Hive.

Keep in mind that LLAP can also cache raw text files. This may be an option if a strict SLA prevents you from incurring the delay of converting the text files to ORC.
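
For reference, the conversion itself is usually a single statement. Here is a minimal HiveQL sketch, assuming an existing text-format table; the names sales_text and sales_orc are made up for illustration, and the compression setting is optional. Treat it as a starting point rather than a tuned recipe.

```sql
-- Hypothetical table names: sales_text is the existing text-format table,
-- sales_orc is the new ORC copy.

-- Rewrite the data into ORC with a CREATE TABLE AS SELECT.
-- 'orc.compress' is optional; SNAPPY is chosen here only as an example.
CREATE TABLE sales_orc
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'SNAPPY')
AS
SELECT * FROM sales_text;

-- Gather table-level and column-level statistics for the optimizer.
ANALYZE TABLE sales_orc COMPUTE STATISTICS;
ANALYZE TABLE sales_orc COMPUTE STATISTICS FOR COLUMNS;

-- Sanity check: the storage information should now list ORC input/output formats.
DESCRIBE FORMATTED sales_orc;
```

Note that ALTER TABLE ... SET FILEFORMAT ORC only changes the table metadata and does not rewrite the existing text files, so the data has to be copied as above (or with an INSERT ... SELECT into an ORC table) for the conversion to take effect.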
