
What size of tables make the best out of ORC format?


Is there a particular Hive table size above which an ORC table shows better performance than other formats, especially text? The user is planning to keep the default stripe size.

Accepted Solutions

Re: What size of tables make the best out of ORC format?

Regarding table size, it can be tuned using the stripe size, the compression size, and indexes; see this documentation: http://orc.apache.org/docs/hive-config.html
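
To make the tuning knobs concrete, here is a minimal sketch that renders a Hive `CREATE TABLE ... STORED AS ORC` statement with the table properties listed on the linked page (`orc.stripe.size`, `orc.compress`, `orc.compress.size`, `orc.create.index`). The helper name `orc_table_ddl`, the table name, and the columns are hypothetical, and the default values are the ones I understand that page to list, so please verify them against your Hive/ORC version.

```python
def orc_table_ddl(table, columns,
                  stripe_size=64 * 1024 * 1024,   # assumed default: 64 MB stripes
                  compress="ZLIB",                # assumed default codec
                  compress_size=256 * 1024,       # assumed default: 256 KB chunks
                  create_index=True):             # assumed default: indexes on
    """Render a Hive DDL string for an ORC table (illustrative helper only)."""
    cols_sql = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    props = {
        "orc.stripe.size": str(stripe_size),
        "orc.compress": compress,
        "orc.compress.size": str(compress_size),
        "orc.create.index": "true" if create_index else "false",
    }
    props_sql = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return (
        f"CREATE TABLE {table} (\n  {cols_sql}\n)\n"
        f"STORED AS ORC\n"
        f"TBLPROPERTIES (\n  {props_sql}\n);"
    )


# Hypothetical table and columns, purely for illustration.
ddl = orc_table_ddl("sales_orc", [("id", "BIGINT"), ("amount", "DOUBLE")])
print(ddl)
```

Leaving every argument at its default corresponds to the "default stripe size" case in the question; passing a different `stripe_size` only changes the rendered property value.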

As for performance, I believe ORC will outperform text files in most situations. However, I would say Avro or SequenceFile will perform better for queries or use cases that need full scans of all the columns, especially in tables with many columns. ORC may incur some overhead when it rebuilds rows from a very large number of columns.
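
The trade-off described above can be pictured with a toy model. This is only a sketch with made-up data and hypothetical function names, not how ORC is actually implemented: a columnar layout can answer a single-column query by touching one column, while a full scan of every column has to stitch the rows back together.

```python
# Row-oriented layout: each record is stored together (as in text or Avro files).
rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]

# Column-oriented layout: each column is stored together (the idea behind ORC).
columns = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "price": [10.0, 20.0, 30.0],
}


def sum_price_columnar(cols):
    """A single-column query touches only the 'price' column."""
    return sum(cols["price"])


def full_scan_columnar(cols):
    """A full scan must reassemble rows from every column."""
    return list(zip(*cols.values()))


print(sum_price_columnar(columns))
print(full_scan_columnar(columns) == rows)
```

With only three columns the reassembly step is trivial; the concern in the answer is about tables with a very large number of columns, where that step is repeated for every column on every row.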

