What table size gets the most out of the ORC format?

Is there a particular Hive table size above which an ORC table performs better than other formats, especially text? The user plans to keep the default stripe size.

1 ACCEPTED SOLUTION

Regarding table size, ORC can be tuned through the stripe size, the compression chunk size (orc.compress.size), and indexes. See this documentation: http://orc.apache.org/docs/hive-config.html
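
To make those tuning knobs concrete, here is a minimal sketch that builds a Hive CREATE TABLE statement with ORC table properties. The helper orc_table_ddl, the table name events_orc, and its columns are hypothetical, and the property values are only illustrative, so check the linked page for the defaults that apply to your version.

```python
def orc_table_ddl(table, columns, properties):
    """Build a Hive CREATE TABLE statement for an ORC table with TBLPROPERTIES."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    props = ",\n  ".join(f'"{key}" = "{value}"' for key, value in properties.items())
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"STORED AS ORC\n"
        f"TBLPROPERTIES (\n  {props}\n)"
    )


# Illustrative values only (an assumption, not a recommendation).
ORC_PROPERTIES = {
    "orc.compress": "ZLIB",          # compression codec
    "orc.compress.size": 262144,     # compression chunk size in bytes
    "orc.stripe.size": 67108864,     # stripe size in bytes
    "orc.create.index": "true",      # build row indexes
    "orc.row.index.stride": 10000,   # rows between index entries
}

# Hypothetical table and columns, used only to show the shape of the DDL.
ddl = orc_table_ddl(
    "events_orc",
    [("id", "BIGINT"), ("payload", "STRING")],
    ORC_PROPERTIES,
)
print(ddl)
```

The generated statement can be pasted into a Hive session. Leaving orc.stripe.size out of the dictionary keeps the default stripe size, which is what the question describes.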

As for performance, I believe ORC will outperform text files in most situations. However, I would expect Avro or SequenceFile to perform better for queries that need full scans across all columns, especially in tables with many columns, because ORC may incur overhead when it reassembles rows from a large number of columns.
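
The trade-off can be illustrated with a toy model of a columnar layout. This is not how the ORC reader is implemented. It only shows why reading a few columns touches less data, while a full scan has to stitch every column back into rows. All names and data here are hypothetical.

```python
def to_columns(rows, names):
    """Store a table column by column: one list of values per column name."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def project(columns, wanted):
    """Read only the requested columns; return (data, number of values touched)."""
    data = {name: columns[name] for name in wanted}
    touched = sum(len(values) for values in data.values())
    return data, touched


def rebuild_rows(columns, names):
    """Full scan: stitch every column back into row tuples."""
    return list(zip(*(columns[name] for name in names)))


names = ["id", "country", "amount", "note"]
rows = [(1, "BR", 10.0, "a"), (2, "US", 20.0, "b"), (3, "BR", 30.0, "c")]
columns = to_columns(rows, names)

# A query on a single column touches only that column's values.
_, touched_narrow = project(columns, ["amount"])

# A query on every column touches everything and must also rebuild the rows.
_, touched_full = project(columns, names)
full_rows = rebuild_rows(columns, names)

print(touched_narrow, touched_full)
```

In this model the single-column query reads 3 values and the full scan reads 12, and only the full scan pays the extra cost of rebuilding rows. A row-oriented format such as Avro or SequenceFile already stores each record together, so it has no such step.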
