What size of tables make the best out of ORC format?
- Labels: Apache Hive
Created 11-09-2015 03:02 AM
Is there a particular Hive table size above which ORC tables perform better than other formats (especially text)? The user plans to keep the default stripe size.
Created 11-09-2015 01:32 PM
Regarding table size: ORC tables can be tuned through the stripe size, the compression chunk size (compress size), and indexes. See this documentation: http://orc.apache.org/docs/hive-config.html
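As a rough illustration, those knobs are set per table through TBLPROPERTIES. The Python sketch below only assembles a HiveQL `CREATE TABLE ... STORED AS ORC` statement; the helper `build_orc_ddl`, the table `sales_orc`, and its columns are hypothetical. The `orc.*` keys are the table properties listed on the linked ORC page, but the values are illustrative, so check that page for the defaults in your Hive/ORC version. If you want the default stripe size, simply leave `orc.stripe.size` out.

```python
# Hypothetical helper: builds a HiveQL CREATE TABLE statement for an ORC table.
# The orc.* keys are ORC table properties; the values used below are illustrative only.
def build_orc_ddl(table, columns, tblproperties):
    """Return a HiveQL 'CREATE TABLE ... STORED AS ORC TBLPROPERTIES (...)' string."""
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    props = ",\n  ".join(f"'{key}'='{value}'" for key, value in tblproperties.items())
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"STORED AS ORC\n"
        f"TBLPROPERTIES (\n  {props}\n);"
    )


if __name__ == "__main__":
    ddl = build_orc_ddl(
        "sales_orc",  # hypothetical table name
        [("id", "BIGINT"), ("region", "STRING"), ("amount", "DOUBLE")],
        {
            "orc.compress": "ZLIB",          # compression codec
            "orc.compress.size": "262144",   # bytes per compression chunk
            "orc.stripe.size": "67108864",   # bytes per stripe; omit to keep the default
            "orc.row.index.stride": "10000", # rows between row-index entries
            "orc.create.index": "true",      # write lightweight row-group indexes
        },
    )
    print(ddl)
```

Paste the printed statement into Hive (or beeline) to create the table.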
As for performance, I believe ORC will outperform text files in most situations. However, I would expect Avro or SequenceFile to perform better for queries or use cases that need full scans of all columns (in tables with many columns), because ORC may incur overhead reconstructing rows that have a very large number of columns.
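The trade-off can be pictured with a toy model. This is not ORC and ignores compression, encodings, indexes, and I/O; it only counts how many values a reader touches in a row-oriented layout versus a column-oriented one, and shows that the columnar side must zip columns back into rows. All names (`to_columns`, `scan_rows`, `scan_columns`) are hypothetical.

```python
# Toy model only: counts values touched in row vs. column layouts.
# It does not measure real ORC, Avro, SequenceFile, or text performance.
def to_columns(rows, names):
    """Pivot row-oriented records (dicts) into one list per column."""
    return {name: [row[name] for row in rows] for name in names}


def scan_rows(rows, wanted):
    """Row layout: every value of every row is read, then projected."""
    touched = sum(len(row) for row in rows)
    out = [tuple(row[name] for name in wanted) for row in rows]
    return out, touched


def scan_columns(columns, wanted):
    """Column layout: only wanted columns are read, then zipped back into rows."""
    touched = sum(len(columns[name]) for name in wanted)
    out = list(zip(*(columns[name] for name in wanted)))  # row reconstruction
    return out, touched


if __name__ == "__main__":
    names = [f"c{i}" for i in range(100)]               # a wide table: 100 columns
    rows = [{n: i for n in names} for i in range(1000)]
    cols = to_columns(rows, names)

    _, row_cost = scan_rows(rows, ["c0", "c1"])
    _, col_cost = scan_columns(cols, ["c0", "c1"])
    print("project 2 columns:", row_cost, "vs", col_cost)

    _, row_cost = scan_rows(rows, names)
    _, col_cost = scan_columns(cols, names)
    print("all 100 columns:", row_cost, "vs", col_cost)
```

Projecting two columns touches 2,000 values in the column layout versus 100,000 in the row layout, while a full scan touches the same 100,000 in both, and the columnar reader still pays for stitching 100 columns back into each row.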
