Created on 02-06-2018 04:50 PM - edited 09-16-2022 05:49 AM
Hi all,
I have some large tables in our Hadoop cluster that are in text format, and I would like to change them all to ORC. Is there anything I need to worry about if all tables are ORC? In what circumstances would you not use ORC?
Thanks.
Created 02-06-2018 06:26 PM
Hi @PJ, the honest truth is there is no good reason not to use ORC format. You can use another format like Parquet, but in Hive it won't provide ACID, LLAP cache, or the same level of performance. I would say the decision is similar to not using indexes in a relational system or not running statistics. ORC is simply best practice for high-performance data warehousing in Hive.
Keep in mind that LLAP will allow you to cache raw text files. This may be an option if you have a strict SLA that cannot absorb the delay of converting the text files to ORC.
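If it helps, here is a rough sketch of how the conversion itself could be scripted. It only builds the HiveQL strings (a CTAS into a new ORC table plus two row counts to compare afterwards); it does not connect to Hive. The function name, the `_orc` suffix, and the example table names are my own assumptions, not anything from your cluster, and a plain CTAS like this will not carry over partitioning, so partitioned tables would need a `CREATE TABLE` followed by an `INSERT` instead.

```python
import re

# Accepts "table" or "db.table" made of plain identifier characters only.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def orc_conversion_statements(table, suffix="_orc"):
    """Return HiveQL strings that copy a text table into a new ORC table.

    Hypothetical helper: the naming scheme (source name + suffix) is an
    assumption made for illustration.
    """
    if not _NAME.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    target = f"{table}{suffix}"
    return [
        # CTAS: create the ORC copy and fill it from the text table.
        f"CREATE TABLE {target} STORED AS ORC AS SELECT * FROM {table}",
        # Row counts to compare before dropping or renaming anything.
        f"SELECT COUNT(*) FROM {table}",
        f"SELECT COUNT(*) FROM {target}",
    ]


if __name__ == "__main__":
    # Example table names are placeholders.
    for name in ["web_logs", "sales.orders"]:
        for stmt in orc_conversion_statements(name):
            print(stmt + ";")
```

You would run the printed statements through beeline or whatever client you normally use, check that the two counts match, and only then swap the tables.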