Created 02-28-2017 04:30 PM
Is there a better storage format for Pig? Let's say I want to store a very large filtered Hive table before any further processing. Is there a format that makes processing faster?
Created 02-28-2017 04:36 PM
Use the ORC format with HCatalog integration in Pig. Take a look at my article: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-...
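For illustration, here is a minimal sketch of what that could look like, assuming Pig is started with `pig -useHCatalog` and that the target Hive table already exists and was created with `STORED AS ORC`. The table names `default.events` and `default.filtered_events_orc` and the column `event_date` are hypothetical placeholders, not taken from the question.

```pig
-- Sketch only: run with `pig -useHCatalog` so the HCatalog jars are on the classpath.
-- Assumption: default.filtered_events_orc was created in Hive beforehand with
-- the same columns as the source table and STORED AS ORC.

-- Read the source Hive table through HCatalog (the schema comes from the metastore).
src = LOAD 'default.events' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Hypothetical filter; replace with your own condition.
filtered = FILTER src BY event_date >= '2017-01-01';

-- Write into the ORC-backed Hive table; HCatStorer uses the table's storage format.
STORE filtered INTO 'default.filtered_events_orc' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

If you do not need the result registered in Hive, Pig 0.14 and later also ship a built-in `OrcStorage()` function that can be used in `STORE ... USING OrcStorage();` to write ORC files directly to an HDFS path.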
Created 03-09-2017 04:28 PM
Wow. ORC took me from 3 TB (PigStorage) down to 60 GB. This is insane. I didn't notice any performance improvement, though, but I am happy with the savings in storage. Thanks! 🙂
Created 03-09-2017 04:36 PM
@Adnan Alvee that is impressive indeed. ORC has additional benefits that you will see on the Hive side. Glad you found it useful.