Faster, Better-Optimized Storage Format in Pig?

Expert Contributor

Is there a better storage format for Pig? Let's say I want to store a very large, filtered Hive table (or data set) before any further processing. Is there a format that makes processing faster?

Master Mentor (accepted solution)
@Adnan Alvee

Use the ORC format with HCatalog integration in Pig. Take a look at my article: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-...
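A minimal sketch of that approach, assuming the script is launched with `pig -useHCatalog` (so the HCatalog jars are on the classpath) and that the target table already exists in Hive, created with `STORED AS ORC` and a schema matching the filtered rows. The database, table, and column names (`source_db.events`, `target_db.events_filtered`, `status`) are hypothetical placeholders, not from the thread.

```pig
-- Launch with: pig -useHCatalog this_script.pig
-- All table and column names below are hypothetical placeholders.

-- Read the source Hive table through HCatalog; the schema comes from the metastore.
source_rows = LOAD 'source_db.events' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Filter first so only the rows you actually need get written out.
filtered_rows = FILTER source_rows BY status == 'active';

-- Store into an existing Hive table declared STORED AS ORC.
-- HCatStorer writes using the target table's storage format, so the output is ORC.
STORE filtered_rows INTO 'target_db.events_filtered' USING org.apache.hive.hcatalog.pig.HCatStorer();

-- Alternative without a Hive table (Pig 0.14 and later): write ORC files to an HDFS path.
-- STORE filtered_rows INTO '/tmp/events_filtered_orc' USING OrcStorage();
```

Going through HCatalog keeps the schema in the Hive metastore, so later Pig scripts and Hive queries can read the same ORC table without redeclaring column types.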

Expert Contributor

Wow. Switching to ORC took my data from 3 TB (PigStorage) down to 60 GB. This is insane. I didn't notice any performance improvement, though, but I am happy with the storage savings. Thanks! 🙂
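As a quick check on those numbers, taking 1 TB = 1000 GB (an assumption; with binary units, 3 TiB = 3072 GiB, the ratio is about 51):

```latex
\frac{3\,\text{TB}}{60\,\text{GB}} = \frac{3000\,\text{GB}}{60\,\text{GB}} = 50,
\qquad
1 - \frac{60\,\text{GB}}{3000\,\text{GB}} = 0.98
```

That is roughly a 50x reduction, with the ORC copy taking about 2% of the PigStorage footprint.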

Master Mentor

@Adnan Alvee That is impressive indeed. ORC has additional benefits that you will see on the Hive side. Glad you found it useful.