Faster and Better Optimized Storage format in Pig?

Expert Contributor

Is there a better storage format for Pig? Let's say I want to store a very large filtered Hive table before any further processing. Is there a format that makes that processing faster?

1 ACCEPTED SOLUTION

Mentor
@Adnan Alvee

Use the ORC format with HCatalog integration in Pig. Take a look at my article: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-...
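For anyone reading later, here is a rough sketch of what that can look like in a Pig script. It is only an illustration, not taken from the linked article: the table names `default.source_table` and `default.filtered_orc` and the column `event_date` are made up, and it assumes the target table already exists in Hive and was created with `STORED AS ORC`.

```pig
-- Run with HCatalog on the classpath, e.g.: pig -useHCatalog store_orc.pig
-- Assumption: default.filtered_orc was created beforehand in Hive with
-- the same columns as the relation being stored and STORED AS ORC.

-- Load the source Hive table; the schema comes from the Hive metastore.
raw = LOAD 'default.source_table'
      USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Hypothetical filter; replace with whatever filtering you actually do.
filtered = FILTER raw BY event_date >= '2017-01-01';

-- Write into the ORC-backed Hive table through HCatalog.
STORE filtered INTO 'default.filtered_orc'
      USING org.apache.hive.hcatalog.pig.HCatStorer();
```

If you do not need the result registered in Hive, Pig 0.14 and later also ship a built-in `OrcStorage()` function that can write ORC files straight to an HDFS path.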

Expert Contributor

Wow. ORC took me from 3 TB (PigStorage) down to 60 GB. This is insane. I didn't notice any performance improvement, but I am happy with the storage savings. Thanks! 🙂

Mentor

@Adnan Alvee That is impressive indeed. ORC has additional benefits that you will see on the Hive side. Glad you found it useful.