Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Faster and Better Optimized Storage format in Pig?

Expert Contributor

Is there a better storage format for Pig? Let's say I want to store a very large filtered Hive table before any further processing. Is there a format that makes that processing faster?

1 ACCEPTED SOLUTION

Master Mentor
@Adnan Alvee

Use the ORC format through the HCatalog integration in Pig. Take a look at my article: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-...
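
For readers who cannot reach the article, here is a minimal sketch of what storing filtered data as ORC through HCatalog might look like. The table names `default.events` and `default.events_filtered` and the column `event_type` are hypothetical, and the sketch assumes the target table was already created in Hive as `STORED AS ORC` with a schema matching the relation being stored.

```pig
-- Launch with HCatalog on the classpath, e.g.: pig -useHCatalog store_orc.pig
-- (store_orc.pig is a hypothetical script name)

-- Read the source Hive table via HCatalog (hypothetical table name).
raw = LOAD 'default.events'
      USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Apply the filter before persisting (hypothetical column and value).
filtered = FILTER raw BY event_type == 'click';

-- Write into an existing Hive table that was declared STORED AS ORC.
-- HCatStorer takes the storage format from the table definition.
STORE filtered INTO 'default.events_filtered'
      USING org.apache.hive.hcatalog.pig.HCatStorer();

-- Alternative without HCatalog, assuming Pig 0.14 or later:
-- STORE filtered INTO '/tmp/events_filtered_orc' USING OrcStorage();
```

With `HCatStorer`, the ORC choice lives in the Hive table definition rather than in the Pig script, so the same script keeps working if the table's format is changed later.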

Expert Contributor

Wow. ORC took me from 3 TB (PigStorage) down to 60 GB. This is insane. I didn't notice any performance improvement, though, but I am happy with the savings in storage. Thanks! 🙂

Master Mentor

@Adnan Alvee That is impressive indeed. ORC has additional benefits that you will see on the Hive side. Glad you found it useful.