
PIG inner join with different keys

a = LOAD '601' using org.apache.hive.hcatalog.pig.HCatLoader();

b = LOAD '602' using org.apache.hive.hcatalog.pig.HCatLoader();

c = LOAD '603' using org.apache.hive.hcatalog.pig.HCatLoader();

d = LOAD 'SKL' using org.apache.hive.hcatalog.pig.HCatLoader();

e = join a by (d_key, c_cd ), b by (d_key, c_cd), c by (p1_key, c_cd), d by (p2_key, c_cd);

Dump e;

========================================================================

If I run the same joins in Hive, I get output. In Pig, however, when I dump e, the MapReduce job runs and reads rows but writes no output, even though it reports success. When I do the same thing in Hive with nested inner joins, I get the correct output.

Can anyone explain the bug in Pig when joining relations on different keys?

Another thing: I want to store e with HCatalog (HCatStorer) into an empty partitioned table using dynamic partitioning (without specifying a partition value), but I get a partitioned-table error. I don't know the reason for this HCatalog error. If you have faced the same thing, please explain and suggest a solution.

1 ACCEPTED SOLUTION

Actually, JOINs in Pig work about the same as they do in Hive. I wrote up a quick blog post at https://martin.atlassian.net/wiki/x/AgCfB based on your question and just made up some data since you didn't provide any. I'm sure you did this, but the easiest thing to do is simplify your join by working on just two relations first, then add the third and eventually the fourth if all is working. If "e" never gets populated, my hunch is that it is a data issue, not a Pig issue.
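
One way to carry out that step-by-step check is sketched below in Pig Latin. It reuses the relation and column names from the question; the intermediate aliases (ab, ab_sample, abc, abcd) are made up for illustration, and it assumes that relation a also carries the p1_key and p2_key columns, which the question does not confirm. Keep in mind that a single multi-way JOIN statement only keeps rows whose key tuples are equal across all of the listed relations, so joining on different keys generally takes one JOIN statement per pair.

```pig
-- Sketch only; start Pig with: pig -useHCatalog
a = LOAD '601' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = LOAD '602' USING org.apache.hive.hcatalog.pig.HCatLoader();
c = LOAD '603' USING org.apache.hive.hcatalog.pig.HCatLoader();
d = LOAD 'SKL' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Step 1: join just two relations and inspect a small sample.
ab = JOIN a BY (d_key, c_cd), b BY (d_key, c_cd);
DESCRIBE ab;                 -- shows the prefixed field names (a::d_key, b::d_key, ...)
ab_sample = LIMIT ab 10;
DUMP ab_sample;              -- if this is already empty, check the key values in a and b

-- Step 2: add the third relation once step 1 returns rows.
-- ASSUMPTION: a has a p1_key column that matches c.p1_key.
abc = JOIN ab BY (a::p1_key, a::c_cd), c BY (p1_key, c_cd);

-- Step 3: add the fourth relation.
-- ASSUMPTION: a has a p2_key column that matches d.p2_key.
abcd = JOIN abc BY (a::p2_key, a::c_cd), d BY (p2_key, c_cd);
DUMP abcd;
```

If a step suddenly returns nothing, the key columns introduced at that step are the place to look: null keys never match in an inner join, and differences in type or stray whitespace can also prevent matches.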

As for saving into a partitioned Hive table, my blog post shows a working example of that and points back to https://community.hortonworks.com/questions/2562/appending-to-existing-partition-with-pig.html, which covers the fact that Pig cannot write to an existing partition, along with strategies for working around it.
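
For the store itself, here is a minimal sketch of a dynamic-partition write with HCatStorer. The table name target_tbl, its column list, and the choice of c_cd as the partition column are hypothetical; a and b are the tables loaded in the question. It assumes Pig is started with -useHCatalog and that the target table already exists in Hive. When no partition spec is passed to HCatStorer, the partition column has to be present in the data being stored, and because field names after a JOIN carry prefixes such as a::d_key, they are renamed first so that they match the table's column names.

```pig
-- Sketch only; start Pig with: pig -useHCatalog
a = LOAD '601' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = LOAD '602' USING org.apache.hive.hcatalog.pig.HCatLoader();

ab = JOIN a BY (d_key, c_cd), b BY (d_key, c_cd);

-- Drop the relation prefixes so the names match the target table.
-- ASSUMPTION: target_tbl has column d_key and is partitioned by c_cd.
out = FOREACH ab GENERATE a::d_key AS d_key, a::c_cd AS c_cd;

-- Dynamic partitioning: no partition spec, c_cd is taken from each row.
STORE out INTO 'target_tbl' USING org.apache.hive.hcatalog.pig.HCatStorer();

-- Static alternative (hypothetical value): write everything to one partition.
-- STORE out INTO 'target_tbl' USING org.apache.hive.hcatalog.pig.HCatStorer('c_cd=XYZ');
```

Running DESCRIBE on the relation just before the STORE is a quick way to compare its field names and types against the table definition.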

Good luck!

4 REPLIES

Thank you @Lester Martin. Your blog is wonderful. I am checking my dataset.

Hi Martin,

Thank you for such a wonderful blog.

You're more than welcome @Ashish Vishnoi. If it helped, and it is appropriate, I'd sure appreciate you marking my response as "Best Answer" to help me build up my points. 😉

Hi @Lester Martin, thank you for the wonderful blog.
