Created 08-08-2016 01:24 AM
a = LOAD '601' using org.apache.hive.hcatalog.pig.HCatLoader();
b = LOAD '602' using org.apache.hive.hcatalog.pig.HCatLoader();
c = LOAD '603' using org.apache.hive.hcatalog.pig.HCatLoader();
d = LOAD 'SKL' using org.apache.hive.hcatalog.pig.HCatLoader();
e = join a by (d_key, c_cd), b by (d_key, c_cd), c by (p1_key, c_cd), d by (p2_key, c_cd);
Dump e;
========================================================================
If I do the same joins in Hive as nested inner joins, I get the correct output. In Pig, however, when I dump e, the MapReduce job runs and reads rows, but it writes no output even though it reports success.
Can anyone explain the bug in Pig when joining relations on different keys?
Another thing: I want to store e with HCatalog (HCatStorer) into an empty partitioned table using dynamic partitioning (without specifying a partition value), but I get a partitioned-table error. I don't know the cause of this error in HCatalog. If you have faced the same thing, please explain it and suggest a solution.
Created 08-08-2016 03:23 PM
Actually, JOINs in Pig work about the same as they do in Hive. I wrote up a quick blog post at https://martin.atlassian.net/wiki/x/AgCfB based on your question, using made-up data since you didn't provide any. I'm sure you did this already, but the easiest approach is to simplify your join: start with just two relations, then add the third, and eventually the fourth once each step works. If "e" never gets populated, my hunch is that it is a data issue, not a Pig issue.
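A minimal sketch of that step-by-step approach, reusing the relations and key names from the question. The intermediate aliases (ab, abc, and the *_grp / *_cnt relations) are made up for illustration. One thing worth keeping in mind: in a single multi-way JOIN statement, Pig only emits a row when the key tuples of all listed relations are equal, so the original statement requires a's d_key, c's p1_key and d's p2_key to hold the same value. If the Hive query chains its ON clauses differently, the two are not the same join.

```pig
a = LOAD '601' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = LOAD '602' USING org.apache.hive.hcatalog.pig.HCatLoader();
c = LOAD '603' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Step 1: join only two relations and count the result.
ab     = JOIN a BY (d_key, c_cd), b BY (d_key, c_cd);
ab_grp = GROUP ab ALL;
ab_cnt = FOREACH ab_grp GENERATE COUNT_STAR(ab);
DUMP ab_cnt;   -- nothing printed here means the join produced zero rows

-- Step 2: add the third relation. After the first join the fields of a
-- are addressed as a::d_key and a::c_cd. This keeps the same matching
-- as the original statement: c's (p1_key, c_cd) against a's (d_key, c_cd).
abc     = JOIN ab BY (a::d_key, a::c_cd), c BY (p1_key, c_cd);
abc_grp = GROUP abc ALL;
abc_cnt = FOREACH abc_grp GENERATE COUNT_STAR(abc);
DUMP abc_cnt;

-- Repeat the same pattern for d once abc returns rows.
```

The step at which the count drops to nothing is the join condition (or the data behind it) to look at first.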
As for saving into a partitioned Hive table, my blog post shows a working example of that. It also points back to https://community.hortonworks.com/questions/2562/appending-to-existing-partition-with-pig.html, which covers the fact that Pig cannot write to an existing partition and strategies for working around it.
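For reference, a rough sketch of a dynamic-partition store with HCatStorer, run with pig -useHCatalog. The table name target_tbl, its column list, and the choice of c_cd as the partition column are assumptions made for illustration, since the question does not show the target table. The joined relation carries prefixed field names such as a::d_key, so the sketch first renames them to match the table's columns.

```pig
-- Hypothetical target: Hive table target_tbl, partitioned by c_cd,
-- with regular columns d_key and p1_key. Adjust to the real table.
out = FOREACH e GENERATE
        a::d_key  AS d_key,
        c::p1_key AS p1_key,
        a::c_cd   AS c_cd;   -- the partition column must be present in the data

-- No partition spec is passed to HCatStorer, so the partition value is
-- taken from the c_cd field of each row (dynamic partitioning).
STORE out INTO 'target_tbl' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

If the table had a fixed target partition instead, the partition spec would go in as a string argument, for example HCatStorer('c_cd=ABC'), and c_cd would be left out of the stored relation.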
Good luck!
Created 08-09-2016 03:16 AM
Thank you @Lester Martin. Your blog is wonderful. I am checking my dataset.
Created 08-09-2016 09:38 PM
You're more than welcome @Ashish Vishnoi. If it helped, and it is appropriate, I'd sure appreciate you marking my response as "Best Answer" to help me build up my points. 😉
Created 08-09-2016 03:17 AM
Hi @Lester Martin, thank you for the wonderful blog.