Support Questions

vishnoi_k_ashis · ‎08-08-2016

a = LOAD '601' using org.apache.hive.hcatalog.pig.HCatLoader();

b = LOAD '602' using org.apache.hive.hcatalog.pig.HCatLoader();

c = LOAD '603' using org.apache.hive.hcatalog.pig.HCatLoader();

d = LOAD 'SKL' using org.apache.hive.hcatalog.pig.HCatLoader();

e = join a by (d_key, c_cd ), b by (d_key, c_cd), c by (p1_key, c_cd), d by (p2_key, c_cd);

Dump e;

========================================================================

If I do the same joins in Hive, I get output. In Pig, however, dumping e runs the MapReduce job and reads rows, but no output is written even though the job succeeds. If I do the same thing in Hive with nested inner joins, I get the correct output.

Can anyone explain the bug in Pig when joining relations on different keys?

Another thing: I want to store e with HCatalog (HCatStorer) into an empty partitioned table using dynamic partitioning (without specifying a partition value), but I get a partitioned-table error. I don't know the reason for this HCatalog error. If you have faced the same thing, please explain it and suggest a solution.

LesterMartin · ‎08-08-2016

Actually, JOINs in Pig work about the same as they do in Hive. I wrote up a quick blog post at https://martin.atlassian.net/wiki/x/AgCfB based on your question, using made-up data since you didn't provide any. You have probably tried this already, but the easiest approach is to simplify: get the join working on just two relations first, then add the third, and finally the fourth once everything works. If "e" never gets populated, my hunch is that it is a data issue, not a Pig issue.
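The step-by-step approach could look roughly like the sketch below. It reuses the table and column names from the question; the intermediate aliases (ab, abc and the *_sample relations) are made up for illustration. One thing worth checking while doing this: as far as I understand Pig's multi-way JOIN, a row is emitted only when the key tuples of all listed relations are equal to each other, so (d_key, c_cd) from a and b would also have to equal (p1_key, c_cd) from c. If that is not what the nested Hive joins do, an empty result would not be surprising.

```pig
-- Run with HCatalog on the classpath, e.g.: pig -useHCatalog script.pig
a = LOAD '601' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = LOAD '602' USING org.apache.hive.hcatalog.pig.HCatLoader();
c = LOAD '603' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Step 1: join only the first two relations and look at a small sample.
ab = JOIN a BY (d_key, c_cd), b BY (d_key, c_cd);
DESCRIBE ab;                 -- shows the a::/b:: prefixed field names
ab_sample = LIMIT ab 10;
DUMP ab_sample;

-- Step 2: if step 1 returns rows, add the third relation.
-- Rows survive only where all three key tuples are equal.
abc = JOIN a BY (d_key, c_cd), b BY (d_key, c_cd), c BY (p1_key, c_cd);
abc_sample = LIMIT abc 10;
DUMP abc_sample;
```

If step 1 returns rows and step 2 returns none, the values of p1_key in 603 probably never match the d_key values, which points at either the data or the choice of join keys.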

As for saving into a partitioned Hive table, my blog post shows a working example of that. It also points back to https://community.hortonworks.com/questions/2562/appending-to-existing-partition-with-pig.html, which covers the fact that Pig cannot write to an existing partition, along with strategies for dealing with it.
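For reference, a minimal sketch of a dynamic-partition store with HCatStorer, under stated assumptions: 'e_out' is a hypothetical, empty Hive table with a data column d_key and a partition column c_cd, and only two of the relations are joined to keep it short. Calling HCatStorer with no argument relies on the partition column being present in the data. The field names coming out of a JOIN carry a:: and b:: prefixes, so the sketch projects them back to the plain names the target table is assumed to use.

```pig
-- Run with: pig -useHCatalog script.pig
-- Assumption: 'e_out' (hypothetical) has data column d_key and
-- partition column c_cd, and has no existing partitions.
a = LOAD '601' USING org.apache.hive.hcatalog.pig.HCatLoader();
b = LOAD '602' USING org.apache.hive.hcatalog.pig.HCatLoader();

e = JOIN a BY (d_key, c_cd), b BY (d_key, c_cd);

-- Rename the prefixed join fields to match the target table's columns.
out = FOREACH e GENERATE a::d_key AS d_key, a::c_cd AS c_cd;

-- No argument to HCatStorer: partitions are derived from the c_cd values.
STORE out INTO 'e_out' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

If the store still fails, comparing the output of DESCRIBE out; with the Hive table definition is a quick way to spot a name or type mismatch.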

Good luck!

vishnoi_k_ashis · ‎08-09-2016

Thank you @Lester Martin. Your blog is wonderful. I am checking my dataset.

Hi Martin,

Thank you for such a wonderful blog.

LesterMartin · ‎08-09-2016

You're more than welcome @Ashish Vishnoi. If it helped, and it is appropriate, I'd sure appreciate you marking my response as "Best Answer" to help me build up my points. 😉

vishnoi_k_ashis · ‎08-09-2016

Hi @Lester Martin, thank you for the wonderful blog.

PIG inner join with different keys