Created 04-13-2017 12:24 AM
I am running a sort-merge join using the Tez examples jar with Tez 0.7.1. Samples of the two files are:
ISBN;"Book-Title";"Book-Author";"Year-Of-Publication";"Publisher";"Image-URL-S";"Image-URL-M";"Image-URL-L"
0195153448;"Classical Mythology";"Mark P. O. Morford";"2002";"Oxford University Press";"http://images.amazon.com/images/P/0195153448.01.THUMBZZZ.jpg";"http://images.amazon.com/images/P/0195153448.01.MZZZZZZZ.jpg";"http://images.amazon.com/images/P/0195153448.01.LZZZZZZZ.jpg"
User-ID;"ISBN";"Book-Rating"
276725;"034545104X";"0"
The first file has about 300 thousand records and the second has around 1 million. The common attribute is the ISBN of a book.
The DAG completes successfully, but there is no output. Even the logs look fine.
My understanding of SortMergeJoin is that it sorts both datasets on the join attribute and then finds qualifying records by merging the two sorted datasets. The sorting step places all tuples with the same value in the join column next to each other, which makes it easy to identify the groups of tuples that share a join value. I am referring to this link from the Tez examples. I just wanted to confirm how it decides the join attribute, which in this case should be ISBN. Please help.
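The sort-then-merge idea described above can be sketched in plain Java. This is only an illustration of the technique, not the code from the Tez examples jar: the class name `SortMergeJoinSketch`, the helper names, and the column positions (ISBN in column 0 of the books file and column 1 of the ratings file) are assumptions taken from the samples, and header rows are assumed to have been removed beforehand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortMergeJoinSketch {

    // Strip the surrounding double quotes that some fields carry.
    static String unquote(String field) {
        String f = field.trim();
        if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
            return f.substring(1, f.length() - 1);
        }
        return f;
    }

    // Extract the join key from one line. A naive split on ';' is enough
    // here only because the key sits in one of the first two columns.
    static String key(String line, int column) {
        return unquote(line.split(";")[column]);
    }

    // Sort both inputs on the key, then merge them with two cursors.
    static List<String> join(List<String> books, int bookCol,
                             List<String> ratings, int ratingCol) {
        List<String> left = new ArrayList<>(books);
        List<String> right = new ArrayList<>(ratings);
        left.sort(Comparator.comparing((String l) -> key(l, bookCol)));
        right.sort(Comparator.comparing((String l) -> key(l, ratingCol)));

        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            String lk = key(left.get(i), bookCol);
            String rk = key(right.get(j), ratingCol);
            int cmp = lk.compareTo(rk);
            if (cmp < 0) {
                i++;            // left key is smaller: advance left cursor
            } else if (cmp > 0) {
                j++;            // right key is smaller: advance right cursor
            } else {
                // Keys match: find the end of the equal-key group on each side.
                int iEnd = i;
                while (iEnd < left.size() && key(left.get(iEnd), bookCol).equals(lk)) {
                    iEnd++;
                }
                int jEnd = j;
                while (jEnd < right.size() && key(right.get(jEnd), ratingCol).equals(lk)) {
                    jEnd++;
                }
                // Emit every pairing within the two groups.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(left.get(a) + " | " + right.get(b));
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened rows; the second rating row is made up for illustration.
        List<String> books = Arrays.asList(
                "0195153448;\"Classical Mythology\";\"Mark P. O. Morford\"");
        List<String> ratings = Arrays.asList(
                "276725;\"034545104X\";\"0\"",
                "999999;\"0195153448\";\"7\"");
        for (String row : join(books, 0, ratings, 1)) {
            System.out.println(row);
        }
    }
}
```

The point of the sketch is that the key has to be extracted from each record before sorting and merging. A join that compared whole lines instead would find no matches between these two files, since their columns differ.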
Created 04-19-2017 12:15 PM
Created 04-20-2017 06:11 AM
@gnovak Thanks a lot, I guess I missed that point. That has to be the reason why there is nothing in the output.