Hive Function for Multiple Rows as Input and Output

Hello,

I have data processing logic in which we need to remove duplicates from a dataset. The duplicates are removed in three phases. The first two fit quite well in Hive. In the last phase we have to filter out certain records based on some procedural code. What we need is a function that takes multiple rows as input and returns multiple rows as output. The function would do the duplicate checking and return the IDs of the duplicate records. I have looked at the generic UDTF, but I am not sure whether it is the right approach. Any pointers would be helpful.

 

Thanks,

Nishan