
Hive Function for Multiple Rows as Input and Output

Hello,

I have a data-processing pipeline in which we need to remove duplicates from a dataset. Duplicates are removed in three phases, and the first two work quite well in Hive. In the last phase, we have to filter out certain records based on some procedural code. What we need is a function that takes multiple rows as input and returns multiple rows as output. The function would do the duplicate checking and return the IDs of the duplicate records. I have looked at GenericUDTF but am not sure whether it's the right approach. Any pointers would be helpful.
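A minimal sketch of the "many rows in, many IDs out" logic, written as plain Java with no Hive dependency so it can be tested on its own. Everything here is hypothetical: the class `DuplicateFilter`, the `Row` shape (id, duplicate key, score) and the keep/drop rule (per key, keep the highest score, ties going to the smaller ID) are placeholders for the real procedural code. Note that a GenericUDTF's `process()` receives one input row per call, so one common pattern is to first gather each group into an array (for example with `collect_list` in a GROUP BY) and have `process()` call logic like this, emitting one output row per returned ID via `forward()`. Hive's `TRANSFORM` clause with `DISTRIBUTE BY`/`SORT BY` is another way to stream grouped rows through procedural code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateFilter {

    /** Hypothetical input row: record ID, key used to detect duplicates, and a ranking score. */
    public static final class Row {
        final long id;
        final String key;
        final int score;

        public Row(long id, String key, int score) {
            this.id = id;
            this.key = key;
            this.score = score;
        }
    }

    /**
     * Takes many rows and returns many IDs: for each key, keeps the row with the highest
     * score (ties go to the smaller ID) and reports the IDs of all other rows as duplicates.
     * The rule itself is a placeholder for the real procedural check.
     */
    public static List<Long> duplicateIds(List<Row> rows) {
        // Group rows by duplicate key, preserving first-seen key order.
        Map<String, List<Row>> byKey = new LinkedHashMap<>();
        for (Row r : rows) {
            byKey.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        }
        // Order each group best-first: higher score, then smaller ID.
        Comparator<Row> best = Comparator.comparingInt((Row r) -> r.score).reversed()
                .thenComparingLong(r -> r.id);
        List<Long> dropped = new ArrayList<>();
        for (List<Row> group : byKey.values()) {
            group.sort(best);
            // The first row survives; every other row in the group is a duplicate.
            for (int i = 1; i < group.size(); i++) {
                dropped.add(group.get(i).id);
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(1, "a", 5),
                new Row(2, "a", 9),
                new Row(3, "b", 1),
                new Row(4, "a", 9),
                new Row(5, "b", 1));
        System.out.println(duplicateIds(rows)); // prints [4, 1, 5]
    }
}
```

In this example, row 2 survives for key "a" and row 3 for key "b", so IDs 4, 1 and 5 are reported as duplicates.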

Thanks,

Nishan