Python UDF gives a different answer in Pig

I want to write a Python UDF for Pig that reads lines from a file like




and matches them against the names; if it finds a match, it replaces the match with white space. Here is my Python code:


def list_files2(name, f):
    fin = open(f, 'r')
    for line in fin:
        # Defaults, returned when no line of the file matches.
        final = name
        extra = 'nothing'
        # A line matches if replacing it in name changes name.
        if (name != name.replace(line.strip(), ' ')):
            extra = line.strip()
            final = name.replace(line.strip(), ' ').strip()
            return final, extra, 'insdie if'
    return final, extra, 'inside for'



Running this code in Python,

>print list_files2('LLC nakisa', 'prefix.csv' )
>print list_files2('AG company', 'prefix.csv' )



('nakisa', 'LLC', 'insdie if')
('AG company', 'nothing', 'inside for')

which is exactly what I need. But when I register this code as a UDF in Apache Pig and run it on this sample list:



nakisa company LLC
three Oy
AG Lans
Test OOD


Pig returns the wrong answer for the third line:



((nakisa company,LLC,insdie if))
((three,Oy,insdie if))
((A G L a n s,,insdie if))
((Test,OOD,insdie if))



My question: why does the UDF enter the if branch for the third entry, which has no match in the prefix.csv file?
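
One reading that is consistent with the third output row, offered as a sketch and not a confirmed diagnosis: if the copy of prefix.csv that Pig reads contains a blank line (an assumption, since the file's contents are not shown here), then line.strip() is the empty string, and str.replace with an empty search string inserts the replacement around every character. The helper strip_prefix and the sample lines below are hypothetical; they only illustrate that behaviour and a guard that skips blank lines.

```python
def strip_prefix(name, prefixes):
    """Hypothetical variant of list_files2: takes the lines as a list
    instead of a file path, and skips blank entries."""
    for line in prefixes:
        prefix = line.strip()
        if not prefix:
            # Skip blank lines: name.replace('', ' ') would change any name.
            continue
        if prefix in name:
            return name.replace(prefix, ' ').strip(), prefix, 'inside if'
    return name, 'nothing', 'inside for'


# An empty search string "matches" around every character.
spaced = 'AG Lans'.replace('', ' ').strip()
print(repr(spaced))

# Assumed file contents with a trailing blank line (not from the original post).
lines = ['LLC\n', 'Oy\n', 'OOD\n', '\n']
print(strip_prefix('AG Lans', lines))
print(strip_prefix('three Oy', lines))
```

With the blank-line guard, 'AG Lans' falls through to the 'inside for' return, while 'three Oy' still matches 'Oy'. Without the guard, the blank line would yield an empty second field, which is what the third Pig output row shows.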