08-23-2016 12:03 AM - edited 08-23-2016 12:08 AM
I want to write a Python UDF for Pig that reads lines from a file (prefix.csv in the examples below), checks each line against a name and, if it finds a match, replaces the match with white space. Here is my Python code:
def list_files2(name, f):
    fin = open(f, 'r')
    for line in fin:
        final = name
        extra = 'nothing'
        if (name != name.replace(line.strip(), ' ')):
            extra = line.strip()
            final = name.replace(line.strip(), ' ').strip()
            return final, extra,'insdie if'
    return final, extra, 'inside for'
Running this code in python,
>print list_files2('LLC nakisa', 'prefix.csv' )
>print list_files2('AG company', 'prefix.csv' )
('nakisa', 'LLC', 'insdie if')
('AG company', 'nothing', 'inside for')
which is exactly what I need. But when I register this code as a UDF in Apache Pig and run it on this sample list:
nakisa company LLC
Pig returns the wrong answer on the third line:
((nakisa company,LLC,insdie if))
((A G L a n s,,insdie if))
The question is: why does the UDF enter the if branch for the third entry, which has no match in the prefix.csv file?
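One thing that may be worth checking: the third output row has an empty second field and a space between every character, which is what Python's str.replace produces when the search string is empty. That would happen if the copy of prefix.csv that Pig reads contains a blank line, which is only a guess here. The sketch below reproduces the effect and shows one possible guard. The input 'AGLans', the list hypothetical_lines, and the helper strip_prefix are assumptions made up for illustration, not part of the original UDF.

```python
def strip_prefix(name, lines):
    """Return (cleaned_name, matched_prefix); blank lines are skipped."""
    for line in lines:
        token = line.strip()
        if not token:
            # An empty token "matches" every name, so ignore it.
            continue
        if token in name:
            return name.replace(token, ' ').strip(), token
    return name, 'nothing'


# Hypothetical file contents: one real prefix and one blank line.
hypothetical_lines = ['LLC\n', '\n']

# What the unguarded replace does when the stripped line is empty:
# a space is inserted around every character of the name.
spaced = 'AGLans'.replace('', ' ').strip()

print(spaced)
print(strip_prefix('AGLans', hypothetical_lines))
print(strip_prefix('nakisa company LLC', hypothetical_lines))
```

With these assumed inputs, spaced is 'A G L a n s', while the guarded helper leaves 'AGLans' unchanged and still strips 'LLC' from 'nakisa company LLC'.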