Python UDF gives a different answer in Pig

I want to write a Python UDF for Pig that reads lines from a file such as this one:

    #'prefix.csv'
    spol.
    LLC
    Oy
    OOD

matches those lines against a name and, if it finds a match, replaces the matched text with white space. Here is my Python code:

def list_files2(name, f):
    # Defaults, returned unchanged when no prefix matches.
    final, extra = name, 'nothing'
    with open(f, 'r') as fin:
        for line in fin:
            prefix = line.strip()
            # replace() changes the string only if prefix occurs in name
            if name != name.replace(prefix, ' '):
                extra = prefix
                final = name.replace(prefix, ' ').strip()
                return final, extra, 'insdie if'
    return final, extra, 'inside for'
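
For comparison, the matching rule described above can also be written as a pure function over an in-memory list of prefixes, so it can be tested without a file. This is a minimal sketch, not the UDF itself: match_prefix, PREFIXES and the tags 'matched' / 'no match' are hypothetical names. It skips blank entries, because str.replace with an empty pattern inserts the replacement between every pair of characters and therefore "matches" any name.

```python
def match_prefix(name, prefixes):
    """Replace the first prefix found in name with a space.

    Returns (cleaned_name, matched_prefix, tag), the same tuple shape
    as list_files2. Hypothetical helper for illustration only.
    """
    for raw in prefixes:
        prefix = raw.strip()
        if not prefix:
            continue  # an empty pattern would match every name, so ignore it
        if prefix in name:
            return name.replace(prefix, ' ').strip(), prefix, 'matched'
    return name, 'nothing', 'no match'


# Sample prefixes from the question, plus a blank entry to show it is ignored.
PREFIXES = ['spol.', 'LLC', 'Oy', 'OOD', '']

for company in ['nakisa company LLC', 'three Oy', 'AG Lans', 'Test OOD']:
    print(match_prefix(company, PREFIXES))
```

With the blank entry skipped, 'AG Lans' falls through to the no-match result instead of being split into single characters.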

 

 

Running list_files2 in plain Python,

>print list_files2('LLC nakisa', 'prefix.csv')
>print list_files2('AG company', 'prefix.csv')

returns

('nakisa', 'LLC', 'insdie if')
('AG company', 'nothing', 'inside for')


This is exactly what I need. But when I register this code as a UDF in Apache Pig and run it on this sample list:

nakisa company LLC
three Oy
AG Lans
Test OOD

 

Pig returns the wrong answer on the third line:

((nakisa company,LLC,insdie if))
((three,Oy,insdie if))
((A G L a n s,,insdie if))
((Test,OOD,insdie if))

 

 

Why does the UDF enter the if branch for the third entry (AG Lans), which has no match in prefix.csv?
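
One explanation consistent with the third Pig result (an empty second field and a space between every character) is that the copy of prefix.csv the UDF reads contains a blank line after the real prefixes, for example a trailing empty line. That is an assumption; the post does not show the file. For an empty line, line.strip() is '', and name.replace('', ' ') inserts a space at every position, so the if test succeeds. The sketch below reproduces this in plain Python with a hypothetical temporary file:

```python
import os
import tempfile


def list_files2(name, f):
    # Same logic as the question's function, with consistent indentation.
    final, extra = name, 'nothing'
    with open(f, 'r') as fin:
        for line in fin:
            prefix = line.strip()
            if name != name.replace(prefix, ' '):
                extra = prefix
                final = name.replace(prefix, ' ').strip()
                return final, extra, 'insdie if'
    return final, extra, 'inside for'


# An empty pattern "matches" everywhere: replace() inserts the
# replacement at the start, the end, and between every character.
print(repr('AG Lans'.replace('', ' ')))

# Hypothetical prefix file whose last line is blank.
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'w') as out:
    out.write('spol.\nLLC\nOy\nOOD\n\n')

try:
    # No real prefix matches, so the blank last line is what triggers the if.
    print(list_files2('AG Lans', path))
finally:
    os.remove(path)
```

If this is the cause, skipping lines whose stripped value is empty (if not prefix: continue) before the replace check should stop the spurious match.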