
Python UDF gives a different answer in Pig


I want to write a Python UDF for Pig that reads lines from a file like this:

 

  #'prefix.csv'
  spol.
  LLC
  Oy
  OOD

 

The UDF should match each line against a name and, if it finds a match, replace the match with white space. Here is my Python code:

 

def list_files2(name, f):
    # Read the candidate strings from file f, one per line
    fin = open(f, 'r')
    for line in fin:
        final = name
        extra = 'nothing'
        # If replacing changes the string, line.strip() occurs in name
        if (name != name.replace(line.strip(), ' ')):
            extra = line.strip()
            final = name.replace(line.strip(), ' ').strip()
            return final, extra, 'insdie if'
    return final, extra, 'inside for'

 

 

Running this code in python,

>print list_files2('LLC nakisa', 'prefix.csv' )
>print list_files2('AG company', 'prefix.csv' )


returns

 

('nakisa', 'LLC', 'insdie if')
('AG company', 'nothing', 'inside for')


which is exactly what I need. But when I register this code as a UDF in Apache Pig and run it on this sample list:

 

 

nakisa company LLC
three Oy
AG Lans
Test OOD

 

Pig returns the wrong answer for the third line:

 

 

((nakisa company,LLC,insdie if))
((three,Oy,insdie if))
((A G L a n s,,insdie if))
((Test,OOD,insdie if))

 

 

My question: why does the UDF enter the if branch for the third entry, which has no match in the prefix.csv file?
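
One possible cause worth checking, consistent with the third output row (letters spaced out, empty second field): the copy of prefix.csv that Pig reads may contain a blank line. For a blank line, line.strip() is the empty string, and str.replace with an empty pattern inserts the replacement between every character, so the comparison in the if is always true. This is only a guess from the output, not something the post confirms. Below is a minimal sketch that skips blank lines; the name strip_affix and the choice to pass an iterable of lines instead of a file name are assumptions made for illustration.

```python
def strip_affix(name, affixes):
    """Remove the first affix found in name; blank affixes are skipped.

    strip_affix is a hypothetical helper, not part of the original UDF.
    affixes is any iterable of lines, e.g. an open file object.
    """
    for raw in affixes:
        affix = raw.strip()
        if not affix:
            # An empty pattern matches between every character, so skip it.
            continue
        if affix in name:
            return name.replace(affix, ' ').strip(), affix, 'inside if'
    return name, 'nothing', 'inside for'


# What an empty pattern does to the third input row:
spaced = 'AG Lans'.replace('', ' ').strip()

# With a trailing blank line in the list, the third row is left untouched:
result = strip_affix('AG Lans', ['spol.\n', 'LLC\n', 'Oy\n', 'OOD\n', '\n'])
```

Here spaced has a space between every character (runs of spaces may be collapsed when displayed), which looks like the third row of the Pig output, while result keeps 'AG Lans' unchanged. If this is the cause, printing repr(line) for each line inside the UDF should show an empty or whitespace-only entry.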
