
Pig - custom UDF - Jython + Pyasn


Hi all, I'm putting together a log parser in Pig, and I'm trying to use "pyasn", a Python extension that allows offline querying of an ASN database, to extract Autonomous System Number (ASN) information from IP addresses.

 

The link to the project is here:

 

https://pypi.python.org/pypi/pyasn

 

What happens is that:

 

1) I successfully installed pyasn (first via pip install; I have since built it manually, but it still doesn't work).

 

2) I wrote a custom UDF to be registered in Pig through Jython:

 

#!/usr/bin/python

import sys
sys.path.append('/usr/lib64/python2.6/site-packages/')
sys.path.append('/usr/lib64/python2.6/site-packages/pyasn-1.6.0b1-py2.6-linux-x86_64.egg/')
sys.path.append('/usr/lib/python2.6/site-packages/')

import pyasn

@outputSchema("asn:chararray")
def asnLookup(ip):
	asndb = pyasn.pyasn('asn.dat')
	asn = asndb.lookup(ip)

	return asn

@outputSchema("asn_prefix:chararray")
def asnGetAsPrefixes(nbr):
	asndb = pyasn.pyasn('asn.dat')
	asn_prefix = asndb.get_as_prefixes(nbr)

	return asn_prefix

 

3) But when I try to register my UDF, I get the following exception:

 

grunt> register 'hdfs:///user/xxxxxx/LIB/PYASN/python_pyasn.py' using jython as pythonPyasn;
2017-11-23 17:18:10,468 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-11-23 17:18:10,939 [main] INFO  org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_8271942503558994412
2017-11-23 17:18:12,468 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
2017-11-23 17:18:13,236 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1121: Python Error. Traceback (most recent call last):
  File "/tmp/pig6864734086775637011tmp/python_pyasn.py", line 8, in <module>
    import pyasn
  File "/usr/lib64/python2.6/site-packages/pyasn-1.6.0b1-py2.6-linux-x86_64.egg/pyasn/__init__.py", line 20
SyntaxError: future feature print_function is not defined

4) The puzzling thing is that I'm doing exactly the same thing with another Python extension for geolocation (PyGeoIP), and it works smoothly. The concept is the same: I wrote a UDF, registered it in Pig through Jython, and I can call it successfully!

 

5) If, just to check that the installation itself is OK, I open a PySpark shell and use the extension, it works without any problems. But I don't want to (and can't) use Spark in this case, for a number of reasons.
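
One check that might narrow this down, assuming the error comes from the interpreter Pig embeds rather than from the pyasn install: `from __future__ import print_function` only exists from Python 2.6 onward, so a Jython older than 2.6 would fail on that exact line, while the CPython 2.6 used by PySpark would not. The sketch below does not import pyasn; it only reports which interpreter is running the file. The names `interpreter_report` and `supports_print_function` are hypothetical helpers made up for this illustration.

```python
import sys
import __future__


def supports_print_function(version):
    """True if a (major, minor, ...) version knows the print_function future.

    The print_function future feature was introduced in Python 2.6.
    """
    return tuple(version[:2]) >= (2, 6)


def interpreter_report():
    """Describe the interpreter that is executing this file."""
    return {
        # Under Jython, sys.platform starts with 'java'; under CPython on
        # Linux it starts with 'linux'.
        "platform": sys.platform,
        "version": tuple(sys.version_info[:3]),
        # Direct check: does this interpreter's __future__ module define it?
        "has_print_function": hasattr(__future__, "print_function"),
    }


if __name__ == "__main__":
    report = interpreter_report()
    sys.stdout.write("%r\n" % (report,))
    sys.stdout.write(
        "print_function supported: %s\n"
        % supports_print_function(report["version"])
    )
```

Running the same file with /usr/bin/python and with the Jython that Pig uses should show whether the two interpreters differ. If the Jython side reports a version below 2.6, the failure is in the interpreter rather than in the pyasn build. Separately, the platform-specific egg name (linux-x86_64) suggests pyasn may contain a compiled component, which Jython could not load in any case; that part is a guess from the file name only.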

 

Any ideas/insight would be very much appreciated!

 

Thanks

 

 
