Hi all, I'm putting together a log parser in Pig, and I'm trying to use "Pyasn", a Python extension for offline querying of an ASN database, to extract Autonomous System Number information from IP addresses.
The link to the project is here:
https://pypi.python.org/pypi/pyasn
What happens is that:
1) I installed pyasn successfully (first via pip install, and now from a manual build; it doesn't work either way)
2) I wrote a custom UDF, to be registered in Pig through Jython:
#!/usr/bin/python
import sys

sys.path.append('/usr/lib64/python2.6/site-packages/')
sys.path.append('/usr/lib64/python2.6/site-packages/pyasn-1.6.0b1-py2.6-linux-x86_64.egg/')
sys.path.append('/usr/lib/python2.6/site-packages/')

import pyasn

@outputSchema("asn:chararray")
def asnLookup(ip):
    asndb = pyasn.pyasn('asn.dat')
    asn = asndb.lookup(ip)
    return asn

@outputSchema("asn_prefix:chararray")
def asnGetAsPrefixes(nbr):
    asndb = pyasn.pyasn('asn.dat')
    asn_prefix = asndb.get_as_prefixes(nbr)
    return asn_prefix
3) But when I try to register my UDF, I get the following exception:
grunt> register 'hdfs:///user/xxxxxx/LIB/PYASN/python_pyasn.py' using jython as pythonPyasn;
2017-11-23 17:18:10,468 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-11-23 17:18:10,939 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_8271942503558994412
2017-11-23 17:18:12,468 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
2017-11-23 17:18:13,236 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1121: Python Error. Traceback (most recent call last):
  File "/tmp/pig6864734086775637011tmp/python_pyasn.py", line 8, in <module>
    import pyasn
  File "/usr/lib64/python2.6/site-packages/pyasn-1.6.0b1-py2.6-linux-x86_64.egg/pyasn/__init__.py", line 20
SyntaxError: future feature print_function is not defined
4) The puzzling thing is that I'm doing exactly the same thing with another Python extension, PyGeoIP (for geolocation), and it works smoothly. The concept is the same: I wrote a UDF, registered it in Pig through Jython, and I can call it successfully!
5) If, just to check that the installation itself is OK, I open a PySpark shell and use the extension there, it works without any problems. But I don't want to (and can't) use Spark in this case, for a number of reasons.
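To compare the two environments, I could run a small check like the one below both as a Jython script and in the PySpark shell. This is only a sketch: describe_interpreter is a name I made up for illustration, and I am assuming that the Jython bundled with Pig ships the standard __future__ module.

```python
import sys
import __future__


def describe_interpreter():
    """Return (version string, whether print_function is a known __future__ feature)."""
    # sys.version_info reports the language version of the running interpreter.
    version = ".".join(str(part) for part in sys.version_info[:3])
    # all_feature_names lists the features this interpreter accepts
    # in a "from __future__ import ..." statement.
    has_print_function = "print_function" in __future__.all_feature_names
    return version, has_print_function


version, has_print_function = describe_interpreter()
# sys.stdout.write avoids the print statement/function difference between versions.
sys.stdout.write("Python %s, print_function available: %s\n" % (version, has_print_function))
```

If the two environments report different versions, that would at least tell me whether the SyntaxError comes from the interpreter rather than from the pyasn installation.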
Any ideas/insight would be very much appreciated!
Thanks