Hi all, I'm putting together a log parser in Pig, and I'm trying to use "Pyasn", a Python extension that allows offline querying of an ASN database, to extract Autonomous System Number information from IP addresses.
The link to the project is here:
What happens is that:
1) I successfully installed pyasn (on a previous attempt via pip install; currently I have built it manually, but it still doesn't work)
2) I wrote a custom UDF, to be wrapped in Jython and then imported into Pig:
    #!/usr/bin/python

    import sys

    sys.path.append('/usr/lib64/python2.6/site-packages/')
    sys.path.append('/usr/lib64/python2.6/site-packages/pyasn-1.6.0b1-py2.6-linux-x86_64.egg/')
    sys.path.append('/usr/lib/python2.6/site-packages/')
    import pyasn

    @outputSchema("asn:chararray")
    def asnLookup(ip):
        asndb = pyasn.pyasn('asn.dat')
        asn = asndb.lookup(ip)
        return asn

    @outputSchema("asn_prefix:chararray")
    def asnGetAsPrefixes(nbr):
        asndb = pyasn.pyasn('asn.dat')
        asn_prefix = asndb.get_as_prefixes(nbr)
        return asn_prefix
3) But when I try to register my UDF, I get the following exception:
    grunt> register 'hdfs:///user/xxxxxx/LIB/PYASN/python_pyasn.py' using jython as pythonPyasn;
    2017-11-23 17:18:10,468 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
    2017-11-23 17:18:10,939 [main] INFO org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_8271942503558994412
    2017-11-23 17:18:12,468 [main] WARN org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
    2017-11-23 17:18:13,236 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1121: Python Error. Traceback (most recent call last):
      File "/tmp/pig6864734086775637011tmp/python_pyasn.py", line 8, in <module>
        import pyasn
      File "/usr/lib64/python2.6/site-packages/pyasn-1.6.0b1-py2.6-linux-x86_64.egg/pyasn/__init__.py", line 20
    SyntaxError: future feature print_function is not defined
4) The puzzling thing is that I'm currently doing the exact same thing with another Python extension for geolocation (PyGeoIP), and it works smoothly. The concept is the same: I wrote a UDF, imported it into Pig by wrapping it in Jython, and I can call it successfully!
5) If, just to check that things are formally OK, I open a PySpark shell and use the extension, it works without any problems. But I don't want to (can't) use Spark in this case, for a number of reasons.
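Since the same import behaves differently in the PySpark shell and under Pig, one way to compare the two interpreters is a small check like the sketch below. It rests on an unconfirmed assumption: that the SyntaxError comes from the Jython runtime not recognizing the `print_function` future feature, rather than from pyasn itself. The helper names `supports_future_feature` and `interpreter_report` are hypothetical, made up for this illustration.

```python
import sys


def supports_future_feature(name):
    """Return True if `from __future__ import <name>` compiles in this interpreter."""
    try:
        compile("from __future__ import %s\n" % name, "<check>", "exec")
        return True
    except SyntaxError:
        # An interpreter that does not know the feature rejects it at compile time,
        # which is the same kind of error reported in the traceback above.
        return False


def interpreter_report():
    """Collect the interpreter details relevant to the SyntaxError (hypothetical helper)."""
    return {
        "platform": sys.platform,  # starts with "java" when running under Jython
        "version": sys.version_info[:3],
        "print_function": supports_future_feature("print_function"),
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_report().items()):
        sys.stdout.write("%s: %s\n" % (key, value))
```

Running this once in the PySpark shell and once from a script registered with `using jython` should show whether the two environments are really the same Python version, regardless of what is installed under /usr/lib64/python2.6.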
Any ideas/insight would be very much appreciated!