
Cannot use numpy or scipy in Python in NiFi ExecuteScript

Expert Contributor

I am using Python in ExecuteScript.

I installed numpy/scipy, but I always get the error 'ImportError: No module named type_check'.

The same code runs successfully in Jupyter.

Any suggestions? Thanks in advance.

1 ACCEPTED SOLUTION

Master Guru

Although the script engine reports its name as "python", it is actually Jython, which can only use pure Python modules, not native modules like numpy/scipy. If this is needed, consider ExecuteProcess or (if you have incoming flow files) ExecuteStreamCommand which can execute the command-line python.
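A minimal sketch of that pattern, assuming ExecuteStreamCommand is left at its defaults so the flow file content is streamed to the script's stdin and stdout becomes the new content (the field name below is only a placeholder):

#!/usr/bin/env python
# command-line CPython: native modules such as numpy load normally here
import sys
import json
import numpy as np

# the flow file content arrives on stdin
data = json.loads(sys.stdin.read())

# "values" is a placeholder field, just to show a numpy call working
data["mean"] = float(np.mean(data["values"]))

# whatever goes to stdout becomes the outgoing flow file content
sys.stdout.write(json.dumps(data))

In ExecuteStreamCommand, Command Path would point at the python executable and Command Arguments at the script.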


8 REPLIES


New Contributor

So what about importing modules like paramiko, pysftp?

Master Guru

Paramiko uses Crypto, which is a native module, so it is not pure Python either and cannot be used in ExecuteScript. ExecuteProcess or ExecuteStreamCommand should work, though.
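For illustration, a standalone CPython script along these lines could then be launched from ExecuteProcess or ExecuteStreamCommand (host, credentials, and paths below are placeholders):

#!/usr/bin/env python
# command-line CPython, so the native dependencies of pysftp/paramiko load fine
import pysftp

# placeholder connection details
with pysftp.Connection('sftp.example.com', username='nifi', password='secret') as sftp:
    # upload a file that an upstream processor has written to local disk
    sftp.put('/tmp/outgoing/data.csv', '/upload/data.csv')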

Expert Contributor
Thanks for your explanation.

I used ExecuteScript as below:


from scipy.stats import f_oneway  # native module -- this import is what fails under Jython
import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback


class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # read the incoming flow file content as UTF-8 JSON
        jsonData = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        data = json.loads(jsonData)

        # pull the three sample groups out of the payload
        values = [i['fltValue'] for i in data["data"]]
        firsts = [i['first'] for i in data["data"]]
        seconds = [i['second'] for i in data["data"]]

        # one-way ANOVA across the three groups
        f, p = f_oneway(values, firsts, seconds)
        data["f"] = f
        data["p"] = p

        # write the augmented JSON back out as the new flow file content
        outputStream.write(bytearray(json.dumps(data, indent=4).encode('utf-8')))


flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)


How would I use these two processors instead?

Master Guru

It seems nltk has native components too.

That should be made explicit.

Contributor

Hi Boyer, did you end up getting this to work? Did you use ExecuteStreamCommand? If so, would you mind sharing the configuration of your processor (i.e., how the Python script is called)? And lastly, how do you ingest the FlowFile in your Python script?

Any advice is much appreciated.

Master Guru

You have to make sure you install the libraries on the server for the correct Python version.

I always run with ExecuteProcess or ExecuteStreamCommand and wrap my Python in a shell script.
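As a rough sketch of that approach, and assuming the JSON layout from the script posted earlier, the ExecuteScript logic could become a standalone CPython script (the libraries must be installed, e.g. via pip, for the same interpreter the wrapper calls); the shell wrapper just invokes it, and ExecuteStreamCommand feeds the flow file to stdin:

#!/usr/bin/env python
# standalone CPython version of the earlier ExecuteScript logic
import sys
import json
from scipy.stats import f_oneway

# ExecuteStreamCommand streams the flow file content to stdin
data = json.loads(sys.stdin.read())

values = [i['fltValue'] for i in data["data"]]
firsts = [i['first'] for i in data["data"]]
seconds = [i['second'] for i in data["data"]]

# one-way ANOVA, now with the real scipy under CPython
f, p = f_oneway(values, firsts, seconds)
data["f"] = float(f)
data["p"] = float(p)

# stdout becomes the content of the outgoing flow file
sys.stdout.write(json.dumps(data, indent=4))

The wrapper script can be as small as a single line that calls this file with the right python on the PATH.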

Explorer

Hi Timothy,

Can you explain this a bit?

Is it possible to execute a script.py with #!/usr/bin/python inside and then run it like a bash script from NiFi?

Or how is it possible to wrap Python in a shell? It would help me a lot to start my Python scripts with sklearn/numpy/pandas inside NiFi, then grab the exports.csv or exports.json and continue with my NiFi workflow for FlowFiles.

cheers