Nifi Execute Script - Python script with external dependencies


I am trying to use NiFi for some simple ETL, and I'd like to use the ExecuteScript processor with Python as one step in my dataflow. Currently, I have two Python files (enrichment.py and utils.py) in the same directory, and utils.py must be able to import the requests and redis libraries. Both files run successfully on my local machine, outside of NiFi.

My system runs on three nodes. All three have Python 3.6.8 installed, and I've used pip3 to install both the requests and redis libraries to their default location (/usr/local/lib/python3.6/site-packages/requests|redis).

After doing some reading, I found that the "Module Directory" field in the ExecuteScript processor seems to be where I need to focus my attention. Currently, I have this field set to "/nifi-data/nifi_scripts/enrichments,/usr/local/lib/python3.6/site-packages/requests,/usr/local/lib/python3.6/site-packages/redis". Please note that enrichment.py imports utils.py, which in turn imports requests and redis; both enrichment.py and utils.py live in the "/nifi-data/nifi_scripts/enrichments" directory.
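For reference, here is a minimal sketch of how Python resolves a package against a search-path entry. It assumes that each "Module Directory" entry behaves like an entry on sys.path, which I have not confirmed for NiFi. The package name demo_pkg_xyz and the helper can_import are hypothetical and exist only for this illustration. The sketch compares listing the package directory itself with listing its parent directory.

```python
import importlib
import os
import sys
import tempfile


def can_import(module_name, path_entry):
    """Return True if module_name imports with path_entry prepended to sys.path."""
    saved_path = list(sys.path)
    sys.modules.pop(module_name, None)  # forget any cached copy of the module
    sys.path.insert(0, path_entry)
    try:
        importlib.invalidate_caches()
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
    finally:
        # Restore the interpreter state so the two checks do not affect each other.
        sys.path[:] = saved_path
        sys.modules.pop(module_name, None)


# Build a throwaway package: <root>/demo_pkg_xyz/__init__.py (hypothetical name)
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg_xyz")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
    handle.write("VALUE = 1\n")

# Search-path entry is the package directory itself (like .../site-packages/requests)
inside_result = can_import("demo_pkg_xyz", pkg_dir)
# Search-path entry is the parent directory (like .../site-packages)
parent_result = can_import("demo_pkg_xyz", root)

print("package dir as entry:", inside_result)
print("parent dir as entry:", parent_result)
```

If the assumption holds, an entry ending in ".../site-packages" would be the form that lets "import requests" find the package, whereas an entry ending in ".../site-packages/requests" would not.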

Finally, in the Script File field, I have "/nifi-data/nifi_scripts/enrichments/enrichment.py", which is in a shared data store that all nodes have access to.

 

I consistently get the error:

Failed to process session due to javax.script.ScriptException: ImportError: No module named requests in <script> at line number 1: org.apache.nifi.processor.exception.ProcessException: javax.script.ScriptException: ImportError: No module named requests in <script> at line number 1

which would imply that at least utils.py is being imported, as it is the only file that imports requests. It's worth noting that neither library contains a file explicitly named "requests.py" or "redis.py"; each is installed as a directory under site-packages.
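One way to narrow this down is to check which interpreter actually runs the script and where it searches for modules. As far as I know, ExecuteScript's "python" engine is Jython (a Python 2.7 implementation on the JVM) rather than the system's Python 3.6, but treat that as an assumption to verify. The helper name describe_interpreter below is hypothetical, and the sketch avoids any NiFi-specific API, so it only builds and prints a report; inside NiFi, the output would need to be routed somewhere visible.

```python
import sys


def describe_interpreter():
    """Return a short report of the running interpreter and its module search path."""
    lines = ["version: " + sys.version.replace("\n", " ")]
    lines.append("executable: " + str(sys.executable))
    # One line per search-path entry, in lookup order.
    lines.extend("path: " + str(entry) for entry in sys.path)
    return "\n".join(lines)


report = describe_interpreter()
print(report)
```

If the reported version is 2.7 rather than 3.6.8, the packages installed with pip3 are not the ones the script engine is looking at.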

Looking in the actual log files for the primary node doesn't yield any more useful information.

As I said, I've done some research into the proper way to tackle this problem, and I believe I'm doing everything correctly. Is there something else that could be causing this issue, or is there a step in the process that I've missed?
