Created on 02-06-2019 05:49 PM - edited 08-17-2019 04:49 AM
Using Deployed Models as a Function as a Service
With Cloudera Data Science Workbench (CDSW) and Apache NiFi, we can easily call functions in our deployed models as part of NiFi flows. I am working against CDSW on HDP (https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_hdp.html), but this works for CDSW regardless of install type.
In my simple example, I built a Python model that uses TextBlob to run sentiment analysis against a passed-in sentence. It returns sentiment polarity and subjectivity, which we can immediately act upon in our flow.
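A model function of this kind might look like the sketch below. It is a minimal illustration, not the actual sentiment.py from my repository. It assumes CDSW hands the request JSON to the function as a dict with a `sentence` key and that the `textblob` package is installed. The `analyze` parameter is hypothetical, added only so the function can be exercised without TextBlob.

```python
def _textblob_scores(sentence):
    """Return (polarity, subjectivity) for a sentence using TextBlob."""
    # Imported lazily so this module still loads where textblob is absent.
    from textblob import TextBlob

    result = TextBlob(sentence).sentiment
    return result.polarity, result.subjectivity


def sentiment(args, analyze=_textblob_scores):
    """Model entry point: takes {"sentence": "..."} and returns both scores."""
    sentence = str(args.get("sentence", ""))
    polarity, subjectivity = analyze(sentence)
    return {"polarity": polarity, "subjectivity": subjectivity}
```

Returning a plain dict keeps the output JSON-friendly, which matters because the model is exposed as a REST API.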
CDSW is extremely easy to work with, and I was up and running in a few minutes. For my model, I created a Python 3 script and a shell script for the install details. Both of these artifacts are available here: https://github.com/tspannhw/nifi-cdsw
My Apache NiFi 1.8 flow is here (I use no custom processors): cdsw-twitter-sentiment.xml
Deploying a Machine Learning Model as a REST Service
First, log in to CDSW and create a project or choose an existing one (https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_projects.html). From your project, open the workbench, where you can install some libraries and test some Python. I am using a Python 3 session to download the TextBlob/NLTK corpora for NLP.
Let's Pip Install some libraries for testing
Let's Create a new Model
You choose your file (mine is sentiment.py, see GitHub). The function name is sentiment. Note that I made a typo here, so I had to rebuild and redeploy the model. You set up an example input (sentence is the input parameter name) and an example output. Input and output will be JSON since this is a REST API.
Let's Deploy It (Python 3)
The deploy step builds the model before it goes live.
We can see standard output, standard error, status, # of REST calls received and success.
Once a Model is Deployed We Can Control It
We can stop it, rebuild it or replace the files if need be. I had to update things a few times. The amount of resources used for hosting the model's REST endpoint is your choice from a drop-down. Since I am doing something small, I picked the smallest option, with only 1 virtual CPU and 2 GB of RAM. All of this is running in Docker on Kubernetes!
Once Deployed, It's Ready To Test and Use From Apache NiFi
Just click Test to see the JSON results. We can now call the model from an Apache NiFi flow.
Once Deployed We Can Monitor The Model
Let's Run the Test
See the status and response!
Apache NiFi Example Flow
Step 1: Call Twitter
Step 2: Extract Social Attributes of Interest
Step 3: Build our web call with our access key and function parameter
Step 4: Extract our string as a flow file to send to the HTTP POST
Step 5: Call Our Cloudera Data Science Workbench REST API (see tester).
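Outside of NiFi, the same call can be sketched in Python using only the standard library. The body shape (accessKey plus a request object holding sentence) follows the REST call shown in this article. The endpoint URL below is a hypothetical placeholder, and the names `build_request` and `call_model` are my own for illustration. Copy the real URL and access key from the model's test page.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL shown for your deployed model.
MODEL_URL = "https://modelservice.cdsw.example.com/model"


def build_request(access_key, sentence, url=MODEL_URL):
    """Build the POST request carrying the access key and function parameter."""
    payload = {"accessKey": access_key, "request": {"sentence": sentence}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_model(access_key, sentence, url=MODEL_URL, timeout=10):
    """Send the request and return the decoded JSON reply as a dict."""
    request = build_request(access_key, sentence, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from the network call makes it easy to inspect the exact body before anything is sent.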
Step 6: Extract the two result values.
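As a rough Python equivalent of this extraction step, the sketch below pulls the two values out of the reply. It assumes the model's return value is wrapped under a `response` key in the reply JSON; that wrapper is an assumption, so check the actual JSON on the model's test page. The function name `extract_scores` is hypothetical.

```python
import json


def extract_scores(reply_text):
    """Pull polarity and subjectivity out of the model's JSON reply."""
    reply = json.loads(reply_text)
    # Assumption: the function's return value sits under "response".
    # If that key is absent, treat the reply itself as the result.
    result = reply.get("response", reply)
    return float(result["polarity"]), float(result["subjectivity"])
```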
Step 7: Let's route on the sentiment
Sentiment polarity can be negative (<0), neutral (0), positive (>0) or very positive (1). See TextBlob for more information on how this works.
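The routing rule can be written out as a small Python function. This is only an illustration of the thresholds listed above, not the NiFi routing configuration itself, and the route names are my own labels.

```python
def route_sentiment(polarity):
    """Map a TextBlob polarity in [-1.0, 1.0] to a route name."""
    if polarity < 0:
        return "negative"
    if polarity == 0:
        return "neutral"
    if polarity >= 1:
        return "very positive"
    return "positive"
```

The very positive check comes before the general positive case, since a polarity of 1 would otherwise match both.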
Step 8: Send bad sentiment to a Slack channel for human analysis.
We send all the related information, including the message, to a Slack channel.
Example Message Sent to Slack
Step 9: Store all the results (or some) in either Phoenix/HBase, Hive LLAP, Impala, Kudu or HDFS.
Results as Attributes
Slack Message Call
${msg:append(" User:"):append(${user_name}):append(${handle}):append(" Geo:"):append(${coordinates}):append(${geo}):append(${location}):append(${place}):append(" Hashtags:"):append(${hashtags}):append(" Polarity:"):append(${polarity}):append(" Subjectivity:"):append(${subjectivity}):append(" Friends Count:"):append(${friends_count}):append(" Followers Count:"):append(${followers_count}):append(" Retweet Count:"):append(${retweet_count}):append(" Source:"):append(${source}):append(" Time:"):append(${time}):append(" Tweet ID:"):append(${tweet_id})}
REST CALL to Model
{"accessKey":"from your workbench","request":{"sentence":"${msg:replaceAll('\"', ''):replaceAll('\n','')}"}}
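The two replaceAll calls strip double quotes and newlines from the tweet text so that it cannot break the hand-built JSON string. A Python sketch of the same idea follows. The function name `build_body` is hypothetical, and because `json.dumps` escapes special characters itself, the stripping here only mirrors the NiFi expression rather than being strictly required.

```python
import json


def build_body(access_key, msg):
    """Build the JSON body for the model call from raw tweet text."""
    # Mirror the two replaceAll calls: drop double quotes and newlines.
    cleaned = msg.replace('"', "").replace("\n", "")
    return json.dumps({"accessKey": access_key, "request": {"sentence": cleaned}})
```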
Resources