Apache NiFi allows us to rapidly create and operate very flexible and powerful dataflows. There are times, however, when the full flexibility and power of NiFi may not be required for the task at hand. For these times, MiNiFi may be a good fit. In particular, MiNiFi - C++ is worth considering when resources such as memory and compute power are constrained to such an extent that it is not feasible to run a full Java virtual machine.
We are going to demonstrate how to deploy a MiNiFi - C++ dataflow to a cloud compute node that has only 64 megabytes of RAM. These types of nodes may be useful as a cost-savings measure, because cloud compute services typically charge based on resource usage.
We'll start by cloning the latest nifi-minifi-cpp source:
$ git clone https://github.com/apache/nifi-minifi-cpp.git
$ cd nifi-minifi-cpp/
Since this demo relies on a few commits which have not yet been merged into master, we'll cherry-pick them:
$ git remote add achristianson https://github.com/achristianson/nifi-minifi-cpp.git
$ git fetch --all
$ git cherry-pick cb9bdf 6800ae0
Next, we'll create a Python virtual environment and add some helpful MiNiFi modules to the PYTHONPATH, which will help us create our dataflow:
$ virtualenv ./env
$ . ./env/bin/activate
$ pip install --upgrade pip
$ pip install --upgrade pyyaml docker
$ export PYTHONPATH="$( pwd )"/docker/test/integration
Next, we'll start Python:
$ python
Next, we'll create a dataflow:
>>> from minifi import *
>>> f = flow_yaml(ListenHTTP(8080) >> LogAttribute() >> PutFile('/tmp'))
>>> print(f)
Connections:
- destination id: 65472f6f-d87e-43c7-aec2-208046c028bc
  name: c42fa886-b7e0-48ac-843d-6a9eeb66eb56
  source id: ad2f2e7d-dde9-4496-bc63-0464e4f52a01
  source relationship name: success
- destination id: 15a29d90-7012-44c2-b677-b787166c7426
  name: f2f7463e-ed4c-4e8a-8752-01740c60775f
  source id: 65472f6f-d87e-43c7-aec2-208046c028bc
  source relationship name: success
Controller Services: []
Flow Controller:
  name: MiNiFi Flow
Processors:
- Properties:
    Listening Port: 8080
  auto-terminated relationships list: []
  class: org.apache.nifi.processors.standard.ListenHTTP
  id: ad2f2e7d-dde9-4496-bc63-0464e4f52a01
  name: ad2f2e7d-dde9-4496-bc63-0464e4f52a01
  penalization period: 30 sec
  run duration nanos: 0
  scheduling period: 1 sec
  scheduling strategy: EVENT_DRIVEN
  yield period: 1 sec
- Properties: {}
  auto-terminated relationships list: []
  class: org.apache.nifi.processors.standard.LogAttribute
  id: 65472f6f-d87e-43c7-aec2-208046c028bc
  name: 65472f6f-d87e-43c7-aec2-208046c028bc
  penalization period: 30 sec
  run duration nanos: 0
  scheduling period: 1 sec
  scheduling strategy: EVENT_DRIVEN
  yield period: 1 sec
- Properties:
    Output Directory: /tmp
  auto-terminated relationships list:
  - success
  - failure
  class: org.apache.nifi.processors.standard.PutFile
  id: 15a29d90-7012-44c2-b677-b787166c7426
  name: 15a29d90-7012-44c2-b677-b787166c7426
  penalization period: 30 sec
  run duration nanos: 0
  scheduling period: 1 sec
  scheduling strategy: EVENT_DRIVEN
  yield period: 1 sec
Remote Processing Groups: []
This flow looks good, so we'll save it to config.yml and exit python:
>>> with open('conf/config.yml', 'w') as cf:
...     cf.write(f)
...
>>>
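As an optional sanity check (a sketch, not part of the original walkthrough), we can re-read the file with PyYAML, which we installed earlier, and list the processor classes it contains:

# Sketch: parse conf/config.yml with PyYAML and print each processor's class.
# Assumes the file was just written by the session above.
import yaml

with open('conf/config.yml') as cf:
    config = yaml.safe_load(cf)

for proc in config['Processors']:
    print(proc['class'])

# Expected output for the flow above:
# org.apache.nifi.processors.standard.ListenHTTP
# org.apache.nifi.processors.standard.LogAttribute
# org.apache.nifi.processors.standard.PutFile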
Next, we'll build the Docker image:
$ cd docker
$ ./DockerBuild.sh 1000 1000 0.3.0 minificppsource ..
Now we're ready to deploy the image. For this demo, we'll deploy to hyper.sh and assume that hyper has already been configured. The container size will be s1, which is an instance with only 64 MB of RAM. We'll also allocate and attach a floating IP (FIP):
$ hyper load -l apacheminificpp:0.3.0
$ hyper run --size s1 -d --name minifi -p 8080:8080 apacheminificpp:0.3.0
$ hyper fip allocate 1
199.245.60.9
$ hyper fip attach 199.245.60.9 minifi
Now our MiNiFi - C++ container is running and has an IP attached to it. Let's generate and send some data to the new instance:
$ dd if=/dev/urandom of=./testdat bs=1M count=1
$ sha256sum ./testdat
da388a0bd1f69aa94674a20dd285df1b2e553b8cd9425e33498398d25846d692  ./testdat
$ curl -vvv -X POST --data-binary @./testdat http://199.245.60.9:8080/contentListener
* About to connect() to 199.245.60.9 port 8080 (#0)
*   Trying 199.245.60.9...
* Connected to 199.245.60.9 (199.245.60.9) port 8080 (#0)
> POST /contentListener HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 199.245.60.9:8080
> Accept: */*
> Content-Length: 1048576
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Content-Type: text/html
< Content-Length: 0
<
* Connection #0 to host 199.245.60.9 left intact
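If curl isn't available, the same POST can be issued from Python. Here is a minimal sketch, assuming Python 3 and the same floating IP and port used above; ListenHTTP responds with 200 and an empty body when it accepts the data:

# Sketch: POST the local test file to the ListenHTTP endpoint using only the
# Python standard library. The IP and port are the ones attached earlier.
import urllib.request

with open('./testdat', 'rb') as f:
    data = f.read()

req = urllib.request.Request(
    'http://199.245.60.9:8080/contentListener',
    data=data,
    headers={'Content-Type': 'application/octet-stream'},
    method='POST')

with urllib.request.urlopen(req) as resp:
    print(resp.getcode())  # expect 200 if the flow accepted the data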
The instance successfully received the test data. For good measure, let's verify that the data stored on the instance has the same SHA-256 checksum as the local data:
$ hyper exec minifi ls /tmp
1505163279958208874
$ hyper exec minifi sha256sum /tmp/1505163279958208874
da388a0bd1f69aa94674a20dd285df1b2e553b8cd9425e33498398d25846d692  /tmp/1505163279958208874
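The local checksum could also be computed from Python rather than with sha256sum; a minimal sketch using the standard library's hashlib:

# Sketch: compute the SHA-256 of the local test file for comparison with the
# sum reported inside the container.
import hashlib

h = hashlib.sha256()
with open('./testdat', 'rb') as f:
    for chunk in iter(lambda: f.read(8192), b''):
        h.update(chunk)
print(h.hexdigest())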
The SHA-256 checksum matches, so our MiNiFi - C++ instance has successfully received and stored the generated test data. We can now clean up the resources if desired. Although the overall process for deploying a full Apache NiFi instance would be similar, NiFi requires a full Java virtual machine and would not fit in an s1 (64 MB) instance. We can therefore save significantly on compute service expenses by deploying MiNiFi - C++ whenever a flow is simple enough to fit within MiNiFi's more limited feature scope.