
Apache NiFi allows us to rapidly create and operate flexible and powerful dataflows. There are times, however, when the full flexibility and power of NiFi are not required for the task at hand. For these times, MiNiFi may be a good fit. In particular, MiNiFi - C++ is worth considering when memory and compute power are so constrained that running a full Java virtual machine is not feasible.

We are going to demonstrate how to deploy a MiNiFi - C++ dataflow to a cloud compute node that has only 64 megabytes of RAM. Nodes this small can be a useful cost-saving measure, because cloud compute services typically charge based on resource usage.

We'll start by cloning the latest nifi-minifi-cpp source:

$ git clone https://github.com/apache/nifi-minifi-cpp.git
$ cd nifi-minifi-cpp/

Since this demo relies on a few commits that are not yet merged into master, we'll cherry-pick them:

$ git remote add achristianson https://github.com/achristianson/nifi-minifi-cpp.git
$ git fetch --all
$ git cherry-pick cb9bdf 6800ae0

Next, we'll create a Python virtual environment and add the MiNiFi integration-test modules to the PYTHONPATH; these modules will help us create our dataflow:

$ virtualenv ./env
$ . ./env/bin/activate
$ pip install --upgrade pip
$ pip install --upgrade pyyaml docker
$ export PYTHONPATH="$( pwd )"/docker/test/integration

Next, we'll start the Python interpreter:

$ python

Next, we'll create a dataflow:

>>> from minifi import *
>>> f = flow_yaml(ListenHTTP(8080) >> LogAttribute() >> PutFile('/tmp'))
>>> print(f)
Connections:
- destination id: 65472f6f-d87e-43c7-aec2-208046c028bc
  name: c42fa886-b7e0-48ac-843d-6a9eeb66eb56
  source id: ad2f2e7d-dde9-4496-bc63-0464e4f52a01
  source relationship name: success
- destination id: 15a29d90-7012-44c2-b677-b787166c7426
  name: f2f7463e-ed4c-4e8a-8752-01740c60775f
  source id: 65472f6f-d87e-43c7-aec2-208046c028bc
  source relationship name: success
Controller Services: []
Flow Controller:
  name: MiNiFi Flow
Processors:
- Properties:
    Listening Port: 8080
  auto-terminated relationships list: []
  class: org.apache.nifi.processors.standard.ListenHTTP
  id: ad2f2e7d-dde9-4496-bc63-0464e4f52a01
  name: ad2f2e7d-dde9-4496-bc63-0464e4f52a01
  penalization period: 30 sec
  run duration nanos: 0
  scheduling period: 1 sec
  scheduling strategy: EVENT_DRIVEN
  yield period: 1 sec
- Properties: {}
  auto-terminated relationships list: []
  class: org.apache.nifi.processors.standard.LogAttribute
  id: 65472f6f-d87e-43c7-aec2-208046c028bc
  name: 65472f6f-d87e-43c7-aec2-208046c028bc
  penalization period: 30 sec
  run duration nanos: 0
  scheduling period: 1 sec
  scheduling strategy: EVENT_DRIVEN
  yield period: 1 sec
- Properties:
    Output Directory: /tmp
  auto-terminated relationships list:
  - success
  - failure
  class: org.apache.nifi.processors.standard.PutFile
  id: 15a29d90-7012-44c2-b677-b787166c7426
  name: 15a29d90-7012-44c2-b677-b787166c7426
  penalization period: 30 sec
  run duration nanos: 0
  scheduling period: 1 sec
  scheduling strategy: EVENT_DRIVEN
  yield period: 1 sec
Remote Processing Groups: []
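
The `>>` chaining used to build the flow is ordinary Python operator overloading. As a rough illustration of the idea only, and not the actual code in the `minifi` module, a toy version might look like the sketch below. The `Processor` class and `connections` helper are hypothetical names invented for this example.

```python
class Processor:
    """Toy stand-in for a flow node (hypothetical, not the real minifi class)."""

    def __init__(self, name):
        self.name = name
        self.head = self   # first processor of the chain this node belongs to
        self.next = None   # processor fed by this node's output

    def __rshift__(self, other):
        # a >> b wires a's output into b and returns b, so
        # a >> b >> c evaluates left to right as (a >> b) >> c.
        self.next = other
        other.head = self.head
        return other


def connections(tail):
    """Walk the chain from its head and list (source, destination) pairs."""
    pairs = []
    node = tail.head
    while node.next is not None:
        pairs.append((node.name, node.next.name))
        node = node.next
    return pairs


flow = Processor("ListenHTTP") >> Processor("LogAttribute") >> Processor("PutFile")
print(connections(flow))
# prints [('ListenHTTP', 'LogAttribute'), ('LogAttribute', 'PutFile')]
```

The two pairs correspond to the two entries under Connections in the generated YAML, where each connection links a source id to a destination id.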

This flow looks good, so we'll save it to conf/config.yml and exit Python:

>>> with open('conf/config.yml', 'w') as cf:
...   cf.write(f)
... 
>>> exit()
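
Before building an image around the saved file, it can be worth checking that every connection refers to a processor that actually exists. The sketch below assumes the config has already been parsed into a dictionary with the same keys as the YAML printed earlier (for instance with `yaml.safe_load` from the pyyaml package installed above). The `dangling_connections` helper and the abbreviated ids are made up for illustration.

```python
def dangling_connections(config):
    """Return names of connections whose source or destination id is undefined."""
    known_ids = {proc["id"] for proc in config.get("Processors", [])}
    return [
        conn["name"]
        for conn in config.get("Connections", [])
        if conn["source id"] not in known_ids
        or conn["destination id"] not in known_ids
    ]


# Hand-written, abbreviated config with the same shape as the generated YAML.
# With the real file, one would parse conf/config.yml instead (assumption).
config = {
    "Processors": [{"id": "listen"}, {"id": "log"}, {"id": "put"}],
    "Connections": [
        {"name": "c1", "source id": "listen", "destination id": "log"},
        {"name": "c2", "source id": "log", "destination id": "put"},
    ],
}
print(dangling_connections(config))  # prints []
```

An empty list means every connection is wired to a defined processor.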

Next, we'll build the docker image:

$ cd docker
$ ./DockerBuild.sh 1000 1000 0.3.0 minificppsource ..

Now we're ready to deploy the image. For this demo, we'll deploy to hyper.sh and assume that the hyper command-line client has already been configured. The container size will be s1, an instance with only 64MB of RAM. We'll also allocate and attach a floating IP (FIP):

$ hyper load -l apacheminificpp:0.3.0
$ hyper run --size s1 -d --name minifi -p 8080:8080 apacheminificpp:0.3.0
$ hyper fip allocate 1
199.245.60.9
$ hyper fip attach 199.245.60.9 minifi

Now our MiNiFi - C++ container is running and has an IP attached to it. Let's generate and send some data to the new instance:

$ dd if=/dev/urandom of=./testdat bs=1M count=1
$ sha256sum ./testdat 
da388a0bd1f69aa94674a20dd285df1b2e553b8cd9425e33498398d25846d692  ./testdat
$ curl -vvv -X POST --data-binary @./testdat http://199.245.60.9:8080/contentListener
* About to connect() to 199.245.60.9 port 8080 (#0)
*   Trying 199.245.60.9...
* Connected to 199.245.60.9 (199.245.60.9) port 8080 (#0)
> POST /contentListener HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 199.245.60.9:8080
> Accept: */*
> Content-Length: 1048576
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Content-Type: text/html
< Content-Length: 0
< 
* Connection #0 to host 199.245.60.9 left intact
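
The same upload can be scripted instead of typed. The following is a minimal sketch using only the Python standard library; the `build_post` and `send` helpers are hypothetical names, and the `application/octet-stream` content type is an assumption of this sketch (curl sent `application/x-www-form-urlencoded` in the transcript above). Nothing is sent over the network unless `send` is called.

```python
import urllib.request


def build_post(url, payload):
    """Build (but do not send) a POST request carrying raw bytes."""
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


def send(request, timeout=30):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder payload; the demo above posts 1 MiB read from /dev/urandom.
payload = bytes(1024)
request = build_post("http://199.245.60.9:8080/contentListener", payload)
print(request.get_method(), request.full_url, len(request.data))
# To actually transmit the data: print(send(request))
```

Keeping request construction separate from sending makes it possible to inspect the request before any data leaves the machine.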

The instance successfully received the test data. For good measure, let's verify the data stored on the instance has the same sha256 sum as the local data:

$ hyper exec minifi ls /tmp
1505163279958208874
$ hyper exec minifi sha256sum /tmp/1505163279958208874
da388a0bd1f69aa94674a20dd285df1b2e553b8cd9425e33498398d25846d692  /tmp/1505163279958208874
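
If `sha256sum` is not available on the local machine, the same digest can be computed with Python's `hashlib`. This is a small sketch that reads the file in chunks so that memory use stays low; the `sha256_of` helper is a name chosen for this example, and the temporary file merely stands in for `./testdat`.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Write 1 MiB of random bytes to a throwaway file, as dd does above.
data = os.urandom(1024 * 1024)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
local_sum = sha256_of(tmp.name)
os.remove(tmp.name)

# The chunked digest should equal the digest of the in-memory bytes.
print(local_sum == hashlib.sha256(data).hexdigest())  # prints True
```

The hex string returned by `sha256_of` is in the same format as the first column of `sha256sum` output, so the two can be compared directly.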

The sha256 sum matches, so our MiNiFi - C++ instance has successfully received and stored the generated test data. We can now clean up all the resources if desired. Although the overall process to deploy a full-blown Apache NiFi instance would be similar, NiFi could not run on an s1 (64MB) instance. We can therefore save significantly on compute service expenses by deploying MiNiFi - C++ when flows are simple enough to fit within the limited feature scope of MiNiFi.
