09-11-2017
09:17 PM
Apache NiFi allows us to rapidly create and operate very flexible and powerful
dataflows. There are times, however, when the full flexibility and power of NiFi
may not be required for the task at hand. For these times, MiNiFi may be a
good fit. In particular, MiNiFi - C++ is worth considering when resources such
as memory and compute power are constrained to such an extent that it is not
feasible to run a full Java virtual machine.

We are going to demonstrate how to deploy a MiNiFi - C++ dataflow to a
cloud compute node that has only 64 megabytes of RAM. These types of nodes may
be useful as a cost-savings measure, because cloud compute services typically
charge based on resource usage.

We'll start by cloning the latest nifi-minifi-cpp source:

$ git clone https://github.com/apache/nifi-minifi-cpp.git
$ cd nifi-minifi-cpp/

Since this demo relies on a few commits which are not yet merged into master,
we'll cherry-pick those commits from a fork:

$ git remote add achristianson https://github.com/achristianson/nifi-minifi-cpp.git
$ git fetch --all
$ git cherry-pick cb9bdf 6800ae0

Next, we'll create a Python virtual environment and add some helpful MiNiFi
modules to the PYTHONPATH which will help us create our dataflow:

$ virtualenv ./env
$ . ./env/bin/activate
$ pip install --upgrade pip
$ pip install --upgrade pyyaml docker
$ export PYTHONPATH="$( pwd )"/docker/test/integration

Next, we'll start Python:

$ python

Next, we'll create a dataflow:

>>> from minifi import *
>>> f = flow_yaml(ListenHTTP(8080) >> LogAttribute() >> PutFile('/tmp'))
>>> print(f)
Connections:
- destination id: 65472f6f-d87e-43c7-aec2-208046c028bc
  name: c42fa886-b7e0-48ac-843d-6a9eeb66eb56
  source id: ad2f2e7d-dde9-4496-bc63-0464e4f52a01
  source relationship name: success
- destination id: 15a29d90-7012-44c2-b677-b787166c7426
  name: f2f7463e-ed4c-4e8a-8752-01740c60775f
  source id: 65472f6f-d87e-43c7-aec2-208046c028bc
  source relationship name: success
Controller Services: []
Flow Controller:
  name: MiNiFi Flow
Processors:
- Properties:
    Listening Port: 8080
  auto-terminated relationships list: []
  class: org.apache.nifi.processors.standard.ListenHTTP
  id: ad2f2e7d-dde9-4496-bc63-0464e4f52a01
  name: ad2f2e7d-dde9-4496-bc63-0464e4f52a01
  penalization period: 30 sec
  run duration nanos: 0
  scheduling period: 1 sec
  scheduling strategy: EVENT_DRIVEN
  yield period: 1 sec
- Properties: {}
  auto-terminated relationships list: []
  class: org.apache.nifi.processors.standard.LogAttribute
  id: 65472f6f-d87e-43c7-aec2-208046c028bc
  name: 65472f6f-d87e-43c7-aec2-208046c028bc
  penalization period: 30 sec
  run duration nanos: 0
  scheduling period: 1 sec
  scheduling strategy: EVENT_DRIVEN
  yield period: 1 sec
- Properties:
    Output Directory: /tmp
  auto-terminated relationships list:
  - success
  - failure
  class: org.apache.nifi.processors.standard.PutFile
  id: 15a29d90-7012-44c2-b677-b787166c7426
  name: 15a29d90-7012-44c2-b677-b787166c7426
  penalization period: 30 sec
  run duration nanos: 0
  scheduling period: 1 sec
  scheduling strategy: EVENT_DRIVEN
  yield period: 1 sec
Remote Processing Groups: []

This flow looks good, so we'll save it to conf/config.yml and exit Python:

>>> with open('conf/config.yml', 'w') as cf:
...     cf.write(f)
...
>>> exit()
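
The `>>` chaining used above maps a left-to-right pipeline onto the `Processors` and `Connections` lists seen in the YAML. The following is a minimal sketch of that idea using Python operator overloading. It is not the implementation of the `minifi` test module; the names `Proc`, `flow_dict`, and `dangling_refs` are hypothetical, and only a small subset of the YAML keys is reproduced.

```python
import uuid


class Proc:
    """Hypothetical stand-in for a processor node (not the minifi module's class)."""

    def __init__(self, clazz, properties=None):
        self.id = str(uuid.uuid4())
        self.clazz = clazz
        self.properties = properties or {}
        self.head = self        # first processor of the chain this node belongs to
        self.downstream = []    # processors fed by this node's 'success' relationship

    def __rshift__(self, other):
        # a >> b records the edge a -> b and returns b, so chains read left to right.
        self.downstream.append(other)
        other.head = self.head
        return other


def flow_dict(tail):
    """Walk the chain from its head and emit Processors/Connections entries."""
    processors, connections = [], []
    seen, stack = set(), [tail.head]
    while stack:
        proc = stack.pop()
        if proc.id in seen:
            continue
        seen.add(proc.id)
        processors.append(
            {"id": proc.id, "class": proc.clazz, "Properties": proc.properties}
        )
        for dest in proc.downstream:
            connections.append({
                "name": str(uuid.uuid4()),
                "source id": proc.id,
                "source relationship name": "success",
                "destination id": dest.id,
            })
            stack.append(dest)
    return {"Processors": processors, "Connections": connections}


def dangling_refs(flow):
    """Return names of connections whose endpoints are not declared processors."""
    ids = {p["id"] for p in flow["Processors"]}
    return [
        c["name"] for c in flow["Connections"]
        if c["source id"] not in ids or c["destination id"] not in ids
    ]


flow = flow_dict(
    Proc("org.apache.nifi.processors.standard.ListenHTTP", {"Listening Port": 8080})
    >> Proc("org.apache.nifi.processors.standard.LogAttribute")
    >> Proc("org.apache.nifi.processors.standard.PutFile", {"Output Directory": "/tmp"})
)
print(len(flow["Processors"]), "processors,", len(flow["Connections"]), "connections")
print("dangling connections:", dangling_refs(flow))
```

A check like `dangling_refs` can be handy before writing a hand-edited config, since every `source id` and `destination id` in `Connections` has to refer to an `id` under `Processors`.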

Next, we'll build the Docker image:

$ cd docker
$ ./DockerBuild.sh 1000 1000 0.3.0 minificppsource ..

Now we're ready to deploy the image. For this demo, we'll deploy to hyper.sh and
assume that hyper has already been configured. The container size will be s1,
which is an instance with only 64MB of RAM. We'll also allocate and attach a
floating IP (FIP):

$ hyper load -l apacheminificpp:0.3.0
$ hyper run --size s1 -d --name minifi -p 8080:8080 apacheminificpp:0.3.0
$ hyper fip allocate 1
199.245.60.9
$ hyper fip attach 199.245.60.9 minifi

Now our MiNiFi - C++ container is running and has an IP attached to it. Let's
generate and send some data to the new instance:

$ dd if=/dev/urandom of=./testdat bs=1M count=1
$ sha256sum ./testdat
da388a0bd1f69aa94674a20dd285df1b2e553b8cd9425e33498398d25846d692 ./testdat
$ curl -vvv -X POST --data-binary @./testdat http://199.245.60.9:8080/contentListener
* About to connect() to 199.245.60.9 port 8080 (#0)
* Trying 199.245.60.9...
* Connected to 199.245.60.9 (199.245.60.9) port 8080 (#0)
> POST /contentListener HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 199.245.60.9:8080
> Accept: */*
> Content-Length: 1048576
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Content-Type: text/html
< Content-Length: 0
<
* Connection #0 to host 199.245.60.9 left intact
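
The same upload can be scripted instead of using curl. The following is a minimal sketch using only the Python 3 standard library; the helper name `post_bytes` is hypothetical, and the `application/octet-stream` content type is an assumption (the curl call above sent `application/x-www-form-urlencoded`).

```python
import urllib.request


def post_bytes(url, payload, timeout=30):
    """POST raw bytes to url and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so a
    returned value means the server answered with a success status.
    """
    request = urllib.request.Request(
        url,
        data=payload,
        method="POST",
        # Assumption: the listener accepts arbitrary binary content.
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Against the instance from this demo, it could be called as `post_bytes("http://199.245.60.9:8080/contentListener", open("testdat", "rb").read())`.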

The instance successfully received the test data. For good measure, let's verify
the data stored on the instance has the same sha256 sum as the local data:

$ hyper exec minifi ls /tmp
1505163279958208874
$ hyper exec minifi sha256sum /tmp/1505163279958208874
da388a0bd1f69aa94674a20dd285df1b2e553b8cd9425e33498398d25846d692 /tmp/1505163279958208874
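
The comparison can also be automated rather than checked by eye. The following is a minimal sketch, assuming the remote digest is available as a line in the `<hex> <path>` format that `sha256sum` prints above; the helper names `sha256_of_file` and `matches` are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(local_path, reported_line):
    """Compare a local file against one line of sha256sum-style output."""
    reported_hex = reported_line.split()[0].lower()
    return sha256_of_file(local_path) == reported_hex
```

For this demo, `matches("./testdat", line)` would be called with `line` set to the output of the `hyper exec minifi sha256sum` command.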

The sha256 sum matches, so our MiNiFi - C++ instance has successfully received
and stored the generated test data. We can now clean up all the resources if
desired.

Although the overall process to deploy a full-blown Apache NiFi instance would
be similar, it would not be feasible on an s1 (64MB) instance, which lacks the
memory to run a full Java virtual machine. We can therefore save significantly
on compute service expenses by deploying MiNiFi - C++ when flows are simple
enough that they fit within the limited feature scope of MiNiFi.