Getting started with MiNiFi

In this tutorial, we will learn how to configure MiNiFi to send data to NiFi:

  1. Installing HDF
  2. Installing MiNiFi
  3. Setting up the Flow for NiFi
  4. Setting up the Flow for MiNiFi
  5. Preparing the flow for MiNiFi
  6. Configuring and starting MiNiFi
  7. Enjoying the data flow!

References

For a primer on HDF, refer to the Tutorials and User Guides.

Installing HDF

  1. If you do not have NiFi installed, please follow the instructions found here

NOTE: The above installation guide is for HDF 1.2.0.1, which is the version that matches Apache MiNiFi 0.0.1. Although HDF 2.0 may work, it is not recommended for this exercise at this time.

Installing MiNiFi

Now that you have NiFi up and running it is time to download and install MiNiFi.

  1. Open a browser and download the MiNiFi binaries from the Apache MiNiFi Downloads page. There are two options: a tar.gz file, a format tailored to Linux, and a zip file, which is more compatible with Windows. If you are using a Mac, either option is fine.

    <Display Name>

    Figure 1. MiNiFi download page

    For this tutorial I downloaded the tar.gz on a Mac, as shown above in Figure 1.

  2. To install MiNiFi, extract the files from the compressed file to a location in which you want to run the application. I have chosen to install it to /Users/apsaltis/minifi-to-nifi

    The image below shows MiNiFi downloaded and installed in this directory:

    <Display Name>

Setting up the Flow for NiFi

NOTE: Before starting NiFi we need to enable Site-to-Site communication. To do that do the following:

  • Open <$NIFI_INSTALL_DIR>/conf/nifi.properties in your favorite editor
  • Change:
    nifi.remote.input.socket.host=
    nifi.remote.input.socket.port=
    nifi.remote.input.secure=true

    To:

    nifi.remote.input.socket.host=localhost <-- This is only being done for this exercise as MiNiFi and NiFi are running on the same host. This is not a recommended way of deploying the two products.
    nifi.remote.input.socket.port=10000
    nifi.remote.input.secure=false <-- This implies we are only using HTTP and are not securing the communication between MiNiFi and NiFi. For this exercise that is OK, however, it is important to consider your security needs when deploying these technologies.
  • Restart NiFi if it was running
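If you would rather script this change than edit the file by hand, here is a minimal sketch. The helper names set_prop and enable_site_to_site are made up for this example and are not part of NiFi; the property names and values are the ones listed above. It assumes the values contain no "|" or "&" characters.

```shell
# Hypothetical helper: set key=value in a Java-style properties file.
# Replaces the line if the key is already present, appends it otherwise.
# Note: dots in the key act as regex wildcards here, which is harmless
# for the three keys used below.
set_prop() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    # Write to a temp file instead of using sed -i, whose syntax differs
    # between GNU sed (Linux) and BSD sed (macOS).
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" &&
      mv "${file}.tmp" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Apply the three Site-to-Site settings used in this exercise.
enable_site_to_site() {
  local props="$1"
  set_prop "$props" nifi.remote.input.socket.host localhost
  set_prop "$props" nifi.remote.input.socket.port 10000
  set_prop "$props" nifi.remote.input.secure false
}
```

With the functions loaded in your shell, the call would look like this (back up the file first):

    enable_site_to_site "$NIFI_INSTALL_DIR/conf/nifi.properties"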

Now that we have NiFi up and running and MiNiFi installed and ready to go, the next thing to do is to create our data flow. We will start by creating the flow in NiFi. Remember, if you do not have NiFi running, execute the following command:

<$NIFI_INSTALL_DIR>/bin/nifi.sh start
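NiFi can take a little while before its UI answers after the start command returns. The sketch below is a generic retry helper that can be used to wait for it. The name wait_until is hypothetical, and the probe command is up to you.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Run the probe; stop as soon as it reports success.
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, assuming curl is installed and NiFi listens on the address used in this tutorial, the following would poll the UI for up to about a minute:

    wait_until 30 2 curl -sf -o /dev/null http://127.0.0.1:8080/nifi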

Now we should be ready to create our flow. To do this do the following:

  1. Open a browser and go to http://<host>:<port>/nifi. On my machine that URL is http://127.0.0.1:8080/nifi, and going to it in the browser looks like this:

    <Display Name> Figure 2. Empty NiFi Canvas

  2. The first thing we are going to do is set up an Input Port. This is the port that MiNiFi will be sending data to. To do this, drag the Input Port icon to the canvas and call it "From MiNiFi" as shown below in figure 3.

    <Display Name> Figure 3. Adding the Input Port

  3. Now that the Input Port is configured we need to have somewhere for the data to go once we receive it. In this case we will keep it very simple and just log the attributes. To do this drag the Processor icon to the canvas and choose the LogAttribute processor as shown below in figure 4.

    <Display Name> Figure 4. Adding the LogAttribute processor

  4. Now that we have the input port and the processor to handle our data, we need to connect them. After creating the connection your data flow should look like figure 5 below.

    <Display Name> Figure 5. NiFi Flow

  5. We are now ready to build the MiNiFi side of the flow. To do this do the following:
    • Add a GenerateFlowFile processor to the canvas (don't forget to configure the properties on it)
    • Add a Remote Process Group to the canvas as shown below in Figure 6

    <Display Name>

    Figure 6. Adding the Remote Process Group

    • For the URL, copy and paste the URL of the NiFi UI from your browser
    • Connect the GenerateFlowFile to the Remote Process Group as shown below in figure 7. (You may have to refresh the Remote Process Group before the input port will be available)

    <Display Name>

    Figure 7. Adding GenerateFlowFile Connection to Remote Process Group

  6. Your canvas should now look similar to what is shown below in figure 8.

    <Display Name>

    Figure 8. Adding GenerateFlowFile Connection to Remote Process Group

  7. There is one last step we need to take before we can export the template. We need to make sure that we set the back pressure between the GenerateFlowFile processor and the Remote Process Group (RPG). That way, if you stop NiFi but not MiNiFi, you will not fill up the hard drive where MiNiFi is running. To set the back pressure, do the following:
    1. Right-click on the "From MiNiFi" connection and choose "Configure"
    2. Choose the "Settings" tab
    3. Set the "Back pressure object threshold" and "Back pressure data size threshold" to 10000 and 1 GB respectively.
  8. The next step is to generate the flow we need for MiNiFi. To do this, follow these steps:
    • Create a template for MiNiFi, as illustrated below in figure 9. <Display Name> Figure 9. Creating a template
    • Select the GenerateFlowFile and the NiFi Flow Remote Process Group (these are the only things needed for MiNiFi)
    • Select the "Create Template" button from the toolbar
    • Choose a name for your template
  9. We now need to save our template, as illustrated below in figure 10. <Display Name>

    Figure 10. Template button

  10. Now we need to download the template, as shown below in figure 11. <Display Name>

    Figure 11. Saving a template

  11. We are now ready to set up MiNiFi. However, before doing that we need to convert the template to the YAML format that MiNiFi uses. To do this:
    • Navigate to the minifi-toolkit directory (minifi-toolkit-0.0.1)
    • Transform the template that we downloaded using the following command:

      bin/config.sh transform <INPUT_TEMPLATE> <OUTPUT_FILE>

    For example:

    bin/config.sh transform MiNiFi_Flow.xml config.yml

  12. Next copy the config.yml to the minifi-0.0.1/conf directory. That is the file that MiNiFi uses to generate the nifi.properties file and the flow.xml.gz for MiNiFi.
  13. That is it, we are now ready to start MiNiFi. To start MiNiFi from a command prompt execute the following:
    cd <MINIFI_INSTALL_DIR>
    bin/minifi.sh start
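If no data shows up later, one thing worth checking is whether the transform step actually wrote a flow into config.yml. The sketch below is a rough check, not an official MiNiFi tool. The function name config_has_flow is made up, and it assumes that a config with no flow contains the line "Processors: []", as the default config.yml shipped with MiNiFi does.

```shell
# Sketch only: return 0 when a MiNiFi config.yml appears to contain a flow.
# Assumption: an empty flow is serialized as "Processors: []".
config_has_flow() {
  local config="$1"
  # A missing or zero-length file cannot contain a flow.
  [ -s "$config" ] || return 1
  # An empty processor list means the template was not converted into the file.
  if grep -q '^Processors:[[:space:]]*\[\][[:space:]]*$' "$config"; then
    return 1
  fi
  # The Processors section has to be present at all.
  grep -q '^Processors:' "$config"
}
```

Run from the MiNiFi install directory, a check before starting could look like this:

    config_has_flow conf/config.yml || echo "config.yml has no processors; re-run the transform step"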

You should now be able to go to your NiFi flow and see data coming in from MiNiFi.
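Besides watching the canvas, you can look for LogAttribute output in the NiFi application log. The following is a minimal sketch under two assumptions: that the log lives at logs/nifi-app.log under the NiFi install directory, and that each logged FlowFile produces one line containing "Standard FlowFile Attributes". The function name count_logged_flowfiles is made up for this example.

```shell
# Sketch only: count LogAttribute entries in a NiFi application log.
# Assumption: one "Standard FlowFile Attributes" line per logged FlowFile.
count_logged_flowfiles() {
  local log="$1"
  # Report zero rather than an error when the log does not exist yet.
  if [ ! -f "$log" ]; then
    echo 0
    return 0
  fi
  # grep -c prints the count even when it is 0, but then exits non-zero,
  # so the exit status is ignored here.
  grep -c 'Standard FlowFile Attributes' "$log" || true
}
```

Calling it twice a few seconds apart and seeing the number grow suggests that FlowFiles are arriving, provided the Input Port and the LogAttribute processor are running:

    count_logged_flowfiles "$NIFI_INSTALL_DIR/logs/nifi-app.log"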

Comments
Master Mentor

@apsaltis

I might suggest we make a few changes to this article:

1. The link you have for installing HDF talks about installing HDF 2.0. HDF 2.0 is based on Apache NiFi 1.0. Since MiNiFi is built from Apache NiFi 0.6.1, the dataflows built and templated for conversion into MiNiFi YAML files must also be built using an Apache NiFi 0.6-based install. (I see in your example above you did just that, but this needs to be made clear.)

2. I would never recommend setting nifi.remote.input.socket.host= to "localhost". When a NiFi or MiNiFi connects to another NiFi via S2S, the destination NiFi will return the value set for this property along with the value set for nifi.remote.input.socket.port=. In your example that means the source MiNiFi would then try to send FlowFiles to localhost:10000. This is ONLY going to work if the destination NiFi is located on the same server as MiNiFi.

3. You should also explain why you are changing nifi.remote.input.secure= from true to false. Changing this is not a requirement of MiNiFi; it is simply a matter of preference. (If set to true, both MiNiFi (source) and NiFi (destination) must be set up to run securely over https.) In your example you are working with http only.

4. While doable, one should never route the "success" relationship from any processor back on to itself. If you have reached the end of your dataflow, you should auto-terminate the "success" relationship.

5. I am not clear what you are telling me to do based on this line under step 5:

  • Start the From MiNiFi Input Port

6. When using the GenerateFlowFile processor in an example flow, it is important to recommend that users set a run schedule other than "0 sec". Since MiNiFi is Apache NiFi 0.6.1 based, there is no default back pressure on connections, and with a run schedule of "0 sec" it is very likely this processor will produce FlowFiles much faster than they can be sent across S2S. This will eventually fill the hard drive of the system running MiNiFi. An even better recommendation would be to make sure they set back pressure between the GenerateFlowFile processor and the Remote Process Group (RPG). That way, even if someone stops NiFi and not MiNiFi, they don't fill their MiNiFi hard drive.

Thanks,

Matt

Contributor

Thanks for the feedback, @mclark. All of your suggestions should now be reflected in the content. Thanks again for the input.

Contributor

@Roger Young -- You are correct: once the template is downloaded, you should be able to delete it and the related part of the flow. I am not sure off-hand why you are not seeing data flow from MiNiFi. I am actually using this exercise in a class I am teaching today and will be sure to test this and see what happens.

What version of MiNiFi and NiFi are you using?

Expert Contributor

Hi, I am using nifi-1.2.0 and minifi-0.2.0.

Expert Contributor

Below is the config.yml file in MiNiFi. There is no mention of processors there, so maybe I've done something wrong.

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

MiNiFi Config Version: 3
Flow Controller:
  name: MiNiFi Flow
  comment: ''
Core Properties:
  flow controller graceful shutdown period: 10 sec
  flow service write delay interval: 500 ms
  administrative yield duration: 30 sec
  bored yield duration: 10 millis
  max concurrent threads: 1
FlowFile Repository:
  partitions: 256
  checkpoint interval: 2 mins
  always sync: false
  Swap:
    threshold: 20000
    in period: 5 sec
    in threads: 1
    out period: 5 sec
    out threads: 4
Content Repository:
  content claim max appendable size: 10 MB
  content claim max flow files: 100
  always sync: false
Provenance Repository:
  provenance rollover time: 1 min
Component Status Repository:
  buffer size: 1440
  snapshot frequency: 1 min
Security Properties:
  keystore: ''
  keystore type: ''
  keystore password: ''
  key password: ''
  truststore: ''
  truststore type: ''
  truststore password: ''
  ssl protocol: ''
  Sensitive Props:
    key: ''
    algorithm: PBEWITHMD5AND256BITAES-CBC-OPENSSL
    provider: BC
Processors: []
Process Groups: []
Funnels: []
Connections: []
Remote Process Groups: []
NiFi Properties Overrides: {}


Expert Contributor

Hi @apsaltis

I figured out where I went wrong. The config.sh transform command wasn't working because I was on a Windows machine. I used config.bat instead and it's working fine now.

Contributor

Thanks for the useful tutorial Marc!

Contributor

I am not able to move beyond step 6. After creating the Remote Process Group, I am getting this error: 'http://127.0.0.1:8080/nifi' does not have any input ports.