Created on 09-13-2016 09:45 PM
Getting started with MiNiFi
In this tutorial, we will learn how to configure MiNiFi to send data to NiFi:
- Installing HDF
- Installing MiNiFi
- Setting up the Flow for NiFi
- Setting up the Flow for MiNiFi
- Preparing the flow for MiNiFi
- Configuring and starting MiNiFi
- Enjoying the data flow!
References
For a primer on HDF, you can refer to the tutorials here Tutorials and User Guides
Installing HDF
- If you do not have NiFi installed, please follow the instructions found here
NOTE: The above installation guide is for HDF 1.2.0.1, which is the version that matches Apache MiNiFi 0.0.1. Although HDF 2.0 may work, it is not recommended for this exercise at this time.
Installing MiNiFi
Now that you have NiFi up and running it is time to download and install MiNiFi.
- Open a browser. Download the MiNiFi binaries from the Apache MiNiFi Downloads Page. There are two options: a tar.gz, a format tailored to Linux, and a zip file, which is more compatible with Windows. If you are using a Mac, either option is just fine.
Figure 1. MiNiFi download page
For this tutorial I have downloaded the tar.gz on a Mac as shown above in Figure 1.
- To install MiNiFi, extract the files from the compressed file to a location in which you want to run the application. I have chosen to install it to
/Users/apsaltis/minifi-to-nifi
The image below shows MiNiFi downloaded and installed in this directory:
Setting up the Flow for NiFi
NOTE: Before starting NiFi we need to enable Site-to-Site communication. To do that do the following:
- Open <$NIFI_INSTALL_DIR>/conf/nifi.properties in your favorite editor
- Change:
nifi.remote.input.socket.host=
nifi.remote.input.socket.port=
nifi.remote.input.secure=true
To
- Restart NiFi if it was running
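Before restarting, it can help to confirm what the three Site-to-Site properties are currently set to. The following is a minimal sketch, not part of the original steps: the function name show_s2s_props is hypothetical, and it assumes the properties sit at the start of a line in nifi.properties, as they do in a stock file.

```shell
# Print the Site-to-Site input settings from a nifi.properties file.
# show_s2s_props is a hypothetical helper name used only for this sketch.
# Usage: show_s2s_props <path-to-nifi.properties>
show_s2s_props() {
    # Match only the three properties discussed above, anchored at line start
    # so that commented-out copies are not reported.
    grep -E '^nifi\.remote\.input\.(socket\.host|socket\.port|secure)=' "$1"
}
```

It could be called as show_s2s_props "<$NIFI_INSTALL_DIR>/conf/nifi.properties", with the placeholder replaced by your own install directory.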
Now that we have NiFi up and running and MiNiFi installed and ready to go, the next thing to do is to create our data flow. To do that we are going to first start with creating the flow in NiFi. Remember if you do not have NiFi running execute the following command:
Now we should be ready to create our flow. To do this do the following:
- Open a browser and go to http://<host>:<port>/nifi. On my machine that URL is http://127.0.0.1:8080/nifi, and going to it in the browser looks like this:
Figure 2. Empty NiFi Canvas
- The first thing we are going to do is set up an Input Port. This is the port that MiNiFi will be sending data to. To do this, drag the Input Port icon to the canvas and call it "From MiNiFi" as shown below in figure 3.
Figure 3. Adding the Input Port
- Now that the Input Port is configured we need to have somewhere for the data to go once we receive it. In this case we will keep it very simple and just log the attributes. To do this drag the Processor icon to the canvas and choose the LogAttribute processor as shown below in figure 4.
Figure 4. Adding the LogAttribute processor
- Now that we have the input port and the processor to handle our data, we need to connect them. After creating the connection your data flow should look like figure 5 below.
Figure 5. NiFi Flow
- We are now ready to build the MiNiFi side of the flow. To do this do the following:
- Add a GenerateFlowFile processor to the canvas (don't forget to configure the properties on it)
- Add a Remote Process Group to the canvas as shown below in figure 6
Figure 6. Adding the Remote Process Group
- For the URL, copy and paste the URL for the NiFi UI from your browser
- Connect the GenerateFlowFile to the Remote Process Group as shown below in figure 7. (You may have to refresh the Remote Process Group before the input port will be available)
Figure 7. Adding GenerateFlowFile Connection to Remote Process Group
- Your canvas should now look similar to what is shown below in figure 8.
Figure 8. Adding GenerateFlowFile Connection to Remote Process Group
- There is one last step we need to take before we can export the template. We need to make sure that we set the back pressure between the GenerateFlowFile processor and the Remote Process Group (RPG). That way, if you stop NiFi but not MiNiFi, you will not fill up the hard drive where MiNiFi is running. To set the back pressure do the following:
- Right-click on the "From MiNiFi" connection and choose "Configure"
- Choose the "Settings" tab
- Set the "Back pressure object threshold" and "Back pressure data size threshold" to 10000 and 1 GB respectively.
- The next step is to generate the flow we need for MiNiFi. To do this do the following steps:
- Create a template for MiNiFi illustrated below in figure 9.
Figure 9. Creating a template
- Select the GenerateFlowFile and the NiFi Flow Remote Process Group (these are the only things needed for MiNiFi)
- Select the "Create Template" button from the toolbar
- Choose a name for your template
- We now need to save our template, as illustrated below in figure 10.
Figure 10. Template button
- Now we need to download the template as shown below in figure 11
Figure 11. Saving a template
- We are now ready to set up MiNiFi. However, before doing that we need to convert the template to the YAML format that MiNiFi uses. To do this we need to do the following:
- Navigate to the minifi-toolkit directory (minifi-toolkit-0.0.1)
- Transform the template that we downloaded using the following command:
bin/config.sh transform <INPUT_TEMPLATE> <OUTPUT_FILE>
For example:
bin/config.sh transform MiNiFi_Flow.xml config.yml
- Next, copy config.yml to the minifi-0.0.1/conf directory. That is the file that MiNiFi uses to generate the nifi.properties file and the flow.xml.gz for MiNiFi.
- That is it, we are now ready to start MiNiFi. To start MiNiFi from a command prompt execute the following:
cd <MINIFI_INSTALL_DIR>
bin/minifi.sh start
You should now be able to go to your NiFi flow and see data coming in from MiNiFi.
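If no data arrives, one thing worth checking is whether the transformed config.yml actually contains the flow. The sketch below is an illustration only: check_minifi_config is a hypothetical helper name, and the section names "Processors" and "Remote Process Groups" are taken from the config.yml posted in the comments (MiNiFi Config Version 3), so other MiNiFi versions may label these sections differently.

```shell
# Report whether a MiNiFi config.yml still has empty flow sections.
# check_minifi_config is a hypothetical helper name used only for this sketch.
# Returns 0 when both sections are populated, 1 when either one is empty.
check_minifi_config() {
    config_file="$1"
    rc=0
    for section in 'Processors' 'Remote Process Groups'; do
        # An empty section appears in the YAML file as "<name>: []".
        if grep -q "^${section}: \[]" "$config_file"; then
            echo "empty section: ${section}"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, check_minifi_config conf/config.yml run from the MiNiFi install directory prints the name of each empty section. An empty "Processors" section suggests the transform step did not pick up the template.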
Created on 11-02-2016 06:26 PM
I might suggest we make a few changes to this article:
1. The link you have for installing HDF talks about installing HDF 2.0. HDF 2.0 is based off Apache NiFi 1.0. Since MiNiFi is built from Apache NiFi 0.6.1, the dataflows built and templated for conversion into MiNiFi YAML files must also be built using an Apache 0.6 based NiFi install. (I see in your example above you did just that but this needs to be made clear)
2. I would never recommend setting nifi.remote.input.socket.host= to "localhost". When a NiFi or MiNiFi connects to another NiFi via S2S, the destination NiFi will return the value set for this property along with the value set for nifi.remote.input.socket.port=. In your example that means the source MiNiFi would then try to send FlowFiles to localhost:10000. This is ONLY going to work if the destination NiFi is located on the same server as MiNiFi.
3. You should also explain why you are changing nifi.remote.input.secure= from true to false. Changing this is not a requirement of MiNiFi, it is simply a matter of preference (if set to true, both MiNiFi (source) and NiFi (destination) must be set up to run securely over https). In your example you are working with http only.
4. While doable, one should never route the "success" relationship from any processor back on to itself. If you have reached the end of your dataflow, you should auto-terminate the "success" relationship.
5. I am not clear what you are telling me to do based on this line under step 5:
- Start the From MiNiFi Input Port
6. When using the GenerateFlowFile processor in an example flow it is important to recommend that users set a run schedule other than "0 sec". Since MiNiFi is Apache 0.6.1 based there is no default backpressure on connections, and with a run schedule of "0 sec" it is very likely this processor will produce FlowFiles much faster than they can be sent across S2S. This will eventually fill the hard drive of the system running MiNiFi. An even better recommendation would be to make sure they set back pressure between the GenerateFlowFile processor and the Remote Process Group (RPG). That way even if someone stops the NiFi and not the MiNiFi they don't fill their MiNiFi hard drive.
Thanks,
Matt
Created on 11-03-2016 12:18 AM
Thanks for the feedback, @mclark. All of your suggestions should now be apparent in the content. Thanks again for the input.
Created on 05-25-2017 09:32 AM
Created on 05-25-2017 10:53 AM
@Roger Young, you are correct: once the template is downloaded you should be able to delete it and the related part of the flow. I am not sure off-hand why you are not seeing data flow from MiNiFi. I am actually using this exercise in a class I am teaching today and will be sure to test this and see what happens.
What version of MiNiFi and NiFi are you using?
Created on 05-25-2017 11:13 AM
hi, i am using nifi-1.2.0 and minifi-0.2.0.
Created on 05-25-2017 11:16 AM
Below is the config.yml file in MiNiFi. There is no mention of processors there, so maybe I've done something wrong.
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
MiNiFi Config Version: 3
Flow Controller:
  name: MiNiFi Flow
  comment: ''
Core Properties:
  flow controller graceful shutdown period: 10 sec
  flow service write delay interval: 500 ms
  administrative yield duration: 30 sec
  bored yield duration: 10 millis
  max concurrent threads: 1
FlowFile Repository:
  partitions: 256
  checkpoint interval: 2 mins
  always sync: false
  Swap:
    threshold: 20000
    in period: 5 sec
    in threads: 1
    out period: 5 sec
    out threads: 4
Content Repository:
  content claim max appendable size: 10 MB
  content claim max flow files: 100
  always sync: false
Provenance Repository:
  provenance rollover time: 1 min
Component Status Repository:
  buffer size: 1440
  snapshot frequency: 1 min
Security Properties:
  keystore: ''
  keystore type: ''
  keystore password: ''
  key password: ''
  truststore: ''
  truststore type: ''
  truststore password: ''
  ssl protocol: ''
  Sensitive Props:
    key: ''
    algorithm: PBEWITHMD5AND256BITAES-CBC-OPENSSL
    provider: BC
Processors: []
Process Groups: []
Funnels: []
Connections: []
Remote Process Groups: []
NiFi Properties Overrides: {}
Created on 05-25-2017 01:09 PM
Hi @apsaltis
I figured out where I went wrong. The config.sh transform command wasn't working because I was on a Windows machine. I used config.bat and it's working fine now.
Created on 06-06-2017 06:48 PM
Thanks for the useful tutorial Marc!
Created on 12-11-2017 08:06 PM
I am not able to move beyond step 6. After creating the remote process group, I am getting an error 'http://127.0.0.1:8080/nifi' does not have any input ports.