Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data.
Apache NiFi is based on technology previously called “Niagara Files” that was in development and used at scale within the NSA for the last 8 years and was made available to the Apache Software Foundation through the NSA Technology Transfer Program. Some of the use cases include, but are not limited to:
Big Data Ingest – Offers a simple, reliable and secure way to collect data streams.
IoAT Optimization – Allows organizations to overcome real-world constraints such as limited or expensive bandwidth while ensuring data quality and reliability.
Compliance – Enables organizations to understand everything that happens to data in motion from its creation to its final resting place, which is particularly important for regulated industries that must retain and report on chain of custody.
Digital Security – Helps organizations collect large volumes of data from many sources and prioritize which data is brought back for analysis first, a critical capability given the time sensitivity of identifying security breaches.
Installation: Download the package, untar or unzip it, and modify conf/nifi.properties. I set the NiFi host and changed the port from 8080 to 9080. Alternatively, you can deploy NiFi as an Ambari service.
We are going to work on 3 use cases. Part 1 focuses on a very basic use case.
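If you prefer to script the nifi.properties change instead of editing the file by hand, here is a minimal Python sketch. It assumes the host and port live under the keys nifi.web.http.host and nifi.web.http.port, as in the stock file; the helper name set_properties and the host name are made up for illustration, and the sample text stands in for the real file.

```python
def set_properties(text: str, updates: dict) -> str:
    """Return properties text with each key in `updates` set to its new value."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_comment = line.lstrip().startswith("#")
        if not is_comment and "=" in line and key in remaining:
            # Replace the existing value for this key.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Append any keys that were not already present in the text.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Stand-in for the relevant lines of conf/nifi.properties (assumed key names).
sample = "nifi.web.http.host=\nnifi.web.http.port=8080\n"
updated = set_properties(sample, {
    "nifi.web.http.host": "nifi-host.example.com",  # hypothetical host name
    "nifi.web.http.port": "9080",
})
print(updated)
```

To apply it to a real install, read conf/nifi.properties into a string, pass it through the helper, write the result back, and restart NiFi.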
1) Copy files from local filesystem into HDFS
Processor - Remember this word, because we will be working with many processors in these use cases. Drag the Processor icon onto the canvas, filter by "getfile", and click Add. Then drag a second processor onto the canvas and search for "hdfs" to find PutHDFS.
Now we have GetFile and PutHDFS on the canvas. Right-click a processor to see all of its options. In this case, I am copying data from /landing into HDFS /sourcedata. Right-click the GetFile processor and choose the configuration option. Set Input Directory to /landing. In my case, I am setting Keep Source File to false, so files are removed from /landing once they are picked up.
Now, let's configure PutHDFS. Add the full paths of core-site.xml and hdfs-site.xml as shown below, and set Directory to /sourcedata. You can label the processor as you like by clicking Settings, and you can also auto-terminate the failure and success relationships there.
Now, let's set up the relationship between GetFile and PutHDFS. Drag the arrow with the + sign from GetFile to PutHDFS.
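The two config file paths go into PutHDFS's Hadoop Configuration Resources property as a single comma-separated value. Here is a small Python sketch that builds that value and checks that each file really exists before you paste it in. The helper name is hypothetical, and the demo uses temporary stand-in files; on a real node the files are often under /etc/hadoop/conf, but that location is an assumption.

```python
import os
import tempfile


def hadoop_config_resources(*paths: str) -> str:
    """Join absolute config file paths into one comma-separated value."""
    for path in paths:
        if not os.path.isabs(path):
            raise ValueError(f"not an absolute path: {path}")
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
    return ",".join(paths)


# Create stand-in config files so the sketch runs anywhere.
conf_dir = tempfile.mkdtemp()
files = []
for name in ("core-site.xml", "hdfs-site.xml"):
    p = os.path.join(conf_dir, name)
    with open(p, "w") as fh:
        fh.write("<configuration/>\n")
    files.append(p)

resources_value = hadoop_config_resources(*files)
print(resources_value)
```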
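To make it clear what this two-processor flow does, here is a rough Python analogy. It is not how NiFi works internally: it uses temporary local directories as stand-ins for /landing and HDFS /sourcedata, and the function name copy_flow is made up for illustration. With keep_source set to False, the source file disappears after pickup, which mirrors the Keep Source File setting on GetFile.

```python
import os
import shutil
import tempfile


def copy_flow(input_dir: str, target_dir: str, keep_source: bool = False) -> list:
    """Pick up every file in input_dir and write it to target_dir.

    Returns the sorted names of the files that were transferred.
    """
    os.makedirs(target_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(input_dir)):
        src = os.path.join(input_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch skips subdirectories
        shutil.copy2(src, os.path.join(target_dir, name))
        if not keep_source:
            os.remove(src)  # like Keep Source File = false
        moved.append(name)
    return moved


landing = tempfile.mkdtemp()      # stands in for /landing
sourcedata = tempfile.mkdtemp()   # stands in for HDFS /sourcedata
with open(os.path.join(landing, "sample.txt"), "w") as fh:
    fh.write("hello\n")

transferred = copy_flow(landing, sourcedata, keep_source=False)
print(transferred, os.listdir(landing))
```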
The following screenshot is from my demo environment.