HDF Overview

Overview

Hortonworks DataFlow (HDF), powered by Apache NiFi, Kafka and Storm, collects, curates, analyzes and delivers real-time data from the Internet of Anything (IoAT) to data stores both on-premises and in the cloud. This is a quick guide to installing Apache NiFi on an AWS EC2 instance. Use it as a supplement to the official Hortonworks HDF documentation.

Prerequisites

Before you install Apache NiFi on AWS, make sure

Installation Steps

The screenshots in this section detail the setup and configuration of Apache NiFi on an EC2 instance.

Refer to the NiFi Admin Guide for the system requirements. This document covers installation on a 64-bit Red Hat Linux EC2 instance.

Log in to AWS and launch an EC2 instance with the OS of your choice, making sure that the selected OS is supported by NiFi. This exercise uses the Red Hat Enterprise Linux 7.2 image (HDF EC2 Instance).

6161-screen-1-1.png

Keep the private key for the instance safe. Under the Network and Security configuration, open the inbound ports (e.g. 8081 and 8082, shown below) used to access the NiFi web interface and by the site-to-site protocol to exchange data between multiple NiFi instances.

6162-screen-2.png
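
If you prefer the command line to the console, the same inbound rules could be added with the AWS CLI. This is only a sketch: it assumes the AWS CLI is installed and configured with credentials, and the helper name open_port, the security group ID and the CIDR range are placeholders that do not come from this article.

```shell
# Hypothetical helper: open one inbound TCP port on a security group.
# Assumes the AWS CLI is installed and has credentials configured.
open_port() {
  # usage: open_port <security-group-id> <port> <source-cidr>
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port "$2" \
    --cidr "$3"
}

# Example (placeholder values; restrict the CIDR to addresses you trust):
#   open_port sg-0123456789abcdef0 8081 203.0.113.0/24   # NiFi web interface
#   open_port sg-0123456789abcdef0 8082 203.0.113.0/24   # site-to-site
```

Limiting the source CIDR to known addresses, rather than opening the ports to everyone, is usually the safer choice for an unsecured NiFi instance.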

  • Download HDF from the HDF Download Page. You can either download it directly on your EC2 instance or upload the zip file from your local machine to the EC2 instance using scp.

e.g. scp -i HDF.pem HDF-1.2.0.1-1.zip ec2-user@<public-dns-hostname>:/home/ec2-user

where HDF.pem is the private key.
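
ssh and scp generally refuse a private key file that other users can read, so it can help to tighten its permissions before the first connection. A minimal sketch follows; the helper name lock_key is hypothetical, and the ec2-user login is taken from the scp example above.

```shell
# Hypothetical helper: make a private key readable by its owner only.
lock_key() {
  # usage: lock_key <path-to-pem>
  chmod 400 "$1"
}

# Example, run from the directory that holds the key:
#   lock_key HDF.pem
#   ssh -i HDF.pem ec2-user@<public-dns-hostname>
```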

  • Make sure the latest Java and unzip are installed on the EC2 instance.

sudo yum install unzip

sudo yum install java
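
Once the packages are installed, a quick check that both tools are on the PATH can save a failed start later. This is a sketch only; have_cmd is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if every named command is on the PATH.
have_cmd() {
  for c in "$@"; do
    command -v "$c" >/dev/null 2>&1 || { echo "missing: $c" >&2; return 1; }
  done
}

# Example:
#   have_cmd java unzip && java -version
```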

  • Unzip the archive into the desired installation directory.
  • Make the desired edits to the nifi.properties file under <install_dir>/nifi/conf.

e.g. update the site-to-site properties to include the following:

nifi.remote.input.socket.host=<public_dns_hostname>

nifi.remote.input.socket.port=8082

nifi.remote.input.secure=false
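
The same edits can be scripted instead of made by hand. The sketch below assumes a POSIX shell with awk and mktemp available; set_prop is a hypothetical helper, not part of NiFi. It replaces the line for a key if one exists and appends the key otherwise.

```shell
# Hypothetical helper: set key=value in a Java-style properties file.
# Note: awk -v interprets backslash escapes in the value, so this is
# meant for plain values such as hostnames, ports and true/false.
set_prop() {
  # usage: set_prop <file> <key> <value>
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "="; found = 0 }
    $1 == k { print k "=" v; found = 1; next }   # replace existing line
    { print }                                    # keep everything else
    END { if (!found) print k "=" v }            # append if key was absent
  ' "$file" > "$tmp" && cat "$tmp" > "$file"     # cat keeps file permissions
  status=$?
  rm -f "$tmp"
  return $status
}

# Example (paths and host are placeholders):
#   conf=<install_dir>/nifi/conf/nifi.properties
#   set_prop "$conf" nifi.remote.input.socket.host '<public_dns_hostname>'
#   set_prop "$conf" nifi.remote.input.socket.port 8082
#   set_prop "$conf" nifi.remote.input.secure false
```

Keeping a backup copy of nifi.properties before scripted edits is a sensible precaution.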

  • From the <install_dir>/nifi/bin directory, run ./nifi.sh <command>, where <command> is one of the following:
    • start: starts NiFi in the background
    • stop: stops NiFi that is running in the background
    • status: provides the current status of NiFi
    • run: runs NiFi in the foreground and waits for a Ctrl-C to initiate shutdown of NiFi
    • install: installs NiFi as a service that can then be controlled via
      • service nifi start
      • service nifi stop
      • service nifi status
  • The following screenshot shows NiFi running on the EC2 instance with a sample dataflow.

6150-screen-3.png
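
NiFi can take a little while to come up after ./nifi.sh start, so a small polling loop is one way to confirm that the web interface is reachable before opening a browser. This is a sketch that assumes curl is installed; wait_for_ui is a hypothetical helper, and the port in the example assumes the web interface was configured to listen on 8081, as opened in the security group earlier.

```shell
# Hypothetical helper: poll a URL until it answers or the timeout expires.
wait_for_ui() {
  # usage: wait_for_ui <url> <timeout-seconds>
  url=$1 timeout=$2 waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt
    if curl -s -f -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Example (host is a placeholder):
#   ./nifi.sh start
#   wait_for_ui "http://<public-dns-hostname>:8081/nifi" 120 && echo "NiFi UI is up"
```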

Benefits

  • Running a NiFi instance in AWS provides an easy-to-use, flexible and cost-effective dataflow management solution in the cloud.
  • NiFi is a reliable, secure and scalable solution that also benefits from AWS's mature infrastructure.
  • Using the NiFi site-to-site protocol eliminates the need to run software in the DMZ when exchanging data between on-premises and cloud environments.

Document References

NiFi System Admin Guide:

http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDF1/HDF-1-trunk/bk_AdminGuide/content/ch_a...

Comments

@milind pandit

Hi Milind - great overview.

Any recommendations around the type of instance we should use?

Is this article still valid for HDF version 3.0 which was released recently? Are there easier ways of deploying to Amazon?