Created on 08-10-2020 02:14 PM - edited on 08-14-2020 12:49 AM by VidyaSargur
Moving data from your local machine to the cloud has never been easier, thanks to the NiFi site-to-site protocol and CDP Datahub. In this article, I will focus on how to set up site-to-site communication between your local machine and CDP Cloud without using the default Knox CDP Proxy.
This configuration assumes that you already have a local instance of NiFi (or MiNiFi) and a CDP Datahub Cluster running NiFi. If you want to learn how to use CDP Public Cloud, please visit our overview page and documentation.
This setup will be executed in 4 steps:
Step 1: Open CDP to your local IP
Step 2: Download and configure stores on your local machine
Step 3: Configure a simple site-to-site flow
Step 4: Authorize this flow in Ranger
Step 1: Open CDP to your local IP
Go to your CDP Management Console, and find your datahub (here pvn-nifi).
At the bottom of the datahub page, click on Hardware and locate one of the instances running NiFi:
Click on the instance and you will be redirected to your cloud provider (here AWS):
At the bottom of the screen, click on the security group associated with your instance, and you will be redirected to that security group config page:
Click on Edit inbound rules and add a rule opening TCP port 8443 to your local IP:
Save these changes.
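If you prefer the command line to the AWS console, the same inbound rule can be sketched with the AWS CLI. The security group ID and the IP address below are placeholders (assumptions, not values from this setup), and the snippet only prints the command so you can review it before running it.

```shell
# Placeholder values (assumptions): substitute your own security group and public IP.
SG_ID="sg-0123456789abcdef0"
MY_IP="203.0.113.10"

# Inbound rule: TCP port 8443, restricted to a single address (/32).
CMD="aws ec2 authorize-security-group-ingress --group-id ${SG_ID} --protocol tcp --port 8443 --cidr ${MY_IP}/32"

# Print the command for review instead of executing it.
echo "${CMD}"
```

Once the values are correct, run the printed command in a shell where the AWS CLI is configured for the account that owns the datahub.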
Step 2: Download and configure stores on your local machine
Connect to one of the NiFi machines with the Cloudbreak user and the key you used at deployment:
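As a rough sketch, the connection and the copy of the stores might look like the following. Every value here is a placeholder and an assumption: the key path, the host name, the store file names, and especially the remote location of the keystore and truststore, which depends on your deployment. The functions are only defined, not run.

```shell
# Placeholders (assumptions): adjust all of these to your deployment.
KEY_FILE="$HOME/.ssh/my-cdp-key.pem"     # private key used at deployment
NIFI_HOST="nifi-host.example.com"        # one of the cloud NiFi instances
REMOTE_STORE_DIR="/path/to/stores"       # hypothetical remote directory
LOCAL_STORE_DIR="$HOME/cdp-stores"       # where the copies land locally

# Open an interactive session as the cloudbreak user.
connect_nifi() {
  ssh -i "${KEY_FILE}" "cloudbreak@${NIFI_HOST}"
}

# Copy the keystore and truststore to the local machine.
# Assumes the cloudbreak user can read both files (hypothetical file names).
fetch_stores() {
  mkdir -p "${LOCAL_STORE_DIR}"
  scp -i "${KEY_FILE}" \
    "cloudbreak@${NIFI_HOST}:${REMOTE_STORE_DIR}/keystore.jks" \
    "cloudbreak@${NIFI_HOST}:${REMOTE_STORE_DIR}/truststore.jks" \
    "${LOCAL_STORE_DIR}/"
}
```

Call `connect_nifi` to look around on the remote machine, then `fetch_stores` from your local shell once you know where the stores live.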
Note: To obtain the passwords for these stores, please contact your Cloudera team.
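Pointing the local NiFi at the copied stores is done in nifi.properties. The sketch below prints the relevant entries; the property names are NiFi's standard security properties, while the directory, the file names, the JKS store type, and the password placeholders are assumptions to replace with your own values.

```shell
# Placeholder (assumption): directory holding the stores copied from the cloud instance.
LOCAL_STORE_DIR="$HOME/cdp-stores"

# Emit the security entries to merge into conf/nifi.properties.
print_store_properties() {
  cat <<EOF
nifi.security.keystore=${LOCAL_STORE_DIR}/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=<keystore password>
nifi.security.keyPasswd=<key password>
nifi.security.truststore=${LOCAL_STORE_DIR}/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=<truststore password>
EOF
}

print_store_properties
```

Review the printed lines and edit the matching entries in your local nifi.properties by hand rather than appending them blindly, since the file usually already contains these keys.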
Restart your local NiFi instance:
nifi restart
Step 3: Configure a simple site-to-site flow
Local instance
Create a process group to host your flow (here called S2S Cloud):
In this process group, create a remote process group and configure it with the address of one of your cloud NiFi instances and the HTTP transport protocol:
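The URL entered in the remote process group typically has the form shown below. This sketch assumes the cloud NiFi listens on port 8443 (the port opened in step 1) and uses a placeholder host name; the reachability check relies on `nc` being installed and is only defined here, not run.

```shell
NIFI_HOST="nifi-host.example.com"   # placeholder (assumption)
NIFI_PORT=8443                      # port opened in step 1

# Build the URL to paste into the remote process group dialog.
rpg_url() {
  echo "https://$1:$2/nifi"
}

# Return 0 if the port accepts TCP connections within 5 seconds.
port_reachable() {
  nc -z -w 5 "$1" "$2"
}

rpg_url "${NIFI_HOST}" "${NIFI_PORT}"
```

Running `port_reachable "$NIFI_HOST" "$NIFI_PORT" && echo open` from your local machine is a quick way to confirm that the security group rule from step 1 took effect.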
Create a simple GenerateFlowFile processor and connect it to the remote process group:
Note: Until Ranger is configured, you will get a Forbidden warning (see step 4).
CDP Public Instance
Create a process group to host your flow (here called Receive from on prem):
In this process group, create an input port accepting remote connections:
Finally, create a flow that takes the data and logs it:
Start your flow.
Step 4: Authorize this flow in Ranger
From the CDP Management Console, go to Ranger and open your NiFi service:
From the list of policies, create a new policy (here called s2s) that allows access to your specific process group and to the site-to-site protocol (Ranger offers auto-completion for the resource names):
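For reference, NiFi's authorizer identifies resources by path, so a site-to-site policy usually lists entries like the ones printed below. This is a sketch under assumptions: the two UUIDs are made-up placeholders for your process group and input port, and the exact set of resources your policy needs may differ.

```shell
PG_ID="00000000-0000-0000-0000-000000000001"     # placeholder process group UUID
PORT_ID="00000000-0000-0000-0000-000000000002"   # placeholder input port UUID

# Print the resource identifiers for a process group ($1) and an input port ($2).
s2s_resources() {
  printf '%s\n' \
    "/site-to-site" \
    "/process-groups/$1" \
    "/data-transfer/input-ports/$2"
}

s2s_resources "${PG_ID}" "${PORT_ID}"
```

The real UUIDs can be read from the NiFi UI of the cloud instance by selecting the process group or the input port and looking at its details.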
Save this policy, and go back to your local machine; you can now enable the remote process group and start sending files!