Moving data from your local machine to the cloud is straightforward with the NiFi site-to-site protocol and CDP Datahub. In this article, I will focus on how to set up site-to-site communication between your local machine and CDP Public Cloud without going through the default Knox CDP proxy.


This configuration assumes that you already have a local instance of NiFi (or MiNiFi) and a CDP Datahub Cluster running NiFi. If you want to learn how to use CDP Public Cloud, please visit our overview page and documentation.


The setup consists of four steps:

  • Step 1: Open CDP to your local IP
  • Step 2: Download and configure stores on your local machine
  • Step 3: Configure a simple site-to-site flow
  • Step 4: Authorize this flow in Ranger

Step 1: Open CDP to your local IP

  1. Go to your CDP Management Console, and find your datahub (here pvn-nifi).
  2. At the bottom of the datahub page, click on Hardware and locate one of the instances running NiFi.
  3. Click on the instance and you will be redirected to your cloud provider (here AWS).
  4. At the bottom of the screen, click on the security group associated with your instance, and you will be redirected to that security group's configuration page.
  5. Click on Edit inbound rules and add a rule opening TCP port 8443 to your local IP.
  6. Save these changes. 
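
If you prefer the command line to the AWS console, the inbound rule from item 5 can be sketched with the AWS CLI. This is a minimal sketch, not part of the original walkthrough: the function name `s2s_ingress_cmd`, the security group ID, and the IP address are hypothetical placeholders, and the function only prints the command so you can review it before running it yourself.

```shell
#!/bin/sh
# Print (do not execute) an AWS CLI command that opens TCP port 8443
# to a single IPv4 address.
# Arguments: security group ID, public IPv4 address of the local machine.
s2s_ingress_cmd() {
    group_id="$1"
    local_ip="$2"
    # The /32 suffix restricts the rule to exactly one address.
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 8443 --cidr %s/32\n' \
        "$group_id" "$local_ip"
}

# Placeholder values (assumptions): substitute your own group ID and public IP.
s2s_ingress_cmd "sg-0123456789abcdef0" "203.0.113.10"
```

Printing the command rather than running it keeps the sketch safe to try without AWS credentials; once the output looks right, it can be pasted into a shell where the AWS CLI is configured.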

Step 2: Download and configure stores on your local machine

  1. Connect to one of the NiFi machines with the Cloudbreak user and the key you used at deployment:
    $ ssh -i [path_to_private_key] cloudbreak@[your_nifi_host]
  2. Copy the keystore and truststore to /tmp and make them readable by the cloudbreak user:
    $ sudo su
    $ cp /var/lib/cloudera-scm-agent/agent-cert/cm-auto-host_keystore.jks /tmp
    $ cp /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks /tmp
    $ chmod a+rw /tmp/cm-auto-host_keystore.jks
    $ chmod a+rw /tmp/cm-auto-global_truststore.jks
  3. Disconnect from the remote machine and copy these stores to your local machine:
    $ cd ~/Desktop
    $ scp -i [path_to_private_key] cloudbreak@[your_nifi_host]:/tmp/cm-auto-host_keystore.jks cm-auto-host_keystore.jks
    $ scp -i [path_to_private_key] cloudbreak@[your_nifi_host]:/tmp/cm-auto-global_truststore.jks cm-auto-global_truststore.jks
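  4. Configure your local NiFi with these stores by modifying the security settings in your NiFi configuration (nifi.properties on a standard NiFi install): point the keystore and truststore entries at the copied files, set the keystore password and the key password to [keystore_pw], and set the truststore password to [truststore_pw].
    Note: To obtain the passwords for these stores, contact your Cloudera team.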
  5. Restart your local NiFi instance:
    $ nifi restart
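
The store configuration from item 4 can be sketched as a small helper that prints the relevant entries. This is a sketch under assumptions: it targets a standard NiFi install configured through nifi.properties, it assumes both stores are JKS files, and it reuses the keystore password as the key password. The function name `s2s_security_props` is hypothetical, and the passwords are left as the placeholders used above.

```shell
#!/bin/sh
# Print the nifi.properties security entries for the copied stores.
# Arguments: keystore path, keystore password, truststore path, truststore password.
# Assumption: the key password is the same as the keystore password.
s2s_security_props() {
    cat <<EOF
nifi.security.keystore=$1
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=$2
nifi.security.keyPasswd=$2
nifi.security.truststore=$3
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=$4
EOF
}

# Paths match the scp destination used above; passwords remain placeholders.
s2s_security_props "$HOME/Desktop/cm-auto-host_keystore.jks" "[keystore_pw]" \
    "$HOME/Desktop/cm-auto-global_truststore.jks" "[truststore_pw]"
```

Compare the printed lines with the existing entries in your configuration file and update them by hand; the helper deliberately does not edit the file itself.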

Step 3: Configure a simple site-to-site flow

Local instance

  1. Create a process group to host your flow (here called S2S Cloud).
  2. In this process group, create a remote process group and configure it with the address of one of your cloud NiFi instances and the HTTP transport protocol.
  3. Create a simple GenerateFlowFile processor and connect it to the remote process group.
    Note: Until Ranger is configured, you will get a Forbidden warning (see Step 4).

CDP Public Instance

  1. Create a process group to host your flow (here called Receive from on prem).
  2. In this process group, create an input port accepting remote connections.
  3. Finally, create a flow that takes the data and logs it.
  4. Start your flow.

Step 4: Authorize this flow in Ranger

  1. From the CDP Management Console, go to Ranger and open your NiFi service.
  2. From the list of policies, create a new policy (here called s2s) that allows access to your specific process group and to the site-to-site protocol (Ranger autocompletes the resource names).
  3. Save this policy, and go back to your local machine; you can now enable the remote process group and start sending files!

Example of successful flows

[Screenshots: Local Flow; CDP Public Flow]