Moving data from your local machine to the cloud is straightforward with the NiFi site-to-site protocol and CDP Datahub. In this article, I focus on how to set up site-to-site communication between your local machine and CDP Public Cloud without going through the default Knox CDP proxy.

 

This configuration assumes that you already have a local instance of NiFi (or MiNiFi) and a CDP Datahub Cluster running NiFi. If you want to learn how to use CDP Public Cloud, please visit our overview page and documentation.

 

This setup will be executed in 4 steps:

  • Step 1: Open CDP to your local IP
  • Step 2: Download and configure stores on your local machine
  • Step 3: Configure a simple site-to-site flow
  • Step 4: Authorize this flow in Ranger

Step 1: Open CDP to your local IP

  1. Go to your CDP Management Console and find your datahub (here pvn-nifi).
  2. At the bottom of the datahub page, click on Hardware and locate one of the instances running NiFi.
  3. Click on the instance and you will be redirected to your cloud provider (here AWS).
  4. At the bottom of the screen, click on the security group associated with your instance, and you will be redirected to that security group's configuration page.
  5. Click on Edit inbound rules and add a rule opening TCP port 8443 to your local IP.
  6. Save these changes.
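
The inbound rule from step 5 can also be expressed with the AWS CLI. The following is a minimal sketch, not the method used in this article: the helper names to_cidr and ingress_cmd are hypothetical, and the security group ID and IP address are placeholders you must supply. The script only prints the command so that it can be reviewed before it is run.

```shell
# Sketch: build the AWS CLI call that opens TCP 8443 to a single IP address.
# The security group ID and the IP are placeholders; neither comes from the article.

# Turn a bare IPv4 address into a single-host CIDR block.
to_cidr() {
  printf '%s/32\n' "$1"
}

# Print (do not execute) the ingress command for security group $1 and IP $2.
ingress_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 8443 --cidr %s\n' \
    "$1" "$(to_cidr "$2")"
}
```

For example, `ingress_cmd sg-0123456789abcdef0 203.0.113.7` prints the command for that (made-up) group and address. Running the printed command requires an AWS CLI that is already configured with credentials allowed to modify the security group.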

Step 2: Download and configure stores on your local machine

  1. Connect to one of the NiFi machines with the Cloudbreak user and the key you used at deployment:
    $ ssh -i [path_to_private_key] cloudbreak@[your_nifi_host]
  2. Copy the keystore and truststore to /tmp and make the copies readable:
    $ sudo su
    $ cp /var/lib/cloudera-scm-agent/agent-cert/cm-auto-host_keystore.jks /tmp
    $ cp /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks /tmp
    $ chmod a+rw /tmp/cm-auto-host_keystore.jks
    $ chmod a+rw /tmp/cm-auto-global_truststore.jks
  3. Disconnect from the remote machine and copy these stores to your local machine:
    $ cd ~/Desktop
    $ scp -i [path_to_private_key] cloudbreak@[your_nifi_host]:/tmp/cm-auto-host_keystore.jks cm-auto-host_keystore.jks
    $ scp -i [path_to_private_key] cloudbreak@[your_nifi_host]:/tmp/cm-auto-global_truststore.jks cm-auto-global_truststore.jks
  4. Configure your local NiFi with these stores by modifying your nifi.properties:
    nifi.security.keystore=/Users/pvidal/Desktop/cm-auto-host_keystore.jks
    nifi.security.keystoreType=JKS
    nifi.security.keystorePasswd=[keystore_pw]
    nifi.security.keyPasswd=[keystore_pw]
    nifi.security.truststore=/Users/pvidal/Desktop/cm-auto-global_truststore.jks
    nifi.security.truststoreType=JKS
    nifi.security.truststorePasswd=[truststore_pw]
    Note: To obtain the passwords for these stores, contact your Cloudera team.
  5. Restart your local NiFi instance:
    $ nifi restart
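
Before restarting, it can help to confirm that all seven security properties from step 4 are actually set. The following is a minimal sketch, assuming nifi.properties uses plain key=value lines. The function name missing_security_props is hypothetical, and the path to the file is whatever your local installation uses.

```shell
# Sketch: list the nifi.security.* properties from step 4 that are missing
# or empty in the nifi.properties file given as $1.
missing_security_props() {
  for key in keystore keystoreType keystorePasswd keyPasswd \
             truststore truststoreType truststorePasswd; do
    # A property counts as set when "nifi.security.<key>=" is followed
    # by at least one character on the same line.
    grep -Eq "^nifi\.security\.${key}=.+" "$1" || echo "nifi.security.${key}"
  done
}
```

Calling `missing_security_props /path/to/conf/nifi.properties` prints nothing when every property has a value. To check that a password matches its store, `keytool -list -keystore cm-auto-host_keystore.jks` prompts for the password and lists the entries if it is correct.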

Step 3: Configure a simple site-to-site flow

Local instance

  1. Create a process group to host your flow (here called S2S Cloud).
  2. In this process group, create a remote process group and configure it with the address of one of your cloud NiFi instances and the HTTP transport protocol.
  3. Create a simple GenerateFlowFile processor and connect it to the remote process group.
    Note: Until Ranger is configured, you will get a Forbidden warning (see Step 4).
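
The remote process group in step 2 needs the URL of a cloud NiFi node. The following is a minimal sketch, assuming the node serves NiFi at https://[your_nifi_host]:8443/nifi. Port 8443 is the one opened in Step 1; the /nifi path and the helper names rpg_url and probe are assumptions, not taken from this article.

```shell
# Sketch: build the URL for the remote process group and check reachability.

# Print the assumed NiFi URL for host $1; the port defaults to 8443 ($2 overrides it).
rpg_url() {
  printf 'https://%s:%s/nifi\n' "$1" "${2:-8443}"
}

# Print the HTTP status code returned by the node, or 000 if no response arrives.
# -k skips certificate verification, so this only tests network reachability.
probe() {
  curl -sk --max-time 10 -o /dev/null -w '%{http_code}\n' "$(rpg_url "$1")"
}
```

Any HTTP status code from `probe [your_nifi_host]`, as opposed to 000 after a timeout, suggests that the security group rule from Step 1 is in effect. It says nothing about whether the key and trust stores are accepted.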

CDP Public Instance

  1. Create a process group to host your flow (here called Receive from on prem).
  2. In this process group, create an input port accepting remote connections.
  3. Finally, create a flow that takes the data and logs it.
  4. Start your flow.

Step 4: Authorize this flow in Ranger

  1. From the Cloudera Management Console, go to Ranger and open your NiFi service.
  2. From the list of policies, create a new policy (here called s2s) that allows access to your specific process group and to the site-to-site protocol (Ranger offers auto-completion for these resources).
  3. Save this policy and go back to your local machine; you can now enable the remote process group and start sending files!
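
As a rough guide to what the policy in step 2 may need to cover, the following sketch prints the resource identifiers that NiFi's own authorization model uses for site-to-site: one for retrieving site-to-site details and one for receiving data on an input port. This is an assumption based on NiFi's documented policy resources; the exact resources selected in this article's policy are not shown, and the function name s2s_resources is hypothetical.

```shell
# Sketch: print the NiFi resource identifiers usually involved when a remote
# client sends data to an input port over site-to-site.
# $1 is the UUID of the input port created on the CDP NiFi instance.
s2s_resources() {
  printf '/site-to-site\n'
  printf '/data-transfer/input-ports/%s\n' "$1"
}
```

The input port UUID is shown in the port's configuration dialog in the NiFi UI. Compare the printed identifiers with what Ranger's auto-completion proposes rather than treating them as authoritative.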

Example of successful flows

Local Flow

[Screenshot of the local flow]

CDP Public Flow

[Screenshot of the CDP Public flow]
