There will be multiple form factors available in the future; for now, I will assume you have an environment that contains one Data Hub with NiFi and one Data Hub with both Impala and Kudu. (The steps still work if everything is on the same Data Hub.)

 

Prerequisites

  • Data Hubs with NiFi and Impala+Kudu
  • Permission to access these (e.g., add a processor, create a table)
  • Know your Workload User Name (CDP Management Console > Your name (bottom left) > Profile)
  • Your Workload Password, set in the same location

Steps to write data from NiFi to Kudu in CDP Public Cloud

Unless mentioned otherwise, I have kept everything at its default settings.

In Kudu Data Hub Cluster:

  1. Gather the FQDNs of the Kudu masters and the ports they use. Go to the Data Hub > click Kudu Master > click Masters.
  2. Combine the RPC addresses in the following format:
     `host1:port,host2:port,host3:port`

     Example:

     `master1fqdn.abc:7051,master2fqdn.abc:7051,master3fqdn.abc:7051`

In HUE on the Impala/Kudu Data Hub Cluster:

Run the following query to create the Kudu table (the little triangle runs it):

`CREATE TABLE default.micro (id BIGINT, name STRING, PRIMARY KEY(id)) STORED AS KUDU;`
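
For this minimal test an unpartitioned table is enough, but for anything larger, Kudu tables are normally given an explicit partitioning scheme. A sketch of the same table with hash partitioning is shown below; the table name `micro_partitioned` and the partition count of 4 are only illustrative values, not something this walkthrough requires.

```sql
-- Same schema as above, but with an explicit hash-partitioning scheme.
-- The table name and partition count (4) are example values only; pick
-- a partition count that matches your cluster and expected data volume.
CREATE TABLE default.micro_partitioned (
  id BIGINT,
  name STRING,
  PRIMARY KEY(id)
)
PARTITION BY HASH(id) PARTITIONS 4
STORED AS KUDU;
```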

In NiFi GUI:

  1. Ensure that you have some data in NiFi that fits in the table. Use the `GenerateFlowFile` processor.
  2. In Properties, configure the Custom Text to contain the following (copy carefully or use shift+enter for the newline):
     id,name
     1,dennis
  3. Add the PutKudu processor to the canvas and configure it as follows (a SQL illustration of the UPSERT operation follows this list):
     - Settings
       - Automatically terminate relationships: tick both success and failure
     - Scheduling
       - Run Schedule: 1 sec
     - Properties
       - Kudu Masters: the combined list we created earlier
       - Table Name: impala::default.micro
       - Kerberos Principal: your Workload User Name (see prerequisites above)
       - Kerberos Password: your Workload Password
       - Record Reader: Create new service > CSV Reader
       - Kudu Operation Type: UPSERT
  4. Right-click the NiFi canvas > Configure > the little cogwheel of CSV Reader, set the following property, and apply:
     Treat First Line as Header: True
  5. Click the little lightning bolt of CSV Reader > Enable
  6. On the canvas, connect your GenerateFlowFile processor to your PutKudu processor and start the flow.
  7. You should now be able to query your table through HUE and see that a single record has been added:
     `SELECT * FROM default.micro;`
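
As a side note on the Kudu Operation Type chosen above: UPSERT means a row is inserted if its primary key is new and updated in place if it already exists. A rough SQL equivalent of what PutKudu does, using the same values as the GenerateFlowFile example, looks like this (running these statements is optional and only touches that one row):

```sql
-- Roughly what PutKudu performs with Kudu Operation Type = UPSERT:
-- insert the row when id = 1 does not exist yet, update it when it does.
UPSERT INTO default.micro (id, name) VALUES (1, 'dennis');

-- Running an UPSERT again for the same id updates the row in place
-- instead of failing with a duplicate-key error.
UPSERT INTO default.micro (id, name) VALUES (1, 'dennis_updated');
```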

These are the minimal steps; a more extensive explanation can be found in the Cloudera documentation.

Potential refinements

A `DESCRIBE EXTENDED` on the table also exposes the hostnames. For me, this also worked, but I am unsure how safe it is to not define the ports explicitly.
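
As a sketch of that refinement, this is roughly how to look the hostnames up in HUE. I would expect the master addresses to appear as a `kudu.master_addresses` entry in the table properties, but the exact output depends on how the cluster was set up, so verify it on your own environment:

```sql
-- On a Kudu-backed table, the extended description includes the table
-- properties; the master hostnames typically show up there. If no port
-- is listed, the Kudu default master RPC port (7051) is assumed.
DESCRIBE EXTENDED default.micro;
```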
