
VidyaSargur · 07-25-2020

Multiple form factors will be available in the future. For now, I will assume you have an environment that contains one Data Hub with NiFi and one Data Hub with both Impala and Kudu. (The steps still work if everything runs on the same Data Hub.)

Prerequisites

Data Hubs with NiFi and Impala+Kudu
Permission to access these (e.g. add a processor, create table)
Know your Workload User Name (CDP Management Console > Your name (bottom left) > Profile)
You should have set your Workload Password in the same location

Steps to write data from NiFi to Kudu in CDP Public Cloud

Unless mentioned otherwise, I have kept everything to its default settings.

In Kudu Data Hub Cluster:

Gather the FQDNs of the Kudu masters and the RPC ports they use. Go to the Data Hub > Click Kudu Master > Click Masters.

Combine the RPC addresses together in the following format:

host1:port,host2:port,host3:port 

Example:

```master1fqdn.abc:7051,master2fqdn.abc:7051,master3fqdn.abc:7051```
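If you prefer not to assemble the list by hand, the joining step can be sketched in a few lines of Python. This is only an illustration: the helper name `build_kudu_masters` is hypothetical, and the hostnames are the placeholder values from the example above, not real cluster addresses.

```python
# Hypothetical helper: joins Kudu master hostnames into the
# host:port,host:port form shown above.
def build_kudu_masters(hosts, port=7051):
    """Return a comma-separated list of host:port entries."""
    cleaned = [h.strip() for h in hosts if h.strip()]
    if not cleaned:
        raise ValueError("at least one Kudu master host is required")
    return ",".join(f"{h}:{port}" for h in cleaned)


# Placeholder hostnames; replace them with the RPC addresses from the Masters page.
masters = build_kudu_masters(
    ["master1fqdn.abc", "master2fqdn.abc", "master3fqdn.abc"]
)
print(masters)
```

The resulting string is what goes into the Kudu Masters property of PutKudu later on.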

In HUE on Impala/Kudu Data Hub Cluster:

Run the following query to create the Kudu table (the little triangle button runs it):

`CREATE TABLE default.micro (id BIGINT, name STRING, PRIMARY KEY(id)) STORED AS KUDU;`

In NiFi GUI:

Ensure that you have some data in NiFi that fits the table. Use the `GenerateFlowFile` processor.
In Properties, set Custom Text to the following (copy carefully, or use Shift+Enter for a newline):
```
id, name
1,dennis
```
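Before pasting the text into NiFi, you can check locally that it matches the table definition. The sketch below uses Python's standard `csv` module, not NiFi's CSV Reader, so it only approximates what the processor will do. The names `CUSTOM_TEXT`, `EXPECTED_COLUMNS`, and `parse_rows` are hypothetical, and `skipinitialspace=True` is my assumption for how the space after the comma in the header should be treated.

```python
import csv
import io

# The Custom Text from above, with the first line as the header.
CUSTOM_TEXT = "id, name\n1,dennis\n"
# Column names from the CREATE TABLE statement.
EXPECTED_COLUMNS = ["id", "name"]


def parse_rows(text):
    """Parse the CSV text and check it against the table's columns."""
    # skipinitialspace drops the blank after the comma in "id, name".
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for record in reader:
        # id is BIGINT in the table, so it must parse as an integer.
        rows.append({"id": int(record["id"]), "name": record["name"]})
    return rows


rows = parse_rows(CUSTOM_TEXT)
print(rows)
```

A header that does not match the table's columns, or an `id` that is not an integer, raises a `ValueError` here instead of failing later in the flow.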

Select the PutKudu processor and configure it as follows:

- Settings
  - Automatically terminate relationships: Tick both success and failure
- Scheduling
  - Run Schedule: 1 sec
- Properties
  - Kudu Masters: The combined list we created earlier
  - Table Name: impala::default.micro
  - Kerberos Principal: your Workload User Name (see prerequisites above)
  - Kerberos Password: your Workload Password
  - Record Reader: Create new service > CSV Reader
  - Kudu Operation Type: UPSERT
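The UPSERT operation type matters because GenerateFlowFile keeps emitting the same record while the flow runs. An upsert inserts a row when its primary key is new and overwrites the existing row otherwise. The toy model below illustrates this with a plain Python dictionary keyed on `id`. It is not how Kudu is implemented, and the `upsert` helper is hypothetical.

```python
def upsert(table, row, key="id"):
    """Insert the row, or overwrite the stored row with the same primary key."""
    table[row[key]] = dict(row)
    return table


table = {}

# The same flow file arrives several times while the flow is running.
for _ in range(3):
    upsert(table, {"id": 1, "name": "dennis"})

# A later record with the same key replaces the stored values.
upsert(table, {"id": 1, "name": "dennis v2"})

print(len(table), table[1]["name"])
```

However often the record is sent, the table holds one row per primary key.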

Right-click the NiFi canvas > Configure > click the little cogwheel of CSV Reader, set the following property, and then apply:
```
 Treat First Line as Header: True
```
Click the little lightning bolt of CSV Reader > Enable
On the canvas connect your GenerateFlowFile processor to your PutKudu processor and start the flow.
You should now be able to select your table through HUE and see that a single record has been added:
```
select * from default.micro
```

These are the minimal steps; a more extensive explanation can be found in the Cloudera documentation.

Potential refinements

Running a describe extended on the table also exposes the master hostnames. Using those hostnames without ports also worked for me, but I am unsure how safe it is to leave the ports implicit.
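If you would rather not rely on an implicit port, one option is to normalize the list before pasting it into PutKudu. The sketch below appends a port to every entry that lacks one. The function name `with_explicit_ports` is hypothetical, and 7051 is assumed here because it is the port used in the example earlier in this article; check the Masters page for the value your cluster actually uses. IPv6 literals are not handled.

```python
# Assumed default, taken from the example list earlier in the article.
DEFAULT_MASTER_RPC_PORT = 7051


def with_explicit_ports(addresses, default_port=DEFAULT_MASTER_RPC_PORT):
    """Append default_port to every comma-separated host that has no port."""
    parts = [a.strip() for a in addresses.split(",") if a.strip()]
    return ",".join(a if ":" in a else f"{a}:{default_port}" for a in parts)


# Placeholder hostnames, mixing entries with and without a port.
print(with_explicit_ports("master1fqdn.abc,master2fqdn.abc:7051, master3fqdn.abc"))
```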
