Created on 07-25-2020 01:52 AM - edited on 08-02-2020 11:19 PM by VidyaSargur
More form factors will become available in the future. For now, I assume you have an environment that contains one Data Hub with NiFi and one Data Hub with both Impala and Kudu. (The answer still works if everything runs on the same Data Hub.)
Prerequisites
Data Hubs with NiFi and Impala+Kudu
Permission to access these (e.g. to add a processor or create a table)
Know your Workload User Name (CDP Management Console > Your name (bottom left) > Profile)
You should have set your Workload Password in the same location
Steps to write data from NiFi to Kudu in CDP Public Cloud
Unless mentioned otherwise, I have kept everything to its default settings.
In Kudu Data Hub Cluster:
Gather the FQDNs of the Kudu masters and the ports they use: go to the Data Hub > click Kudu Master > click Masters.
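Combine the RPC addresses into one comma-separated list of `host:port` entries, for example `<master1-fqdn>:7051,<master2-fqdn>:7051,<master3-fqdn>:7051` (7051 is Kudu's default master RPC port; use the ports shown on the Masters page).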
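The combining step is just a comma join of the `host:port` RPC addresses. A minimal Python sketch; `build_kudu_masters` is a hypothetical helper, and the hostnames are placeholders for the FQDNs shown on your Masters page:

```python
def build_kudu_masters(rpc_addresses):
    """Join Kudu master RPC addresses ("host:port") into one comma-separated string.

    Hypothetical helper: feed it the RPC addresses copied from the Masters page.
    """
    cleaned = []
    for address in rpc_addresses:
        address = address.strip()
        host, sep, port = address.rpartition(":")
        # Every entry is expected to carry an explicit numeric port, e.g. "host:7051".
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {address!r}")
        cleaned.append(address)
    return ",".join(cleaned)


# Placeholder FQDNs, not real hosts.
masters = build_kudu_masters([
    "master0.example.internal:7051",
    "master1.example.internal:7051",
    "master2.example.internal:7051",
])
print(masters)  # paste this value into the Kudu Masters property
```

Failing on entries without a port keeps the list explicit instead of relying on any client-side default.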
In Hue, run the following query in the Impala editor to create the Kudu table (click the small triangle to run it):
`CREATE TABLE default.micro (id BIGINT, name STRING, PRIMARY KEY(id)) STORED AS KUDU;`
In NiFi GUI:
Ensure that you have some data in NiFi that fits the table; for example, use the `GenerateFlowFile` processor.
In Properties, set Custom Text to the following (copy carefully, or use Shift+Enter for a newline):
id, name
1,dennis
Add the PutKudu processor and configure it as follows:
- Settings
- Automatically terminate relationships: Tick both success and failure
- Scheduling
- Run Schedule: 1 sec
- Properties
- Kudu Masters: The combined list we created earlier
- Table Name: impala::default.micro
- Kerberos Principal: your Workload User Name (see prerequisites above)
- Kerberos Password: your Workload Password
- Record Reader: Create new service > CSV Reader
- Kudu Operation Type: UPSERT
Right-click the NiFi canvas > Configure > click the small cogwheel of CSV Reader, set the following property, and then click Apply:
Treat First Line as Header: True
Click the little lightning bolt of CSV Reader > Enable
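To see what Treat First Line as Header changes: the first line supplies the field names, and each following line becomes one record. As a rough analogy only (not NiFi's implementation), Python's standard `csv.DictReader` does the same with the Custom Text above; `skipinitialspace=True` is an assumption standing in for trimming the space after the comma in `id, name`:

```python
import csv
import io

# The Custom Text configured in GenerateFlowFile.
CUSTOM_TEXT = "id, name\n1,dennis"


def parse_records(text):
    """Read CSV text whose first line is a header into a list of dicts.

    skipinitialspace=True also applies to the header, so "id, name"
    yields the field names "id" and "name", matching the table columns.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    records = []
    for row in reader:
        # int() mirrors the BIGINT id column; name stays a string.
        records.append({"id": int(row["id"]), "name": row["name"]})
    return records


print(parse_records(CUSTOM_TEXT))
```

Without the header setting, the line `id, name` would be read as a data row, and its `id` value could not be stored in a BIGINT column.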
On the canvas connect your GenerateFlowFile processor to your PutKudu processor and start the flow.
You should now be able to query your table through Hue and see that a single record has been added:
`select * from default.micro`
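The single record is expected even though GenerateFlowFile, left at its default schedule, keeps emitting FlowFiles with the same content: with Kudu Operation Type UPSERT, a row whose primary key already exists is overwritten rather than added again. A toy Python model of that behavior, using a dict keyed by the primary key as a stand-in for the table (an illustration only, not the Kudu client API):

```python
def upsert(table, rows, key="id"):
    """Apply UPSERT semantics: insert rows with new keys, overwrite rows with existing keys."""
    for row in rows:
        table[row[key]] = dict(row)  # same key -> replace the stored row
    return table


# Three FlowFiles generated from the same Custom Text.
flowfiles = [[{"id": 1, "name": "dennis"}]] * 3

micro = {}  # stand-in for default.micro, keyed by the primary key
for batch in flowfiles:
    upsert(micro, batch)

print(len(micro), list(micro.values()))
```

Changing the name in the Custom Text and rerunning the flow would, under the same semantics, update that one row rather than add a second one.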
These are the minimal steps; a more extensive explanation can be found in the Cloudera documentation.
Potential refinements
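Running `DESCRIBE EXTENDED default.micro` in Hue also exposes the Kudu master hostnames. Using these hostnames as the Kudu Masters value also worked for me, presumably because the Kudu client falls back to the default master port (7051) when none is given, but I am unsure how safe it is not to define the ports explicitly.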
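If you do take the hostnames from `DESCRIBE EXTENDED`, a cautious option is to append the port yourself rather than rely on a client default. A minimal sketch, assuming 7051 (Kudu's default master RPC port) is what your masters use; check it against the Masters page. `with_explicit_ports` is a hypothetical helper:

```python
DEFAULT_KUDU_MASTER_PORT = 7051  # Kudu's default master RPC port; verify on the Masters page.


def with_explicit_ports(addresses, default_port=DEFAULT_KUDU_MASTER_PORT):
    """Return a Kudu Masters string in which every entry has an explicit port.

    Accepts a comma-separated string of "host" or "host:port" entries.
    IPv6 literals are not handled.
    """
    entries = []
    for entry in addresses.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing commas
        host, _, port = entry.partition(":")
        # Keep an existing port; otherwise fill in the default.
        entries.append(f"{host}:{port or default_port}")
    return ",".join(entries)


print(with_explicit_ports("master0.example.internal, master1.example.internal:7051"))
```

Writing the port out makes the PutKudu configuration still correct if the masters ever listen on a non-default port, as long as you update it here.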