Member since
04-05-2016
139
Posts
143
Kudos Received
16
Solutions
05-23-2018
03:31 PM
Nice thanks, I figured there had to be a way to tell it that it was a solo node but I just wasn't phrasing it right for google apparently. Though the problem ended up being solved with a simple delete/reinstall.
04-09-2018
05:56 PM
3 Kudos
Objective
To import a versioned flow or revert local changes in a versioned flow, a user must have access to all the components in the versioned flow. As such, it is recommended that restricted components are created at the root process group level if they are to be utilized in versioned flows. This tutorial illustrates the benefits of this configuration and demonstrates a new feature introduced in Apache NiFi 1.6.0: granular restricted component categories (NIFI-4885). Users can be given access to all restricted components or to specific categories of restricted components.
Note: This tutorial assumes you are familiar with setting up a secure Apache NiFi instance and integrating it with a secure Apache NiFi Registry.
Environment
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
Apache NiFi 1.6.0
Apache NiFi Registry 0.1.0
User Setup
Assume the following:
There are two users, "sys_admin" and "test_user", who have access to both view and modify the root process group.
"sys_admin" has access to all restricted components.
"test_user" has access to restricted components requiring 'read filesystem' and 'write filesystem'.
Restricted Controller Service Created in Root Process Group
In this first example, sys_admin creates a KeytabCredentialsService controller service (NIFI-4917) at the root process group level:
The KeytabCredentialsService controller service is a restricted component that requires 'access keytab' permissions:
Sys_admin creates a process group ABC containing a flow with GetFile and PutHDFS processors:
GetFile processor is a restricted component that requires 'write filesystem' and 'read filesystem' permissions:
PutHDFS is a restricted component that requires 'write filesystem' permissions:
The PutHDFS processor is configured to use the root process group level KeytabCredentialsService controller service:
Sys_admin saves the process group as a versioned flow:
Test_user changes the flow by removing the KeytabCredentialsService controller service:
If test_user chooses to revert this change:
the revert is successful:
Additionally, if test_user chooses to import the ABC versioned flow:
The import is successful:
Restricted Controller Service Created in Process Group
Now, consider a second scenario where the controller service is created at the process group level.
Sys_admin creates a process group XYZ:
Sys_admin creates a KeytabCredentialsService controller service at the process group level:
The same GetFile and PutHDFS flow is created in the process group:
However, PutHDFS now references the process group level controller service:
Sys_admin saves the process group as a versioned flow.
Test_user changes the flow by removing the KeytabCredentialsService controller service. However, with this configuration, if test_user attempts to revert this change:
the revert is unsuccessful because test_user does not have the 'access keytab' permissions required by the KeytabCredentialsService controller service:
Similarly, if test_user tries to import the XYZ versioned flow:
The import fails:
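The two scenarios can be summarized with a small Python sketch. This is only a hypothetical model of the rule described above (a user needs every permission required by every component in the versioned flow), not NiFi's actual authorization code. The dictionary and function names are made up for illustration, and "all restricted components" for sys_admin is modeled as just the three categories used in this tutorial.

```python
# Hypothetical model of the permission check; names are illustrative only.
# Restricted components saved in each versioned flow and what they require.
FLOW_ABC = {
    "GetFile": {"read filesystem", "write filesystem"},
    "PutHDFS": {"write filesystem"},
}
# In XYZ the controller service lives inside the process group,
# so it is part of the versioned flow as well.
FLOW_XYZ = dict(FLOW_ABC, KeytabCredentialsService={"access keytab"})

# Assumption: sys_admin's "all restricted components" is modeled
# as the three categories that appear in this tutorial.
USERS = {
    "sys_admin": {"read filesystem", "write filesystem", "access keytab"},
    "test_user": {"read filesystem", "write filesystem"},
}


def missing_permissions(user, flow):
    """Return the permissions the flow requires that the user lacks."""
    required = set().union(*flow.values())
    return required - USERS[user]


def can_import_or_revert(user, flow):
    """A user can import or revert only if nothing is missing."""
    return not missing_permissions(user, flow)


for flow_name, flow in (("ABC", FLOW_ABC), ("XYZ", FLOW_XYZ)):
    print(flow_name, can_import_or_revert("test_user", flow))
```

Keeping the KeytabCredentialsService at the root level keeps it out of the versioned flow, which is why test_user is not blocked in the ABC case.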
03-14-2018
03:30 PM
Looks like this is being handled/answered in a different but related question: https://community.hortonworks.com/questions/177353/i-am-a-newbie-in-nifi-i-am-using-nifi-in-docker-ho.html?childToView=176700#answer-176700
03-14-2018
02:56 PM
Hi @Akananda Singhania,
I suspect the network configuration on your Docker Engine host is incorrect. Running the image you listed works as anticipated in a few of the environments available to me. Let's try to confirm this suspicion by running the following:
docker run busybox ping -c 1 files.grouplens.org
You should receive output similar to the following. If not, the configured DNS server is not appropriately routing to external sites.
PING files.grouplens.org (128.101.34.235): 56 data bytes
64 bytes from 128.101.34.235: seq=0 ttl=37 time=39.263 ms
--- files.grouplens.org ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 39.263/39.263/39.263 ms
Could you provide more details about the environment in which you are running Docker? Of interest would be the output of:
cat /etc/resolv.conf
Another option is to try explicitly specifying a DNS server, such as those that Google makes available, via a command such as:
docker run --dns 8.8.8.8 -d -p 8080:8080 apache/nifi
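If it helps when looking over the resolv.conf output, here is a minimal Python sketch that pulls the configured DNS servers out of that file's text. The function name and the sample contents are made up for illustration; only the "nameserver" keyword comes from the resolv.conf format itself.

```python
def parse_nameservers(resolv_conf_text):
    """Return the addresses listed on 'nameserver' lines, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


# Hypothetical file contents, not taken from the original question.
sample = """# example only
nameserver 8.8.8.8
search example.com
"""
print(parse_nameservers(sample))
```

An empty list, or only an address that the container cannot reach, would be consistent with the failed lookup.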
04-10-2019
06:24 PM
Hi @dhieru singh, thank you for the post, but I am unable to get the values of $.component.backPressureObjectThreshold and $.status.aggregateSnapshot.flowFilesQueued after processing with EvaluateJsonPath.
02-08-2018
01:11 AM
6 Kudos
Objective
This tutorial walks you through how to install and secure a NiFi Registry using client certificates. A quick example of modifying user privileges in the Registry is also included. A video version of this tutorial can be seen here: https://youtu.be/qD03ao3R-a4
Note: To learn the basics of setting up an unsecured Registry and integrating with Apache NiFi see the HCC article Versioned DataFlows with Apache NiFi 1.5 and Apache NiFi Registry 0.1.0.
Environment
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
Apache NiFi Registry 0.1.0
Apache NiFi Toolkit 1.5.0
Secure NiFi Registry Configuration
Download & Extract Tarballs
Download the tarball for the 0.1.0 Registry release:
nifi-registry-0.1.0-bin.tar.gz
and the tarball for the 1.5.0 NiFi Toolkit:
nifi-toolkit-1.5.0-bin.tar.gz
Extract the tars:
tar xzvf nifi-registry-0.1.0-bin.tar.gz
tar xzvf nifi-toolkit-1.5.0-bin.tar.gz
Generate Configuration and Certificate Files
We will use the Apache NiFi TLS Toolkit to generate the necessary keystore, truststore, and client certificates. In this tutorial, we will create certs for two users: "sys_admin" and "test_user". The user “sys_admin” will have full access to the registry while “test_user” will be configured to have targeted access in the registry.
In the directory of your NiFi Toolkit install, run the following command:
./bin/tls-toolkit.sh standalone -n "localhost" -C "CN=sys_admin, OU=NIFI" -o target
Note: To see the usage information for the TLS Toolkit, run: ./bin/tls-toolkit.sh standalone -h
TLS Toolkit generates the following in the target directory:
CN=sys_admin_OU=NIFI.p12
CN=sys_admin_OU=NIFI.p12.password
localhost
nifi-cert.pem
nifi-key.key
The localhost directory contains:
keystore.jks
nifi.properties
truststore.jks
Registry Configuration
Copy the keystore and truststore to the conf directory of your Registry install.
Copy the values of the keystore and truststore properties from the nifi.properties file:
nifi.security.keystore=./conf/keystore.jks
nifi.security.keystoreType=jks
nifi.security.keystorePasswd=taceJshGdkyBRy4B7mwaSnM3AkbN7ffewjn3nVIGidw
nifi.security.keyPasswd=taceJshGdkyBRy4B7mwaSnM3AkbN7ffewjn3nVIGidw
nifi.security.truststore=./conf/truststore.jks
nifi.security.truststoreType=jks
nifi.security.truststorePasswd=WJwg6F2jmUcvpxRHDiseNRc/VV59WOS+SdrZ5amtnsE
into the values for the equivalent properties in the nifi-registry.properties file:
nifi.registry.security.keystore=./conf/keystore.jks
nifi.registry.security.keystoreType=jks
nifi.registry.security.keystorePasswd=taceJshGdkyBRy4B7mwaSnM3AkbN7ffewjn3nVIGidw
nifi.registry.security.keyPasswd=taceJshGdkyBRy4B7mwaSnM3AkbN7ffewjn3nVIGidw
nifi.registry.security.truststore=./conf/truststore.jks
nifi.registry.security.truststoreType=jks
nifi.registry.security.truststorePasswd=WJwg6F2jmUcvpxRHDiseNRc/VV59WOS+SdrZ5amtnsE
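The copy step is a straightforward key rename, which the following Python sketch illustrates. It assumes the only difference between the two files is the "nifi.security." versus "nifi.registry.security." prefix, as in the listings above. The function and constant names are hypothetical, and the sketch only returns the translated pairs rather than editing nifi-registry.properties.

```python
NIFI_PREFIX = "nifi.security."
REGISTRY_PREFIX = "nifi.registry.security."
# The seven properties copied in this tutorial.
WANTED = (
    "keystore", "keystoreType", "keystorePasswd", "keyPasswd",
    "truststore", "truststoreType", "truststorePasswd",
)


def to_registry_properties(nifi_properties_text):
    """Map nifi.security.* entries to their nifi.registry.security.* names."""
    result = {}
    for line in nifi_properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, since values may contain '='.
        key, value = line.split("=", 1)
        suffix = key[len(NIFI_PREFIX):]
        if key.startswith(NIFI_PREFIX) and suffix in WANTED:
            result[REGISTRY_PREFIX + suffix] = value
    return result


# Placeholder values; use the ones generated in your own nifi.properties.
example = "nifi.security.keystore=./conf/keystore.jks\nnifi.security.keystoreType=jks\n"
for key, value in to_registry_properties(example).items():
    print(f"{key}={value}")
```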
While you are in nifi-registry.properties, modify the HTTP and HTTPS web properties as follows:
nifi.registry.web.http.host=
nifi.registry.web.http.port=
nifi.registry.web.https.host=localhost
nifi.registry.web.https.port=18443
In the same Registry conf directory, modify authorizers.xml in two places. First in the userGroupProvider section, add the "sys_admin" DN to the "Initial User Identity 1" property:
<property name="Initial User Identity 1">CN=sys_admin, OU=NIFI</property>
Then in the accessPolicyProvider section, add the "sys_admin" DN to the "Initial Admin Identity" property:
<property name="Initial Admin Identity">CN=sys_admin, OU=NIFI</property>
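The identity entered in authorizers.xml has to be the same string that was passed to the TLS Toolkit with -C. The short Python sketch below models that requirement as plain string equality, which is an assumption made for illustration rather than a description of the Registry's internal comparison. The function name is hypothetical.

```python
# The -C value used when the TLS Toolkit was invoked in this tutorial.
TOOLKIT_DN = "CN=sys_admin, OU=NIFI"


def matches_toolkit_dn(configured_identity):
    """Exact comparison: a whitespace difference counts as a mismatch."""
    return configured_identity == TOOLKIT_DN


for candidate in ("CN=sys_admin, OU=NIFI", "CN=sys_admin,OU=NIFI"):
    print(repr(candidate), matches_toolkit_dn(candidate))
```

Copying the DN from the toolkit command instead of retyping it avoids this kind of mismatch.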
Note: During this step, it is crucial that you specify the exact DN string used when the TLS Toolkit was invoked. A common error is entering "CN=sys_admin,OU=NIFI" which will not work as it has a missing space.
Add Certificate to Keychain
Double-click on the .p12 file that was generated by the TLS Toolkit. When prompted, provide the password from the .password file.
Start the Registry
In a terminal window, navigate to the directory where NiFi Registry was installed and run:
./bin/nifi-registry.sh start
Open Registry UI
Navigate to the registry UI in your web browser (Chrome used in the following examples):
https://localhost:18443/nifi-registry
When prompted, select the "sys_admin" cert to add to your browser:
When prompted, enter your "login" keychain password:
You should now be able to view the Registry UI as the "CN=sys_admin, OU=NIFI" user:
Registry Administration
The "sys_admin" user has full access to the registry. Here are some examples of administration functions immediately available.
Bucket Creation
Select the Settings icon in the top right corner of the screen. In the Buckets window that appears, select the "New Bucket" button.
In the dialog that appears, enter the bucket name "ABC" and select the "Create" button.
The "ABC" bucket is created:
User Administration
Select "Users" at the top of the UI to access the user administration area of the Registry:
Select the pencil icon next to the "CN=sys_admin, OU=NIFI" user. This will open a side nav that shows the Special Privileges and group Membership:
You can see that the "sys_admin" was given all special privileges as the Initial Admin Identity (IAI). The privileges for the IAI are not editable. Let's create a second user to see how bucket access can be restricted by modifying these privileges.
Second User Creation
Close the side nav and select the "Add User" button.
Enter "CN=test_user, OU=NIFI" in the Identity field and select the "Add" button:
The "CN=test_user, OU=NIFI" user is created:
Second User Certificate
Next we need a client certificate for "test_user".
Return to the directory of your NiFi Toolkit installation and run:
./bin/tls-toolkit.sh standalone -C "CN=test_user, OU=NIFI" -o target
Note: The output directory must be set to target in order for the existing CA certificate in that directory to be used.
TLS Toolkit generates the following additional files in the target directory:
CN=test_user_OU=NIFI.p12
CN=test_user_OU=NIFI.p12.password
Add the .p12 cert to the Keychain as described earlier. However, choose a different browser this time to access the UI (Safari in the following examples):
https://localhost:18443/nifi-registry
Add the client certificate to the browser:
You should now be able to view the Registry UI as the "CN=test_user, OU=NIFI" user:
You can see that "test_user" has no access to Settings.
Return to the Chrome browser where "sys_admin" is the user. Give "test_user" read-only bucket privileges:
Return to the Safari browser where "test_user" is the user. Reload the browser. Select the Settings icon which is now available. The ABC bucket is now visible, but note that the Action to delete the bucket is not enabled, which is consistent with the privileges given to this user:
Additional Help
If you would like to learn more about NiFi Registry functionality and working with versioned flows in NiFi, see the following articles:
Versioned DataFlows with Apache NiFi 1.5 and Apache NiFi Registry 0.1.0
Apache NiFi - How do I deploy my flow?
Or documentation:
Apache NiFi Registry User Guide
Apache NiFi Registry System Administrator's Guide
Versioning a DataFlow (Apache NiFi User Guide)
01-25-2018
08:10 PM
5 Kudos
Objective
This article highlights some of the latest UI enhancements added in Apache NiFi 1.5.0.
Environment
The examples shown in the article utilized the following environment and components:
Mac OS X 10.11.6
Apache NiFi 1.5.0
"Primary Node" Processors Identification
In a NiFi Cluster, processors that have been configured for "Primary node" execution are now identified in the UI by a "P". On the canvas, the "P" is visible next to the processor icon:
The "P" is also shown in the Processors tab on the Summary page, specifically in the Name column:
Finding Processors Quickly in the Summary Page
If your flow has hundreds of processors, it can be difficult to differentiate between them in the Summary page (accessible from the top-right Global menu). On the Processors tab, a "Process Group" column has been added to display the name of the parent process group containing the component:
Additionally, when hovering over the "Go to location" button the tooltip now includes the path of the component.
NiFi Registry Integration
NiFi 1.5.0 is the first release to integrate with the Apache NiFi Registry. NiFi dataflows can now be versioned on the process group level and easily deployed across different NiFi instances. More information can be found in the HCC article "Versioned DataFlows with Apache NiFi 1.5 and Apache NiFi Registry 0.1.0" and in the "Versioning a Dataflow" section of the NiFi User Guide. However, here are some related UI changes to highlight.
Connecting a Registry Client
The NiFi Settings window (accessible from Controller Settings in the top-right Global menu) now has a "Registry Clients" tab where you can connect NiFi to a NiFi Registry:
Importing a Flow
If your NiFi instance is connected to an active Registry, when adding a process group to the canvas there is also an option to "Import" a versioned flow:
Selecting "Import" prompts the user to choose a version of a flow to add to the canvas:
Version States
There are new icons that show:
the version state of an individual process group
the count of the statuses of versioned process groups within a process group
the count of the statuses of versioned process groups in the root process group
Here are the meanings of each icon/state:
Up to date
Locally modified
Stale
Locally modified and stale
Sync failure
Version state information is also shown in the "Process Groups" tab of the Summary Page:
As mentioned previously, more information regarding NiFi and NiFi Registry integration can be found in the "Versioning a Dataflow" section of the NiFi User Guide.
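To make the idea of per-group status counts concrete, here is a small Python sketch that tallies the five version states across a set of versioned process groups. The enum member names and the sample data are illustrative assumptions. Only the five state labels come from this article.

```python
from collections import Counter
from enum import Enum


class VersionState(Enum):
    # Labels as listed in this article; member names are illustrative.
    UP_TO_DATE = "Up to date"
    LOCALLY_MODIFIED = "Locally modified"
    STALE = "Stale"
    LOCALLY_MODIFIED_AND_STALE = "Locally modified and stale"
    SYNC_FAILURE = "Sync failure"


def count_states(child_states):
    """Count how many versioned process groups are in each state."""
    return Counter(child_states)


# Hypothetical states of three versioned process groups inside a parent group.
children = [
    VersionState.UP_TO_DATE,
    VersionState.LOCALLY_MODIFIED,
    VersionState.UP_TO_DATE,
]
for state, count in count_states(children).items():
    print(f"{state.value}: {count}")
```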
01-19-2018
08:13 PM
4 Kudos
Objective
This tutorial walks you through how to install and set up a local Apache NiFi Registry to integrate with Apache NiFi and start using versioned NiFi dataflows. It assumes basic experience with NiFi but little to no experience with NiFi Registry. A video version of this tutorial can be seen here: https://youtu.be/X_qhRVChjZY
Environment
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
Apache NiFi 1.5.0
Apache NiFi Registry 0.1.0
Note: Apache NiFi 1.5.0 is the first NiFi release to support integration with the NiFi Registry. NiFi Registry 0.1.0 is the first and currently only version of the application.
Apache NiFi Registry Configuration
Registry Installation
Download the tarball of the 0.1.0 Registry release:
nifi-registry-0.1.0-bin.tar.gz
Extract the tar:
tar xzvf nifi-registry-0.1.0-bin.tar.gz
Start the Registry
In a terminal window, navigate to the directory where NiFi Registry was installed. Run:
bin/nifi-registry.sh start
Open Registry UI
Navigate to the registry UI in your browser:
http://localhost:18080/nifi-registry
Note: By default the registry is unsecured. The default port is 18080; it can be changed by editing the nifi.registry.web.http.port property in the nifi-registry.properties file in the NiFi Registry conf directory.
Bucket Creation
A Bucket is a container that stores and organizes flows in the Registry. The Registry is empty as there are no buckets/flows yet.
To create a bucket, select the Settings icon in the top right corner of the screen. In the Buckets window that appears, select the "New Bucket" button.
Enter the bucket name "Test" and select the "Create" button.
The "Test" bucket is created:
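The same bucket could in principle be created without the UI. The Python sketch below builds (but does not send) an HTTP request for that purpose. It assumes the Registry's REST API is served under /nifi-registry-api and accepts a POST to /buckets with a JSON body containing the bucket name. Verify both against the REST API documentation for your Registry version before relying on them. The function name is hypothetical.

```python
import json
import urllib.request


def build_create_bucket_request(base_url, bucket_name):
    """Build a POST request for creating a bucket (endpoint path is an assumption)."""
    body = json.dumps({"name": bucket_name}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/nifi-registry-api/buckets",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_bucket_request("http://localhost:18080", "Test")
print(request.get_method(), request.full_url)

# To send it against a running, unsecured Registry:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```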
There are no permissions configured by default, so anyone is able to view, create and modify buckets in this instance. For information on securing the Registry, see the NiFi Registry System Administrator’s Guide.
Apache NiFi Configuration
Connect NiFi to the Registry
With the Registry running, we can tell NiFi about it.
In NiFi, select "Controller Settings" from the top-right Global menu:
Select the Registry Clients tab and the "+" button to add a new Registry Client. Enter a name and the URL of the Registry instance (http://localhost:18080):
Versioned DataFlows
Start Version Control on a Process Group
NiFi can now place a process group under version control which saves it as a flow resource in the Registry.
Right-click on a process group and select "Version→Start version control" from the context menu:
The local registry instance and "Test" bucket are chosen by default to store your flow since they are the only registry connected and bucket available. Enter a flow name, flow description, comments and select "Save":
As indicated by the Version State icon in the top left corner of the component, the process group is now saved as a versioned flow in the registry.
Go back to the Registry UI and return to the main page to see the versioned flow you just saved (a refresh may be required):
Save Changes to a Versioned Flow
Changes made to the versioned process group can be reviewed, reverted or saved.
For example, if changes are made to the ABCD flow, the Version State changes to "Locally modified". The right-click menu will now show the options "Commit local changes", "Show local changes" or "Revert local changes":
Select "Show local changes" to see the details of the changes made:
Return to the context menu and select "Commit local changes". Enter comments and select "Save" to save the changes:
Version 2 of the flow is saved:
Note: Some actions made to the versioned process group are not considered local changes. More information can be found in the Managing Local Changes section of the NiFi User Guide.
Import a Versioned Flow
With a flow existing in the Registry, we can use it to illustrate how to import a versioned process group.
In NiFi, select Process Group from the Components toolbar and drag it onto the canvas:
Instead of entering a name, click the Import link:
Choose the version of the flow you want imported and select "Import":
A second identical PG is now added:
Help
To learn more about NiFi Registry functionality and working with versioned flows in NiFi, see the following links:
Apache NiFi Registry User Guide
Apache NiFi Registry System Administrator's Guide
Versioning a DataFlow (Apache NiFi User Guide)
Apache NiFi - How do I deploy my flow?
11-07-2017
08:03 PM
4 Kudos
Objective
This is the second of a two-article series on the ValidateRecord processor. The first walks you through a NiFi flow that converts a CSV file into JSON format and validates the data against a given schema.
This article discusses the effects of enabling/disabling the "Strict Type Checking" property of the ValidateRecord processor.
Note: The ValidateRecord processor was introduced in NiFi 1.4.0.
Environment
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
Apache NiFi 1.4.0
Strict Type Checking Property
A useful property of the ValidateRecord processor is "Strict Type Checking". If the incoming data has a Record where a field is not of the correct type, this property determines how to handle the Record. If set to "true", the Record will be considered invalid. If set to "false", the Record will be considered valid.
To demonstrate both cases, we need to ingest data that can distinguish between different types (which our CSV data from the first article could not). Let's grab a snippet of the JSON candy data and make some changes. Specifically let's put a string value for the "chocolate" field (which is of type int) and let's put a decimal value for the "competitorname" field (which is of type string):
[ {
"competitorname" : "One dime",
"chocolate" : "0",
"fruity" : 0,
"caramel" : 0,
"peanutyalmondy" : 0,
"nougat" : 0,
"crispedricewafer" : 0,
"hard" : 0,
"bar" : 0,
"pluribus" : 0,
"sugarpercent" : 0.011,
"pricepercent" : 0.116,
"winpercent" : 32.261086
}, {
"competitorname" : 3.14159,
"chocolate" : 1,
"fruity" : 0,
"caramel" : 0,
"peanutyalmondy" : 0,
"nougat" : 0,
"crispedricewafer" : 1,
"hard" : 0,
"bar" : 0,
"pluribus" : 1,
"sugarpercent" : 0.87199998,
"pricepercent" : 0.84799999,
"winpercent" : 49.524113
} ]
Here is the JSON file: type-checking.txt (Change the extension from .txt to .json after downloading)
Place the type-checking.json file in your input directory:
In order to process the JSON file, the ValidateRecord processor needs to use a JSON Record Reader. Go to the configuration window for the processor and select "Create new service..." for the Record Reader:
Select JsonTreeReader, then "Create":
and then select the Arrow icon next to the reader:
Save the changes made before going to the Controller Service.
Go to the configuration window of the JsonTreeReader controller service, select "AvroSchemaRegistry" for the Schema Registry and then select Apply:
Enable the JsonTreeReader service. The flow is ready to run.
Start the GetFile, UpdateAttribute and ValidateRecord processors. With "Strict Type Checking" set to "true", the 2 records are considered invalid and are routed to that connection:
Start the LogAttribute processor to clear the queue. Stop all processors. Place the type-checking.json file in your input directory again.
Now let's change the Strict Type Checking property to "false":
Running the flow this time, the 2 records are considered valid and are routed to that connection:
Note: The documentation for the Strict Type Checking property states that when set to false, the relevant record fields will be coerced into the correct type. This functionality is currently broken (see NIFI-4579).
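The difference between the two settings can be sketched in a few lines of Python. This is a simplified model for illustration only, not ValidateRecord's actual logic: strict mode requires the value to already be of the schema type, while non-strict mode only requires that the value can be converted to it. The schema below covers just the two fields changed in type-checking.json, and the function name is hypothetical.

```python
# Subset of the candy schema: only the two fields altered in this article.
SCHEMA = {"competitorname": str, "chocolate": int}


def is_valid(record, strict):
    """Simplified stand-in for the Strict Type Checking behavior."""
    for field, expected_type in SCHEMA.items():
        value = record[field]
        if strict:
            # The value must already be exactly the expected type.
            if type(value) is not expected_type:
                return False
        else:
            # The value only needs to be convertible to the expected type.
            try:
                expected_type(value)
            except (TypeError, ValueError):
                return False
    return True


records = [
    {"competitorname": "One dime", "chocolate": "0"},
    {"competitorname": 3.14159, "chocolate": 1},
]
for strict in (True, False):
    print(strict, [is_valid(record, strict) for record in records])
```

With strict checking both records fail, and without it both pass, which matches the routing seen in the two runs above.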
10-23-2017
07:01 PM
2 Kudos
Objective
This tutorial demonstrates how to use the QueryDatabaseTable and PutKudu processors to read data from a MySQL database and put it into Kudu. Thanks to @Cam Mach for his assistance with this article.
Note: The PutKudu processor was introduced in NiFi 1.4.0.
Environment
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
Apache NiFi 1.4.0
Apache Kudu 1.5.0
MySQL 5.7.13
PutKudu (AvroReader) Demo Configuration
MySQL Setup
In your MySQL instance, choose a database ("nifi_db" in my instance) and create the table "users":
unix> mysql -u root -p
unix> Enter password:<enter>
mysql> use nifi_db;
mysql> CREATE TABLE `users` (
`id` mediumint(9) NOT NULL AUTO_INCREMENT,
`title` text,
`first_name` text,
`last_name` text,
`street` text,
`city` text,
`state` text,
`zip` text,
`gender` text,
`email` text,
`username` text,
`password` text,
`phone` text,
`cell` text,
`ssn` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=103 DEFAULT CHARSET=latin1;
Add data to the "users" table:
mysql> INSERT INTO `users` (`id`, `title`, `first_name`, `last_name`, `street`, `city`, `state`, `zip`, `gender`, `email`, `username`, `password`, `phone`, `cell`, `ssn`)
VALUES (1, 'miss', 'marlene', 'shaw', '3450 w belt line rd', 'abilene', 'florida', '31995', 'F', 'marlene.shaw75@example.com', 'goldenpanda70', 'naughty', '(176)-908-6931', '(711)-565-2194', '800-71-1872'),
(2, 'ms', 'letitia', 'jordan', '2974 mockingbird hill', 'irvine', 'new jersey', '64361', 'F', 'letitia.jordan64@example.com', 'lazytiger614', 'aaaaa1', '(860)-602-3314', '(724)-685-3472', '548-93-7031'),
(3, 'mr', 'todd', 'graham', '5760 spring hill rd', 'garden grove', 'north carolina', '81790', 'M', 'todd.graham39@example.com', 'purplekoala484', 'paintball', '(230)-874-6532', '(186)-529-4912', '362-31-5248'),
(4, 'mr', 'seth', 'martinez', '4377 fincher rd', 'chandler', 'south carolina', '73651', 'M', 'seth.martinez82@example.com', 'bigbutterfly149', 'navy', '(122)-782-5822', '(720)-778-8541', '200-80-9087'),
(5, 'mr', 'guy', 'mckinney', '4524 hogan st', 'iowa park', 'ohio', '24140', 'M', 'guy.mckinney53@example.com', 'blueduck623', 'office', '(309)-556-7859', '(856)-764-9146', '973-37-9077'),
(6, 'ms', 'anna', 'smith', '5047 cackson st', 'rancho cucamonga', 'pennsylvania', '56486', 'F', 'anna.smith74@example.com', 'goldenfish121', 'albion', '(335)-388-7351', '(485)-150-6348', '680-20-6440'),
(7, 'mr', 'johnny', 'johnson', '7250 bruce st', 'gresham', 'new mexico', '83973', 'M', 'johnny.johnson73@example.com', 'crazyduck127', 'toast', '(142)-971-3099', '(991)-131-1582', '683-26-4133'),
(8, 'mrs', 'robin', 'white', '7882 northaven rd', 'orlando', 'connecticut', '40452', 'F', 'robin.white46@example.com', 'whitetiger371', 'elizabeth', '(311)-659-3812', '(689)-468-6420', '960-70-3399'),
(9, 'miss', 'allison', 'williams', '7648 edwards rd', 'edison', 'louisiana', '52040', 'F', 'allison.williams82@example.com', 'beautifulfish354', 'sanfran', '(328)-592-3520', '(550)-172-4018', '164-78-8160');
Kudu Setup
For my setup, I followed the Apache Kudu Quickstart instructions to easily set up and run a Kudu VM.
To check that your VM is running:
unix> VBoxManage list runningvms
"kudu-demo" {b39279b5-3dd6-478a-ac9d-2204bf88e7b9}
To see what IP Kudu is running on:
unix> VBoxManage guestproperty get kudu-demo /VirtualBox/GuestInfo/Net/0/V4/IP
Value: 192.168.58.100
The Kudu web client runs on port 8051:
Create a table in Kudu by first connecting to Impala in the virtual machine:
unix> ssh demo@quickstart.cloudera -t impala-shell
demo@quickstart.cloudera's password:
[quickstart.cloudera:21000] >
(Note: The username and password for the Quickstart VM are both "demo".)
Create the Kudu table with the same columns and data types as the MySQL table:
[quickstart.cloudera:21000] > CREATE TABLE users_kudu
(
id BIGINT,
title STRING,
first_name STRING,
last_name STRING,
street STRING,
city STRING,
state STRING,
zip STRING,
gender STRING,
email STRING,
username STRING,
password STRING,
phone STRING,
cell STRING,
ssn STRING,
PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU;
NiFi Flow Setup
Use the following detailed instructions to set up the flow. Alternatively, a template of the flow can be downloaded here: putkudu-querydatabasetable.xml
1. Start NiFi. Two controller services are needed for the flow. Click the "Configuration" button (gear icon) from the Operate palette:
This opens the NiFi Flow Configuration window. Select the "Controller Services" tab. Click the "+" button and add a DBCPConnectionPool controller service:
Configure the controller service as follows (adjusting the property values to match your own MySQL instance and environment):
Next, add an AvroReader controller service:
Apply the default configuration:
Select the "lightning bolt" icon for each controller service to enable them:
2. Return to the NiFi canvas. Add a QueryDatabaseTable processor:
Configure the processor as follows:
where:
The DBCPConnectionPool controller service created earlier is selected for Database Connection Pooling Service
"users" is entered for the Table Name
"id" is entered for the Maximum-value Columns
3. Add a PutKudu processor and connect the two processors:
Configure the PutKudu processor as follows:
where:
"192.168.58.100:7051" is entered for the Kudu Masters IP and port (7051 is the default port)
"impala::default.users_kudu" is entered for the Table Name
Skip head line property is set to "false"
The AvroReader controller service created earlier is selected for Record Reader
Auto-terminate the Success relationship:
On the canvas, make a "failure" relationship connection from the PutKudu processor to itself:
4. The flow is ready to run.
Run Flow
Start the QueryDatabaseTable processor.
Looking at the contents of the FlowFile in the queue, the data from the MySQL table has been ingested and converted to Avro format:
Start the PutKudu processor to put the data into Kudu:
This can be confirmed via a Select query:
With the flow still running, add another row of data to the MySQL "users" table:
The flow processes this data and the new row appears in Kudu:
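Only the new row is picked up because "id" was set as the Maximum-value Column: QueryDatabaseTable remembers the largest value it has seen and fetches only rows beyond it. The Python sketch below illustrates that idea with an in-memory SQLite table standing in for MySQL. The function name, the reduced two-column table, and the starting value of 0 are assumptions made to keep the example self-contained.

```python
import sqlite3


def fetch_new_rows(conn, last_max_id):
    """Fetch rows whose id exceeds the stored maximum; return rows and new maximum."""
    rows = conn.execute(
        "SELECT id, first_name FROM users WHERE id > ? ORDER BY id",
        (last_max_id,),
    ).fetchall()
    new_max = rows[-1][0] if rows else last_max_id
    return rows, new_max


# SQLite stands in for the MySQL instance; only two columns are modeled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "marlene"), (2, "letitia")]
)

# First run: everything is new, so all existing rows are returned.
first_batch, state = fetch_new_rows(conn, 0)

# A row is added while the flow keeps running.
conn.execute("INSERT INTO users VALUES (?, ?)", (3, "todd"))

# Second run: only the row above the stored maximum is returned.
second_batch, state = fetch_new_rows(conn, state)
print(first_batch, second_batch, state)
```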
Helpful Links
Here are some links to check out if you are interested in other flows which utilize the record-oriented processors and controller services in NiFi:
Convert CSV to JSON, Avro, XML using ConvertRecord
Installing a local Hortonworks Registry to use with Apache NiFi
Running SQL on FlowFiles using QueryRecord Processor
Using PublishKafkaRecord_0_10 (CSVReader/JSONWriter) in Apache NiFi 1.2+
Using PutElasticsearchHttpRecord (CSVReader)
Using PartitionRecord (GrokReader/JSONWriter) to Parse and Group Log Files
Geo Enrich NiFi Provenance Event Data using LookupRecord
Using PutMongoRecord to put CSV into MongoDB