Created on 08-03-201706:16 PM - edited 08-17-201911:40 AM
This tutorial walks you through how to install and setup a local Hortonworks Registry to interact with Apache NiFi.
This tutorial was tested using the following environment and components:
Mac OS X 10.11.6
Apache NiFi 1.3.0
Hortonworks Registry 0.2.1
Note: The record-oriented processors and controller services used in the demo flow of this tutorial were in introduced in NiFi 1.2.0. As such, the tutorial needs to be done running Version 1.2.0 or later. Currently, Hortonworks Registry 0.2.1 is the version compatible with NiFi 1.3.0.
Login to your MySQL instance and create the schema registry database and necessary users and privileges:
unix> mysql -u root -p
unix> Enter password:<enter>
mysql> create database schema_registry;
mysql> CREATE USER 'registry_user'@'localhost' IDENTIFIED BY 'registry_password';
mysql> GRANT ALL PRIVILEGES ON schema_registry.* TO 'registry_user'@'localhost' WITH GRANT OPTION;
In the conf directory of the Registry, there is an example MySQL yaml file that we can repurpose:
cp conf/registry.yaml.mysql.example conf/registry.yaml
Edit the following section in the yaml file to add appropriate database and user settings:
The flow we are going to use for this tutorial is the same one used in the article
Convert CSV to JSON, Avro, XML using ConvertRecord. However, we are going to modify the flow to use a schema in our local Hortonworks Registry instead of a local Avro Schema Registry.
The CSV file used by the flow can be downloaded here:
Note: Change the extension from .txt to .csv after downloading.
NiFi Flow Configuration
Input and Output
Create two local directories. One input directory and one for the JSON output. Place the "users.csv" file in the input directory.
Start NiFi. Import the provided template and add it to the canvas:
Update Directory Paths in GetFile and PutFile Processors
Change the Input Directory path in the GetFile processor to point to your local input directory:
Change the Directory path in the PutFile processor to point to your local output directory:
Edit and Enable Controller Services
Now all that remains to run the flow is to modify the schema registry that is used by the record reader and writer controller services. The template is configured to use a local AvroSchemaRegistry controller service. We will change it to use the HortonworksSchemaRegistry.
Select the root process group "NiFi Flow" by clicking an empty area of the canvas. Select the gear icon from the Operate Palette:
This opens the NiFi Flow Configuration window. Select the Controller Services tab and click the "+" button to create a new controller service.
Select HortonworksSchemaRegistry from the list and click "Add":
Select the Edit button ("pencil" icon) next to the HortonworksSchemaRegistry controller service. Configure it to point to the local Hortonworks Schema Registry instance by adding
http://localhost:9090/api/v1 as the value for the "Schema Registry URL" property:
Select the Edit button ("pencil" icon) next to the CSVReader controller service. Change the "Schema Registry" property value from AvroSchemaRegistry to now point to HortonworksSchemaRegistry:
Select the Edit button ("pencil" icon) next to the JsonRecordSetWriter controller service. Change the "Schema Registry" property value from AvroSchemaRegistry to now point to HortonworksSchemaRegistry:
Enable HortonworksSchemaRegistry controller service by selecting the lightning bolt icon. This will then allow you to enable the CSVReader and JSONRecordSetWriter controller services. Select the lightning bolt icons for both of these services. All the necessary controller services should be enabled at this point:
Note: The AvroSchemaRegistry controller service is no longer used by the flow and can remain disabled.
Run the Flow
The flow can now be started:
When run successfully, the JSON formatted file is placed in the local directory we specified earlier in the PutFile processor: