Objective

This tutorial walks you through installing and setting up a local Hortonworks Registry for use with Apache NiFi.

Environment

This tutorial was tested using the following environment and components:

  • Mac OS X 10.11.6
  • MySQL 5.7.13
  • Apache NiFi 1.3.0
  • Hortonworks Registry 0.2.1

Note: The record-oriented processors and controller services used in the demo flow of this tutorial were introduced in NiFi 1.2.0, so the tutorial requires NiFi 1.2.0 or later. At the time of writing, Hortonworks Registry 0.2.1 is the version compatible with NiFi 1.3.0.

Environment Configuration

Hortonworks Registry Installation

Download the 0.2.1 Registry release:

hortonworks-registry-0.2.1.tar.gz

Extract the tar:

  tar xzvf hortonworks-registry-0.2.1.tar.gz

MySQL Database Setup

Log in to your MySQL instance and create the schema registry database, along with the necessary user and privileges:

unix> mysql -u root -p
unix> Enter password:<enter>
mysql> create database schema_registry;
mysql> CREATE USER 'registry_user'@'localhost' IDENTIFIED BY 'registry_password';
mysql> GRANT ALL PRIVILEGES ON schema_registry.* TO 'registry_user'@'localhost' WITH GRANT OPTION;
mysql> commit;

Configure registry.yaml

In the conf directory of the Registry, there is an example MySQL yaml file that we can repurpose:

cd hortonworks-registry-0.2.1
cp conf/registry.yaml.mysql.example conf/registry.yaml

Edit the following section in the yaml file to add appropriate database and user settings:

storageProviderConfiguration:
 providerClass: "com.hortonworks.registries.storage.impl.jdbc.JdbcStorageManager"
 properties:
   db.type: "mysql"
   queryTimeoutInSecs: 30
   db.properties:
     dataSourceClassName: "org.mariadb.jdbc.MariaDbDataSource"
     dataSource.url: "jdbc:mysql://localhost/schema_registry"
     dataSource.user: "registry_user"
     dataSource.password: "registry_password"

Note: For my environment (with MySQL installed via Homebrew), I did not need to change these default values.

Run Bootstrap Scripts

  ./bootstrap/bootstrap-storage.sh

Start the Registry Server

  ./bin/registry-server-start.sh ./conf/registry.yaml

Open Registry UI

Navigate to the registry UI in your browser:

http://localhost:9090
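
If the page does not load, it can help to confirm that the server is answering at all before troubleshooting the browser. The following is a minimal sketch using only the Python standard library. The function name `registry_is_up` is hypothetical, and the default address is simply the one used in this tutorial.

```python
import urllib.error
import urllib.request


def registry_is_up(url: str = "http://localhost:9090", timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on the given address gets any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors.
        return False
```

Calling `registry_is_up()` shortly after running the start script should return True once the server has finished starting. A False result usually means the process is not running or is listening on a different port.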

Schema Creation

Select the "+" button to add a schema to the registry:

25391-1-addschema.png

Configure the schema as follows:

25392-2-usersschema-properties.png

The schema text is:

{
"type": "record",
"name": "UserRecord",
"fields" : [
 {"name": "id", "type": "long"},
 {"name": "title", "type": ["null", "string"]},
 {"name": "first", "type": ["null", "string"]},
 {"name": "last", "type": ["null", "string"]},
 {"name": "street", "type": ["null", "string"]},
 {"name": "city", "type": ["null", "string"]},
 {"name": "state", "type": ["null", "string"]},
 {"name": "zip", "type": ["null", "string"]},
 {"name": "gender", "type": ["null", "string"]},
 {"name": "email", "type": ["null", "string"]},
 {"name": "username", "type": ["null", "string"]},
 {"name": "password", "type": ["null", "string"]},
 {"name": "phone", "type": ["null", "string"]},
 {"name": "cell", "type": ["null", "string"]},
 {"name": "ssn", "type": ["null", "string"]},
 {"name": "date_of_birth", "type": ["null", "string"]},
 {"name": "reg_date", "type": ["null", "string"]},
 {"name": "large", "type": ["null", "string"]},
 {"name": "medium", "type": ["null", "string"]},
 {"name": "thumbnail", "type": ["null", "string"]},
 {"name": "version", "type": ["null", "string"]},
 {"name": "nationality", "type": ["null", "string"]}
  ]
}
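
In Avro, a union type such as ["null", "string"] means the field may hold either null or a string, while a plain type such as "long" is always required. As a rough illustration, the sketch below parses a shortened copy of the schema (three of its fields) and separates the nullable fields from the required ones. The helper names are hypothetical and the check is not part of the Registry itself.

```python
import json
from typing import List

# A shortened copy of the tutorial's schema (three of its fields).
SCHEMA_TEXT = """
{
  "type": "record",
  "name": "UserRecord",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "title", "type": ["null", "string"]},
    {"name": "first", "type": ["null", "string"]}
  ]
}
"""


def nullable_fields(schema_text: str) -> List[str]:
    """Return the names of fields whose type is a union that includes "null"."""
    schema = json.loads(schema_text)
    return [
        field["name"]
        for field in schema["fields"]
        if isinstance(field["type"], list) and "null" in field["type"]
    ]


def required_fields(schema_text: str) -> List[str]:
    """Return the names of fields that cannot be null."""
    schema = json.loads(schema_text)
    optional = set(nullable_fields(schema_text))
    return [f["name"] for f in schema["fields"] if f["name"] not in optional]
```

Running the same helpers on the full schema text is also a quick way to catch a JSON syntax error before pasting it into the Registry UI, since `json.loads` raises on malformed input.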

Save the schema:

25393-3-usersschema-saved.png

NiFi Configuration

NiFi Template & CSV File

The flow we are going to use for this tutorial is the same one used in the article Convert CSV to JSON, Avro, XML using ConvertRecord. However, we are going to modify the flow to use a schema in our local Hortonworks Registry instead of a local Avro Schema Registry.

The template can be downloaded here: convert-csv-to-json.xml

The CSV file used by the flow can be downloaded here: users.txt

Note: Change the extension from .txt to .csv after downloading.

NiFi Flow Configuration

Input and Output

Create two local directories: one for input and one for the JSON output. Place the "users.csv" file in the input directory.

25394-4-in-out.png
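
The directory setup and the .txt to .csv rename can also be scripted. This is a minimal sketch, assuming hypothetical directory names `input` and `output` under a base folder of your choosing. It copies the download rather than moving it, because GetFile removes picked-up files from the input directory unless its "Keep Source File" property is set to true.

```python
import shutil
from pathlib import Path


def prepare_directories(base: Path, downloaded: Path) -> Path:
    """Create input/ and output/ under base and copy the download in as users.csv."""
    input_dir = base / "input"    # hypothetical name
    output_dir = base / "output"  # hypothetical name
    input_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Copy (not move) so the original download survives the flow run.
    target = input_dir / "users.csv"
    shutil.copy(downloaded, target)
    return target
```

For example, `prepare_directories(Path.home() / "nifi-demo", Path("users.txt"))` would leave `users.csv` in `~/nifi-demo/input`, with an empty `~/nifi-demo/output` ready for the PutFile processor.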

Import Template

Start NiFi. Import the provided template and add it to the canvas:

25395-5-templateimportedadded.png

Update Directory Paths in GetFile and PutFile Processors

Change the Input Directory path in the GetFile processor to point to your local input directory:

25396-6-getfile-properties.png

Change the Directory path in the PutFile processor to point to your local output directory:

25397-7-putfile-properties.png

Edit and Enable Controller Services

All that remains before running the flow is to change the schema registry used by the record reader and writer controller services. The template is configured to use a local AvroSchemaRegistry controller service. We will change it to use the HortonworksSchemaRegistry.

Select the root process group "NiFi Flow" by clicking an empty area of the canvas. Select the gear icon from the Operate Palette:

25398-8-rootpg-configuration.png

This opens the NiFi Flow Configuration window. Select the Controller Services tab and click the "+" button to create a new controller service.

25399-9-flowcontrollerservices-create.png

Select HortonworksSchemaRegistry from the list and click "Add":

25400-10-addhwxschemaregistrycs.png

Select the Edit button ("pencil" icon) next to the HortonworksSchemaRegistry controller service. Configure it to point to the local Hortonworks Schema Registry instance by adding http://localhost:9090/api/v1 as the value for the "Schema Registry URL" property:

25402-11-hwxschemaregistry-properties.png
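
Note that the property takes the REST API root (ending in /api/v1), not the address of the UI. To check that a schema can be resolved through that root before enabling the service, something like the sketch below may help. The path layout `/schemaregistry/schemas/<name>/versions/latest` and the `schemaText` response field are assumptions about the Registry REST API and should be checked against your installation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# The value entered for the "Schema Registry URL" property in this tutorial.
SCHEMA_REGISTRY_URL = "http://localhost:9090/api/v1"


def latest_version_url(api_root: str, schema_name: str) -> str:
    """Build the URL for the latest version of a named schema."""
    # Assumption: path layout is /schemaregistry/schemas/<name>/versions/latest.
    name = urllib.parse.quote(schema_name, safe="")
    return f"{api_root.rstrip('/')}/schemaregistry/schemas/{name}/versions/latest"


def fetch_schema_text(api_root: str, schema_name: str, timeout: float = 5.0):
    """Fetch the latest version and return its schema text, if present."""
    url = latest_version_url(api_root, schema_name)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: the Avro schema is returned in a "schemaText" field.
    return body.get("schemaText")
```

Call `fetch_schema_text(SCHEMA_REGISTRY_URL, name)` with the schema name you entered in the Registry UI. An HTTP error here suggests that either the URL or the schema name does not match what the Registry holds.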

Select the Edit button ("pencil" icon) next to the CSVReader controller service. Change the "Schema Registry" property value from AvroSchemaRegistry to HortonworksSchemaRegistry:

25403-12-csvreader-hwxschemaregistry.png

Select the Edit button ("pencil" icon) next to the JsonRecordSetWriter controller service. Change the "Schema Registry" property value from AvroSchemaRegistry to HortonworksSchemaRegistry:

25404-13-jsonwriter-hwxschemaregistry.png

Enable the HortonworksSchemaRegistry controller service by selecting the lightning bolt icon. This then allows you to enable the CSVReader and JsonRecordSetWriter controller services. Select the lightning bolt icons for both of these services. All the necessary controller services should be enabled at this point:

25405-14-controllerservices-enabled.png

Note: The AvroSchemaRegistry controller service is no longer used by the flow and can remain disabled.

Run the Flow

The flow can now be started:

25406-15-flowrunning.png

After a successful run, the JSON-formatted file is placed in the local output directory specified earlier in the PutFile processor:

25407-16-putfile-json-saved.png
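
To get a feel for the shape of the result, the conversion can be imitated in a few lines of Python. This is only an illustration of the CSV-to-JSON mapping, not how NiFi implements it. It assumes the writer emits a JSON array of objects and uses a made-up sample row rather than the real users.csv contents.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        # The schema declares "id" as long; the other fields stay strings.
        row["id"] = int(row["id"])
    return json.dumps(rows)


# Hypothetical sample input using three of the schema's fields.
SAMPLE_CSV = "id,first,last\n1,Ann,Lee\n"
```

Comparing `csv_to_json(SAMPLE_CSV)` with a record in the file written by PutFile shows the role of the schema: field names come from the header row, while types such as long for "id" come from the schema stored in the Registry.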

To learn more about the flow with more detailed explanations of the record-oriented processors and controller services in NiFi, see Convert CSV to JSON, Avro, XML using ConvertRecord.

Last update: 08-17-2019 11:40 AM (revision 2 of 2)