Big Data DevOps: Part 2: Schemas, Schemas, Schemas. Know Your Records, Know Your DataTypes, Know Your Fields, Know Your Data.

Since we can process records in Apache NiFi, Streaming Analytics Manager, Apache Kafka, and any tool that can work with a schema, we have a real need for a Schema Registry. I have mentioned them before. It is important to be able to automate the management of schemas. Today we will list and export them for backup and migration purposes. We will also cover how to upload new schemas and new versions of schemas.

Backing up schemas with Apache NiFi 1.5+ takes only a few steps.

Backup All Schemas

  1. GetHTTP: get the list of schemas from Schema Registry (SR) via GET.
  2. SplitJson: turn the list into individual records.
  3. EvaluateJsonPath: get the schema name.
  4. InvokeHTTP: get the schema body.
  5. EvaluateJsonPath: turn the schema text into a separate flow file.
  6. Rename and save both the full JSON record from the registry and the schema text on its own.
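
For readers who want to see the same steps outside of NiFi, here is a minimal Python sketch of the backup loop. It is not the flow itself, only an illustration under a few assumptions: the function names, the output file naming, and the `fetch` parameter are hypothetical, and the `schemaText` field name is assumed from the registry's version record, so check it against your own registry's response.

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import quote

# Host and port as used in this article; replace with your registry.
BASE_URL = "http://server:7788/api/v1/schemaregistry"


def http_get_json(url):
    """Fetch a URL and decode the JSON body (stands in for GetHTTP / InvokeHTTP)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def backup_schemas(out_dir, base_url=BASE_URL, fetch=http_get_json):
    """Save the full registry record and the schema text for every schema.

    Returns the list of schema names that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []

    listing = fetch(f"{base_url}/schemas")                # step 1: list schemas
    for entity in listing.get("entities", []):            # step 2: one record each
        name = entity["schemaMetadata"]["name"]           # step 3: schema name
        encoded = quote(name, safe="")
        record = fetch(                                   # step 4: schema body
            f"{base_url}/schemas/{encoded}/versions/latest?branch=MASTER"
        )
        schema_text = record["schemaText"]                # step 5: assumed field name

        # Step 6: hypothetical file naming, using the percent-encoded name as the stem.
        (out / f"{encoded}.json").write_text(json.dumps(record, indent=2), encoding="utf-8")
        (out / f"{encoded}.avsc").write_text(schema_text, encoding="utf-8")
        saved.append(name)
    return saved
```

Calling `backup_schemas("schema-backup")` against a reachable registry would write two files per schema into that directory. The `fetch` argument exists only so the loop can be exercised without a live server.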

NiFi Flow

62812-nififlow.png

Initial Call to List All Schemas

62798-listschemas.png

Get The Schema Name

62799-grabschemaname.png

Example Schema with Text

62800-oneschemagrabbed.png

An Example of JSON Schema Text

62803-schematextdataprovenance.png


Build a New Flow File from The Schema Text JSON

62804-buildschemafile.png

Get the Latest Version of the Schema Text For this Schema By Name

62805-get.png

The List Returned

62809-schemalist.png

Swagger Documentation for SR

62801-createanewschemanobody.png

62802-registryanewschemaversiontext.png

62806-swaggerdocs.png

62807-confluentschemalist.png

62808-confluentswaggertest.png

Example Flow

backup-schema.xml


Schema List JSON Formatting

"entities" : [ {
  "schemaMetadata" : {
    "type" : "avro",
    "schemaGroup" : "Kafka",
    "name" : "adsb",
    "description" : "adsb",
    "compatibility" : "BACKWARD",
    "validationLevel" : "ALL",
    "evolve" : true
  },
  "id" : 3,
  "timestamp" : 1520460239420
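
As a rough illustration of what the SplitJson and EvaluateJsonPath steps pull out of this list, the Python sketch below extracts the schema names from a response of this shape. The sample string repeats the fragment above, and the closing brackets are added only so that it parses, which is an assumption about the rest of the response. The function name `schema_names` is hypothetical.

```python
import json

# Sample shaped like the fragment above. The closing brackets are an
# assumption added so the text is valid JSON.
SAMPLE = """
{ "entities" : [ { "schemaMetadata" : { "type" : "avro", "schemaGroup" : "Kafka",
    "name" : "adsb", "description" : "adsb", "compatibility" : "BACKWARD",
    "validationLevel" : "ALL", "evolve" : true },
  "id" : 3, "timestamp" : 1520460239420 } ] }
"""


def schema_names(listing_json):
    """Return the name of each schema in a schema-list response."""
    listing = json.loads(listing_json)
    return [entity["schemaMetadata"]["name"] for entity in listing.get("entities", [])]


print(schema_names(SAMPLE))  # prints ['adsb']
```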



Get Schema List REST URL (GET)

http://server:7788/api/v1/schemaregistry/schemas


Get Schema Body REST URL (GET)

http://server:7788/api/v1/schemaregistry/schemas/${schema}/versions/latest?branch=MASTER
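
In the URL above, `${schema}` is a NiFi Expression Language reference to the flow file attribute that holds the schema name. If you script the same call, the substitution could look like the following Python sketch. The function name is hypothetical, and percent-encoding the name is a precaution added here, not something the flow does explicitly.

```python
from urllib.parse import quote

# Host and port as used in this article; replace with your registry.
BASE_URL = "http://server:7788/api/v1/schemaregistry"


def latest_version_url(schema, base_url=BASE_URL, branch="MASTER"):
    """Build the URL for the latest version of a schema, by name."""
    # Percent-encode so names with spaces or slashes stay one path segment.
    return (
        f"{base_url}/schemas/{quote(schema, safe='')}"
        f"/versions/latest?branch={quote(branch, safe='')}"
    )


print(latest_version_url("adsb"))
# prints http://server:7788/api/v1/schemaregistry/schemas/adsb/versions/latest?branch=MASTER
```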

See: https://community.hortonworks.com/articles/177301/big-data-devops-apache-nifi-flow-versioning-and-au...


If you wish, you can use the Confluent-style API against SR and against Confluent Schema Registry. It is slightly different, but it is easy to change our REST calls to handle it.
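
To make the difference concrete, here is a small Python sketch of the Confluent-style equivalents of the two calls. In that API the list call returns a plain JSON array of subject names, and a version record carries the schema as a string under `schema`. The function names and the example host are hypothetical. The base URL is left as a parameter because the prefix differs between servers: for Hortonworks SR the compatible API is assumed to sit under a path such as `/api/v1/confluent`, which you should confirm on your registry's Swagger page.

```python
from urllib.parse import quote


def subjects_url(base_url):
    """URL that lists all subjects (the Confluent-style schema list)."""
    return f"{base_url.rstrip('/')}/subjects"


def latest_subject_url(base_url, subject):
    """URL for the latest registered version of one subject."""
    return f"{base_url.rstrip('/')}/subjects/{quote(subject, safe='')}/versions/latest"


def schema_text(version_record):
    """Pull the schema string out of a Confluent-style version record."""
    return version_record["schema"]


# Hypothetical host; 8081 is the usual Confluent Schema Registry port.
print(subjects_url("http://confluent-host:8081"))
print(latest_subject_url("http://confluent-host:8081", "adsb"))
```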


Swagger Docs

http://YourHWXRegistry:7788/swagger#!/4._Confluent_Schema_Registry_compatible_API/getSubjects

Hortonworks Schema Registry from HDF 3.1
https://community.hortonworks.com/articles/171893/hdf-31-executing-apache-spark-via-executesparkinte...
