Objective

With the release of NiFi Registry 0.2.0, flow contents can now be stored under a Git directory using the new GitFlowPersistenceProvider.

This tutorial walks you through how to configure this provider in NiFi Registry so that versioned flows in NiFi are automatically saved to a Git repository.

A video version of this tutorial can be seen here: https://youtu.be/kK7eVppg9Aw

Environment

This tutorial was tested using the following environment and components:

  • Mac OS X 10.11.6
  • Apache NiFi Registry 0.2.0
  • Apache NiFi 1.7.1

GitFlowPersistenceProvider

Git Configuration

First, create a new GitHub repo:

85586-1-creategitrepo.png

Then clone it locally with the git clone command (e.g. git clone https://github.com/andrewmlim/versioned_flows.git):

85587-2-gitclonerepo.png

Next, go to GitHub’s “Developer settings” and create a new “Personal access token”:

85588-3-createpersonalaccesstoken.png

85589-4-remoteaccesspassword.png

NiFi Registry Configuration

In the ./conf/providers.xml file, configure the following properties:

  • Set org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider as the qualified class name
  • Set “Flow Storage Directory” to the directory where the repo was cloned
  • Set “Remote To Push” to origin
  • Set “Remote Access User” to your GitHub username
  • Set “Remote Access Password” to the personal access token

Here is an example of these changes in providers.xml:

<flowPersistenceProvider>
 <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
 <property name="Flow Storage Directory">./versioned_flows</property>
 <property name="Remote To Push">origin</property>
 <property name="Remote Access User">andrewmlim</property>
 <property name="Remote Access Password">f1295e16f933d4468d948ea276372da8b0585bda</property>
</flowPersistenceProvider>

Note: The “Remote To Push” property specifies the name of the remote to automatically push to. This property is optional; if it is not specified, commits remain in the local repository until a push is performed manually.
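
As a rough illustration of the manual push mentioned in the note, the sketch below simulates a commit that exists only in the local repository and then pushes it by hand. Everything in it is a stand-in chosen for this example: a throwaway bare repository plays the role of the GitHub remote, and the directory and file names are hypothetical. In a real setup you would run only the git push step inside your Flow Storage Directory.

```shell
#!/bin/sh
set -e

# Stand-ins for this sketch: a throwaway bare repo plays the role of the
# GitHub remote, and "flow_repo" plays the role of the Flow Storage Directory.
workdir=$(mktemp -d)
git init --quiet --bare "$workdir/remote.git"
git clone --quiet "$workdir/remote.git" "$workdir/flow_repo" 2>/dev/null

# Simulate a commit that was made locally but never pushed
# (the file name is hypothetical).
cd "$workdir/flow_repo"
echo "flow contents" > example.snapshot
git add example.snapshot
git -c user.name="demo" -c user.email="demo@example.com" \
    commit --quiet -m "Simulated local-only commit"

# Manual push: send the current branch to the remote named "origin".
branch=$(git symbolic-ref --short HEAD)
git push --quiet origin HEAD

# Compare the local commit with what the remote now holds for that branch.
local_head=$(git rev-parse HEAD)
remote_head=$(git ls-remote origin "refs/heads/$branch" | cut -f1)
echo "local:  $local_head"
echo "remote: $remote_head"
```

If the two hashes match, the remote has caught up with the local repository.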

Saving a Versioned Flow to the Git Repo

Start up NiFi Registry and create a bucket:

85590-5-registrybucket1.png

Start up a NiFi instance and connect to the Registry:

85591-6-nifiregistryclient.png

Create a process group. Start version control:

85592-7-startversioncontrol.png

Save the flow:

85593-8-saveflow.png

In GitHub, you will see that the Bucket and Flow have been saved in your repo:

85594-9-gitrepocontents.png

As shown, Buckets are represented as directories, and Flow contents are stored as files in the Bucket directory they belong to. Flow snapshot histories are managed as Git commits, so the Git directory contains only the latest version of each Bucket and Flow; earlier versions are kept in the commit history.

85595-10-gitcommit.png
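
Because earlier versions exist only as commits, plain Git commands can be used to inspect them. The following is a minimal sketch under stated assumptions: it builds a throwaway repository, and the bucket directory and file name (my_bucket/my_flow.snapshot) are hypothetical placeholders rather than names taken from the screenshots. Substitute the paths you actually see in your own repo.

```shell
#!/bin/sh
set -e

# Throwaway repo standing in for the Flow Storage Directory. The bucket
# directory and file name below are hypothetical, chosen for illustration.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
mkdir my_bucket

# Helper for this sketch only: stage everything and commit with a demo identity.
commit() {
    git add -A
    git -c user.name="demo" -c user.email="demo@example.com" \
        commit --quiet -m "$1"
}

# Simulate two saved versions of the same flow file.
echo "version 1 contents" > my_bucket/my_flow.snapshot
commit "Simulated flow version 1"
echo "version 2 contents" > my_bucket/my_flow.snapshot
commit "Simulated flow version 2"

# The working directory holds only the latest version...
latest=$(cat my_bucket/my_flow.snapshot)

# ...while the history of the file lists one commit per saved version.
git log --oneline -- my_bucket/my_flow.snapshot
version_count=$(git rev-list --count HEAD -- my_bucket/my_flow.snapshot)

# Read the previous version straight from history, without checking it out.
previous=$(git show HEAD~1:my_bucket/my_flow.snapshot)

echo "latest:   $latest"
echo "previous: $previous"
echo "versions: $version_count"
```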

Note: The commit message states "By NiFi Registry user: anonymous" because the environment was unsecured and no user was logged into NiFi. In a secured environment, the commit message would contain the user's identity.

Comments
New Contributor

Hi @Andrew Lim, great article! Thanks a lot. I have a question. Can I do the same with Bitbucket? If yes, what would the class be for Bitbucket? Thanks.

Not applicable

@Marshal Tito, I think you can. I just did on GitLab. Those access tokens are used by all those hosted git providers as limited scope passwords. So it is a matter of username/password in the end.

BUT important:
> then clone it locally using the git clone command (e.g. git clone https://github.com/andrewmlim/versioned_flows.git):

It is mandatory to clone the HTTPS URL (not the git one, as it would try to use your SSH keys in that case).
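
If the repo was already cloned with the SSH-style URL, one way to follow this advice is to check the remote and repoint it, rather than cloning again. This is only a sketch: it uses a throwaway repository, and the example-user URLs are made up for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repo for the sketch; the SSH-style URL is a made-up example
# of what "origin" looks like after cloning with the git/SSH URL.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git remote add origin git@github.com:example-user/versioned_flows.git

before=$(git remote get-url origin)

# Point "origin" at the HTTPS form so username/token authentication applies.
case "$before" in
    https://*) ;;  # already HTTPS, nothing to do
    *) git remote set-url origin https://github.com/example-user/versioned_flows.git ;;
esac

after=$(git remote get-url origin)
echo "before: $before"
echo "after:  $after"
```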

New Contributor

Hi,

Let me advertise an option for git clone using git+ssh. I have put together a Docker image that supports git cloning over git+ssh out of the box.

Give it a try and let me know how it works. Usage is as follows:

 docker run --name nifi-registry \
   -p 18080:18080 \
   -v ~/.ssh:/home/nifi/.ssh \
   -e 'FLOW_PROVIDER=git' \
   -e 'GIT_REMOTE_URL=git@github.com:michalklempa/docker-nifi-registry-example-flow.git' \
   -e 'GIT_CHECKOUT_BRANCH=example' \
   -e 'FLOW_PROVIDER_GIT_FLOW_STORAGE_DIRECTORY=/opt/nifi-registry/flow-storage-git' \
   -e 'FLOW_PROVIDER_GIT_REMOTE_TO_PUSH=origin' \
   -e 'GIT_CONFIG_USER_NAME=Michal Klempa' \
   -e 'GIT_CONFIG_USER_EMAIL=michal.klempa@gmail.com' \
   -d \
   michalklempa/nifi-registry:latest

or, instead of bind-mounting ~/.ssh as above, passing the SSH private key and known hosts as environment variables:

 docker run --name nifi-registry \
   -p 18080:18080 \
   -e 'FLOW_PROVIDER=git' \
   -e 'GIT_REMOTE_URL=git@github.com:michalklempa/docker-nifi-registry-example-flow.git' \
   -e 'GIT_CHECKOUT_BRANCH=example' \
   -e 'FLOW_PROVIDER_GIT_FLOW_STORAGE_DIRECTORY=/opt/nifi-registry/flow-storage-git' \
   -e 'FLOW_PROVIDER_GIT_REMOTE_TO_PUSH=origin' \
   -e 'GIT_CONFIG_USER_NAME=Michal Klempa' \
   -e 'GIT_CONFIG_USER_EMAIL=michal.klempa@gmail.com' \
   -e "SSH_PRIVATE_KEY=$(base64 -w 0 < ~/.ssh/id_rsa)" \
   -e "SSH_KNOWN_HOSTS=$(base64 -w 0 < ~/.ssh/known_hosts)" \
   -e 'SSH_PRIVATE_KEY_PASSPHRASE=' \
   -d \
   michalklempa/nifi-registry:latest

Using HTTPS is also supported:

docker run --name nifi-registry \
   -p 18080:18080 \
   -e 'FLOW_PROVIDER=git' \
   -e 'GIT_REMOTE_URL=https://github.com/michalklempa/docker-nifi-registry-example-flow.git' \
   -e 'GIT_CHECKOUT_BRANCH=example' \
   -e 'FLOW_PROVIDER_GIT_FLOW_STORAGE_DIRECTORY=/opt/nifi-registry/flow-storage-git' \
   -e 'FLOW_PROVIDER_GIT_REMOTE_TO_PUSH=origin' \
   -e 'FLOW_PROVIDER_GIT_REMOTE_ACCESS_USER=michalklempa' \
   -e 'FLOW_PROVIDER_GIT_REMOTE_ACCESS_PASSWORD=thisisnotmypassword:)' \
   -e 'GIT_CONFIG_USER_NAME=Michal Klempa' \
   -e 'GIT_CONFIG_USER_EMAIL=michalklempa@gmail.com' \
   -d \
   michalklempa/nifi-registry:latest


Full documentation is available on GitHub:

https://github.com/michalklempa/docker-nifi-registry/#git-cloning-the-repository-at-startup

and the Docker image is on Docker Hub:

https://hub.docker.com/r/michalklempa/nifi-registry



New Contributor

Hi, let's pretend that I'm being forced against my will to run NiFi on Windows. The Registry isn't supported on Windows and I can't run a VM. Can I track certain NiFi files in a regular old Git repository and expect to be able to switch between branches with a simple checkout? If so, which files do I need to track? And how do I refresh the NiFi workspace to reflect those changes?

Rising Star

Hi @Andrew Lim thank you for this detailed and step-by-step information, exactly what I was looking for.

But I do have some questions, which are perhaps a little weird because I have no experience with GitHub.

1. If I get this right, the connection to GitHub is "only" to store the content of each versioned flow outside the server where the Registry is installed? So it is some kind of additional safety. Is this right?

2. Following your description, one has to decide between using NiFi Registry and GitHub? Or did I get this wrong?

3. What if I try to connect to GitHub and then want to turn back to (only) NiFi Registry? Will this cause any predictable problems?

4. Is there any restriction in a clustered environment?

I would be grateful if you would answer these questions, thanks a lot!

Version history
Revision #: 2 of 2
Last update: 08-17-2019 06:44 AM