Community Articles

Find and share helpful community-sourced technical articles.
avatar

Prerequisites:

* NIFI 1.0+

* SAP HANA - Instructions to setup a Cloud HANA on AWS or Azure: https://community.hortonworks.com/content/kbentry/58427/getting-started-with-sap-hana-and-vora-with-...

SETUP:

HANA (Source Database)

In this setup we will create a table in HANA table. First follow this HCC article "Demo data in SAP Vora Using Eclipse HANA Modelling tools - Part 3"

You will need to download Eclipse Neon - Eclipse IDE for Java Developers to connect to the SAP HANA that we setup in Part 1 . After you setup eclipse we will need to configure Eclipse to install HANA Modelling tools that will allow us to connect to SAP HANA and execute sql scripts to setup demo data that we will use from SAP Vora.

After you establish connection to your HANA system. Run this code:

DROP TABLE "CODEJAMMER"."EMPLOYEE_ADDRESS";
 
CREATE COLUMN TABLE "CODEJAMMER"."EMPLOYEE_ADDRESS" (
ID bigint not null primary key generated by default as IDENTITY,
"STREETNUMBER" INTEGER CS_INT,
   "STREET" NVARCHAR(200),
   "LOCALITY" NVARCHAR(200),
   "STATE" NVARCHAR(200),
   "COUNTRY" NVARCHAR(200)) UNLOAD PRIORITY 5 AUTO MERGE ;
 
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" (STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY)  values(555,'Madison Ave','New York','NY','America');
insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" (STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY)  values(95,'Morten Street','New York','NY','USA');


SELECT * FROM "CODEJAMMER"."EMPLOYEE_ADDRESS";

12009-screen-shot-2017-02-02-at-114324-am.png

Now lets setup the NIFI workflow. Final result will look like this:

12005-screen-shot-2017-02-02-at-113916-am.png

Nifi Setup :

This is a simple NIFI setup, the queryDatabase table processor is only available as part of default processors from version 0.6 of Nifi.

Drop an instance of the QueryDatabaseTable processor on your canvas. Right click to configure and fill out the Required fields , plus you click on the error to setup the DB connection pool, see settings below:

12010-screen-shot-2017-02-02-at-114404-am.png

12021-screen-shot-2017-02-02-at-114456-am.png

The limitation with this processor as it is not a true CDC and relies on one column. If the data is reloaded into the column with older data the data will not be replicated into HDFS or any other destination. This processor does not rely on Transactional logs or redo logs .

Next configure the putHDFS processor

configure the Hadoop Core-site.xml and hdfs-site.xml and destination HDFS directory

Now lets start all the processors. Validate that you got data by checking provenance in your processor.

12008-screen-shot-2017-02-02-at-114307-am.png

12007-screen-shot-2017-02-02-at-114235-am.png

You can also check what is the max id column state. The last auto increment ID will be displayed . Right click on QueryDatabaseTable processor and select View State:

12022-screen-shot-2017-02-02-at-114955-am.png

Testing CDC

Now insert a new record in HANA and validate that the record is in JSON format in HDFS

insert into "CODEJAMMER"."EMPLOYEE_ADDRESS" (STREETNUMBER,STREET,LOCALITY,STATE,COUNTRY)  values(2395,'Broadway Street','New York','NY','USA');

Looking into HDFS we see the new JSON record:

12023-screen-shot-2017-02-02-at-123429-pm.png

7,061 Views