Ingest data incrementally using Nifi from DB2


Explorer

Hi,

I have three columns in DB2: Userid, SerialNo, and a Timestamp column (YYYY-DD-MM HH:MM:SS). I would like to create an incremental load using NiFi. Userid and SerialNo are random numbers, not increasing primary keys; only TS is incremental.

I have tried the following:

FLOW 1

QueryFetchTable -> ExecuteSQL -> PutHiveStreaming -> LogAttribute (to avoid duplicates). Not working, and no errors in the logs (see the attached pics for the configuration: nf.png, nf1.png, nf2.png). I'm not able to load even a single row into Hive with this method, where I used TS as the Maximum-value Column. The Hive table is partitioned, bucketed, ORC, and transactional.

FLOW 2

ExecuteSQL -> PutHiveStreaming -> LogAttribute. I'm able to get the data into Hive, but I can't load it incrementally. The Hive table is partitioned, bucketed, ORC, and transactional.

Could you please help me set up a simple incremental load using any flow? Ultimately, I would like an incremental load without duplicates.


I'm open to any processors in NiFi for an incremental load without duplicates. Could you please share a sample workflow with sample config and sample data?

@Shu, any input please? Thanks a lot in advance. Expecting an awesome answer from you :)
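For reference, the incremental pattern being asked for (track the maximum TS value seen so far and fetch only rows newer than it, which is what QueryDatabaseTable's Maximum-value Columns property does in NiFi) can be sketched outside NiFi. This is a minimal stdlib-only illustration using an in-memory sqlite3 table standing in for the DB2 source; the table and column names mirror the question and are hypothetical:

```python
import sqlite3

# In-memory stand-in for the DB2 source table (hypothetical schema from the question)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (userid INTEGER, serialno INTEGER, ts TEXT)")
conn.execute("INSERT INTO src VALUES (101, 9001, '2018-06-21 19:07:27')")
conn.execute("INSERT INTO src VALUES (102, 9002, '2018-06-21 19:08:00')")

def incremental_fetch(conn, last_max_ts):
    """Fetch only rows newer than the last seen TS, like a Maximum-value Column.

    NiFi persists last_max_ts in processor state between runs; here it is
    simply passed in and returned.
    """
    if last_max_ts is None:
        rows = conn.execute("SELECT * FROM src").fetchall()
    else:
        rows = conn.execute("SELECT * FROM src WHERE ts > ?", (last_max_ts,)).fetchall()
    if rows:
        last_max_ts = max(r[2] for r in rows)
    return rows, last_max_ts

batch1, last_max_ts = incremental_fetch(conn, None)          # first run: all rows
conn.execute("INSERT INTO src VALUES (103, 9003, '2018-06-21 19:09:00')")
batch2, last_max_ts = incremental_fetch(conn, last_max_ts)   # next run: only the new row
```

Note this only avoids duplicates if the stored maximum value is actually advanced between runs; if the same query is re-executed without the `ts >` filter, every run re-delivers the full table, which matches the doubling described later in the thread.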

11 REPLIES

Re: Ingest data incrementally using Nifi from DB2

Super Guru

Flow 1 is the right one, what errors are you seeing?


Re: Ingest data incrementally using Nifi from DB2

Explorer

@Matt Burgess Is that QueryDatabaseTable or GenerateTableFetch? Let's say I have 100 records in my source table (DB2). When I add one record at the source I should get 101 records, but instead I get around 201: I get duplicated records, and the duplicates keep growing with every new record. @Shu or @Matt Burgess, kindly assist; I'm missing a small thing. See nf11.png, attached, for the flow.


Re: Ingest data incrementally using Nifi from DB2

Explorer

There was no error, but I can't see the data in Hive. Is my configuration for flow 1 right? Without QueryFetchTable I'm able to put data into Hive, but then I can't load incrementally into Hadoop. I'm using NiFi version 1.2.


Re: Ingest data incrementally using Nifi from DB2

Explorer

@Matt Burgess I'm using Maximum-value Columns with a timestamp (YYYY-MM-DD hh:mm:ss). Do you think NiFi is unable to take the maximum value from a timestamp in that format as-is (YYYY-MM-DD hh:mm:ss)?


Re: Ingest data incrementally using Nifi from DB2

Explorer

Output of the ExecuteSQL flow: "TIMESTAMP" : "2018-06-21T19:07:27.000Z". The Hive datatype is Timestamp and it is not accepting this format. Is it because the Maximum-value Columns aren't working?
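The value shown is ISO-8601 with a 'T' separator and 'Z' suffix, while a Hive TIMESTAMP column expects 'YYYY-MM-DD HH:MM:SS'. A small sketch of the reformatting that would make such a value Hive-friendly (it assumes the string always carries milliseconds and the trailing 'Z', as in the example above):

```python
from datetime import datetime

def avro_ts_to_hive(value: str) -> str:
    """Convert an ISO-8601 'T'/'Z' timestamp string to Hive's 'YYYY-MM-DD HH:MM:SS'."""
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return dt.strftime("%Y-%m-%d %H:%M:%S")

print(avro_ts_to_hive("2018-06-21T19:07:27.000Z"))  # 2018-06-21 19:07:27
```

In a NiFi flow this kind of conversion would typically be done with an expression or a record-processing step rather than external code; the function above only illustrates the format mismatch.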


Re: Ingest data incrementally using Nifi from DB2

Super Guru

I think it is because the version of Hive used by PutHiveStreaming relies on a version of Avro that does not support logical types. Try setting "Use Avro logical types" to false in QueryDatabaseTable; I believe that will cause the timestamp to be a string in the outgoing Avro file, which might then be converted by PutHiveStreaming under the hood.
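For context on what that setting changes: with Avro logical types on, a timestamp is encoded as a long (milliseconds since the epoch, annotated as timestamp-millis), which a reader without logical-type support just sees as a bare long; with logical types off, the value goes out as a plain string. A stdlib-only sketch of the two representations (no NiFi or Avro library involved, purely illustrative):

```python
from datetime import datetime, timezone

dt = datetime(2018, 6, 21, 19, 7, 27, tzinfo=timezone.utc)

# "Use Avro logical types" = true: a long, milliseconds since the epoch,
# annotated as timestamp-millis in the Avro schema
as_logical_millis = int(dt.timestamp() * 1000)

# "Use Avro logical types" = false: a plain formatted string
as_string = dt.strftime("%Y-%m-%d %H:%M:%S")
```

A Hive-side reader that does not understand the timestamp-millis annotation cannot map the long back to a TIMESTAMP, which is why the string form can succeed where the logical type fails.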


Re: Ingest data incrementally using Nifi from DB2

Explorer

I'm using GenerateTableFetch instead of QueryDatabaseTable, and I'm getting this error:

"error during database query or conversion of records to avro"

How do I resolve this error and avoid duplicated records in my table?


Re: Ingest data incrementally using Nifi from DB2

Explorer

@Matt Burgess I still cannot eliminate the duplicates. I changed "Use Avro logical types" to false, and it allowed me to insert data into Hive, but WITH DUPLICATES. Let's assume I have 100 records: if I insert 1 record I should get 101 records, but instead I get 201. Am I missing anything? Kindly assist.
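Since Userid, SerialNo, and TS together identify a row, one workaround for the doubling is to land each fetched batch in a staging table and insert only rows whose key is not already in the target (an anti-join). This is a minimal sqlite3 sketch of that idea; the table names `staging` and `target` are hypothetical, and in Hive the same pattern would be an INSERT ... SELECT with a NOT EXISTS or LEFT JOIN filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (userid INTEGER, serialno INTEGER, ts TEXT)")
conn.execute("CREATE TABLE staging (userid INTEGER, serialno INTEGER, ts TEXT)")

# target already holds one row; the fetched batch re-delivers it plus one new row
conn.execute("INSERT INTO target VALUES (101, 9001, '2018-06-21 19:07:27')")
conn.executemany("INSERT INTO staging VALUES (?, ?, ?)",
                 [(101, 9001, '2018-06-21 19:07:27'),
                  (102, 9002, '2018-06-21 19:08:00')])

# Anti-join: copy only staging rows with no matching key already in target
conn.execute("""
    INSERT INTO target
    SELECT s.userid, s.serialno, s.ts FROM staging s
    WHERE NOT EXISTS (
        SELECT 1 FROM target t
        WHERE t.userid = s.userid AND t.serialno = s.serialno AND t.ts = s.ts)
""")
count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]  # 2, not 3
```

This masks the symptom rather than fixing the root cause (the max-value state not advancing between fetches), but it keeps the target duplicate-free either way.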


Re: Ingest data incrementally using Nifi from DB2

Explorer

Attached the working flow with configurations (nififlow.png). Everything is working except for the duplicates. Kindly assist.
