About srijitachaturve

srijitachaturve · 05-30-2019

Hi @Andrew Lim, thanks for the detailed explanation. Following your article, I am trying to convert a CSV to JSON using the ConvertRecord processor and then load the merged JSON (the output of ConvertRecord) to Redshift using COPY from a file. My merged JSON is stored in S3. I am getting an error that the CSV is not in JSON format. Could you please suggest how to load these records all at once to Redshift?
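
A guess at the cause, not a confirmed diagnosis: Redshift's `COPY ... FORMAT AS JSON 'auto'` loads each top-level JSON object (or array) as one row, while ConvertRecord with a JsonRecordSetWriter writes all records of a flowfile as a single JSON array by default, so a file that wraps every record in one array does not map to one row per record. If the COPY is also pointed at the original CSV key instead of the merged JSON file, it will fail regardless. The sketch below shows the reshaping from an array to one object per line; `to_ndjson` is a hypothetical helper, not part of NiFi or Redshift.

```python
import json


def to_ndjson(json_array_text: str) -> str:
    """Turn a JSON array of records into newline-delimited JSON (one object per line).

    Hypothetical helper: it mimics reshaping ConvertRecord's array output so that
    each record becomes its own top-level JSON object.
    """
    records = json.loads(json_array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    # separators=(",", ":") keeps each object compact on a single line
    lines = [json.dumps(rec, separators=(",", ":")) for rec in records]
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    merged = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
    print(to_ndjson(merged), end="")
    # {"id":1,"name":"a"}
    # {"id":2,"name":"b"}
```

In recent NiFi versions the JsonRecordSetWriter's Output Grouping property (One Line Per Object) should produce the same layout without a separate step; it is worth checking whether your version has it.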

srijitachaturve · 04-10-2019

This doesn't work for me. I placed a flow.xml.gz from the dev cluster onto the prod cluster and cleared all repositories on prod, but I still see state in the processors. Could you please suggest another way to clear the state of all processors in one go? I tried deleting the contents of the state folder under /nifi/conf, but that didn't help either; it gave me an error.
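
For reference when trying this: state that a processor keeps at cluster scope is typically stored in ZooKeeper (the cluster provider configured in `state-management.xml`), and local state defaults to `./state/local` rather than `conf`, which may explain why deleting files under `/nifi/conf` did not clear it. Another route is the NiFi REST API, which has a per-processor `POST /nifi-api/processors/{id}/state/clear-requests` endpoint; the processor must be stopped first. The sketch below loops over a list of processor IDs you collect yourself (e.g. from the UI). The base URL, the helper names and the "HTTP 200 means success" check are assumptions; a secured cluster would also need an auth token, and the request shape should be verified against your NiFi version's REST API docs.

```python
import urllib.request
from typing import Callable, Iterable, List, Optional

# Hypothetical base URL; adjust host, port and scheme for your cluster.
NIFI_API = "http://localhost:8080/nifi-api"


def clear_request_url(base: str, processor_id: str) -> str:
    # Per-processor endpoint that clears the state of a stopped processor
    return f"{base.rstrip('/')}/processors/{processor_id}/state/clear-requests"


def _http_post(url: str) -> int:
    # Empty-body POST; a secured cluster would also need an Authorization header.
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def clear_all_state(processor_ids: Iterable[str], base: str = NIFI_API,
                    post: Callable[[str], int] = _http_post) -> List[str]:
    """Send a clear-state request for each processor ID; return the IDs that failed."""
    failed = []
    for pid in processor_ids:
        status: Optional[int]
        try:
            status = post(clear_request_url(base, pid))
        except Exception:
            # urlopen raises for non-2xx responses and connection errors
            status = None
        if status != 200:
            failed.append(pid)
    return failed
```

Passing `post` as a parameter keeps the loop testable without a running cluster.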

srijitachaturve · 04-08-2019

Hi @mattburgess, I am using the same processor to fetch incremental data from relational tables. I have set the max rows fetch size to 500 and the max value column to a timestamp. Can fetching data in batches lead to data loss? I have seen that a few records with certain timestamps are not fetched during an incremental run, but are fetched when I clear state and run a full load. I want to understand how the max rows feature works. I read your comment regarding the max fragments setting in this thread: https://community.hortonworks.com/questions/178505/querydatabasetable-processor-shutting-down.html. Does the same apply to the max rows fetch size too? Please suggest.
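
One mechanism that would produce exactly this symptom, independent of batch sizing, is the strict greater-than comparison on the max value column: as I understand it, incremental runs select rows whose value is greater than the stored maximum, so a row whose timestamp equals that maximum but which is committed after the previous run is never selected incrementally, yet it appears in a full load. The sketch below models that in plain Python; `incremental_fetch` is a hypothetical simplification, not the processor's actual code.

```python
from typing import List, Optional, Tuple

# (id, timestamp) with timestamps in sortable 'YYYY-MM-DD HH:MM:SS' form
Row = Tuple[int, str]


def incremental_fetch(table: List[Row], last_max: Optional[str]) -> Tuple[List[Row], Optional[str]]:
    """Return rows with timestamp > last_max, plus the new max value to keep as state."""
    rows = [r for r in table if last_max is None or r[1] > last_max]
    new_max = max((r[1] for r in rows), default=last_max)
    return rows, new_max


if __name__ == "__main__":
    table = [(1, "2019-04-08 10:00:00"), (2, "2019-04-08 10:00:05")]
    fetched, state = incremental_fetch(table, None)  # first run gets both rows
    # A row with the SAME timestamp as the stored max commits after the first run
    table.append((3, "2019-04-08 10:00:05"))
    fetched, state = incremental_fetch(table, state)
    print(fetched)  # [] -> row 3 is skipped by the incremental run
    print(incremental_fetch(table, None)[0])  # full load (state cleared) includes row 3
```

If this matches what you see, the missing rows should share their timestamp with the max value stored at the end of a previous run.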

srijitachaturve · 04-03-2019

Thanks Matt for your view on this. The requirement is to generate a batch ID that is a sequence number: whenever the QueryDatabaseTable processor fetches records from the source DB (SQL Server), a batch ID should be added to the flowfile so that all records loaded to the target table share the same batch ID. This will help with auditing the records. But in cluster mode it seems difficult to achieve this with the UpdateAttribute processor. I liked your idea of appending the node hostname to the sequence, but if I could generate atomic values across all nodes, it would be much better.
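
The hostname idea mentioned above can be sketched as follows: each node keeps only its own counter, as with UpdateAttribute's local state, but prefixing the hostname keeps the IDs unique across the cluster. `batch_id_generator` and the `<host>-batch<n>` format are hypothetical choices for illustration.

```python
import itertools
import socket
from typing import Iterator, Optional


def batch_id_generator(start: int = 1, hostname: Optional[str] = None) -> Iterator[str]:
    """Yield batch IDs like 'node1-batch1', 'node1-batch2', ...

    The counter is per node; the hostname prefix is what prevents collisions
    between nodes. IDs are unique, not one gap-free sequence for the cluster.
    """
    host = hostname or socket.gethostname()
    for n in itertools.count(start):
        yield f"{host}-batch{n}"
```

This gives unique IDs (e.g. `node1-batch7`) but not a single shared series; inside NiFi the prefix could come from the Expression Language function `${hostname()}`.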

srijitachaturve · 04-03-2019

Thanks David, the idea looks good. I will try this.

srijitachaturve · 03-22-2019

@shu, @Mattclarke, @markpayne How do I generate the sequence number to be used as a stored value, as you suggested? As far as I know, the only processor in NiFi that can generate a sequence number is UpdateAttribute, which in cluster mode will again produce a separate sequence on each node.
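
One way to get a single cluster-wide sequence, offered as an assumption rather than something NiFi provides out of the box, is to let one shared system issue the number, for example a table or a SQL Server `SEQUENCE` (`NEXT VALUE FOR`) in a database that all nodes can reach. The runnable sketch below uses an in-memory SQLite table purely as a stand-in for that shared database; the table `batch_registry` and the function names are hypothetical.

```python
import sqlite3


def open_batch_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Hypothetical audit table that hands out batch IDs from one shared place."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS batch_registry ("
        " batch_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " node TEXT NOT NULL,"
        " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def next_batch_id(conn: sqlite3.Connection, node: str) -> int:
    # The database assigns the key inside a transaction, so nodes sharing one
    # real database server would not receive the same number.
    with conn:
        cur = conn.execute("INSERT INTO batch_registry (node) VALUES (?)", (node,))
        return cur.lastrowid
```

The extra `node` and `created_at` columns also give an audit trail of which node opened which batch.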

srijitachaturve · 03-20-2019

Hi all, @mattclarke, @mattburgess, @markpayne I want to generate a sequence number in my NiFi cluster (3 nodes). I was using the UpdateAttribute processor with the store state locally option, but this does not serve my purpose: each node generates its own incrementing value, which creates duplicate values when loading data to the target table. I would be grateful for an alternative way to generate this batch ID in cluster mode. Thanks in advance!
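
The duplicate values described above can be reproduced with a small model: if each node keeps its own counter, every node starts at batch1. `simulate_local_counters` and `duplicates` are hypothetical names, and the model ignores restarts and timing.

```python
from collections import Counter
from typing import Dict, List


def simulate_local_counters(assignments: List[str]) -> List[str]:
    """Model node-local state: every node increments only its own counter.

    `assignments` lists which node handles each incoming batch, in order.
    """
    counters: Dict[str, int] = {}
    ids = []
    for node in assignments:
        counters[node] = counters.get(node, 0) + 1
        ids.append(f"batch{counters[node]}")
    return ids


def duplicates(ids: List[str]) -> List[str]:
    # IDs that were issued more than once across the cluster
    return sorted(i for i, n in Counter(ids).items() if n > 1)


if __name__ == "__main__":
    ids = simulate_local_counters(["node1", "node2", "node3", "node1"])
    print(ids)              # ['batch1', 'batch1', 'batch1', 'batch2']
    print(duplicates(ids))  # ['batch1']
```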

srijitachaturve · 03-20-2019

@Matt Clarke, @matt burgess Exactly, the second point is what is happening: each node generates its own value, incrementing from the last value stored in its local state. So which processor or method should I use to generate an incremental batch ID (batch1, batch2, and so on), since UpdateAttribute produces inconsistent values when running on a cluster? Or is there a property that lets the UpdateAttribute processors on all nodes pick up each other's last state value? Please suggest.
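
On picking up other nodes' last value: the general technique would be a shared store with an atomic compare-and-set, where a node reads the current value and revision, writes value + 1 only if the revision is unchanged, and retries otherwise. NiFi's distributed map cache has an atomic variant (`AtomicDistributedMapCacheClient`, used by Wait/Notify) with fetch/replace semantics along these lines, but I have not checked whether any stock processor exposes it as a counter. The sketch below is therefore a plain-Python model that uses a lock-protected dict as the stand-in store; `SharedStore` and `next_sequence` are hypothetical.

```python
import threading
from typing import Dict, Optional, Tuple


class SharedStore:
    """In-process stand-in for a cluster-wide key/value store with revisions."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, Tuple[int, int]] = {}  # key -> (value, revision)

    def fetch(self, key: str) -> Optional[Tuple[int, int]]:
        with self._lock:
            return self._data.get(key)

    def replace(self, key: str, value: int, expected_rev: Optional[int]) -> bool:
        # Succeeds only if nobody else changed the key since our fetch.
        with self._lock:
            current = self._data.get(key)
            current_rev = None if current is None else current[1]
            if current_rev != expected_rev:
                return False
            self._data[key] = (value, 0 if current_rev is None else current_rev + 1)
            return True


def next_sequence(store: SharedStore, key: str = "batchid") -> int:
    # Optimistic loop: read, try to write value + 1, retry if another node won the race.
    while True:
        entry = store.fetch(key)
        value, rev = (0, None) if entry is None else entry
        if store.replace(key, value + 1, rev):
            return value + 1
```

Each value from `next_sequence` can be formatted as `batch{n}`, giving batch1, batch2, ... with no duplicates, as long as every node goes through the same store.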

srijitachaturve · 03-20-2019

@Shu Will this work even if some state is already stored in the processor? For example, I have a timestamp (2019-03-17 02:00:00:0) stored in my processor's state, and now I want the processor to start fetching data after 2019-03-20. Will this property help in such a scenario?
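
As I understand the QueryDatabaseTable documentation, the `initial.maxvalue.<max_value_column>` dynamic property is only used when the processor has no stored state for that column, so with the 2019-03-17 value already in state it would be ignored until the state is cleared. The sketch below models that precedence; `starting_max_value` and the column name `updated_at` are hypothetical, and real state keys are more complex than a bare column name.

```python
from typing import Dict, Optional


def starting_max_value(state: Dict[str, str], column: str,
                       dynamic_props: Dict[str, str]) -> Optional[str]:
    """Pick the value the next incremental query starts from.

    Stored state wins; 'initial.maxvalue.<column>' is only consulted when
    there is no state for that column yet.
    """
    if column in state:
        return state[column]
    return dynamic_props.get(f"initial.maxvalue.{column}")
```

In other words, stopping the processor, clearing its state and then running with `initial.maxvalue.<column>` set to the 2019-03-20 value is the sequence this model suggests.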

srijitachaturve · 03-20-2019

@mattburgess, @markpayne Hi all, I am using stateful variables to generate an incremental batch ID with the UpdateAttribute processor. It runs in a cluster, and I have set the processor to run on all nodes. But the batch ID values are not generated incrementally: the processor sometimes skips values or generates a duplicate. Is this happening because the cluster restarts and wipes the stateful variable data? Could you please suggest how I can persist the stateful variable data? I have attached the UpdateAttribute configuration for reference. Please suggest!

Last Visited	08-08-2019 05:19 AM

Member Since	07-12-2017 06:19 PM
Posts	53
Kudos received	3

Cloudera Community

Re: Convert CSV to JSON, Avro, XML using ConvertRe...

Re: Nifi clear state

Re: Incremental Fetch in NiFi with QueryDatabaseTa...

Re: how to generate sequence number in nifi cluste...

Re: how to generate sequence number in nifi cluste...

Re: how to generate sequence number in nifi cluste...

how to generate sequence number in nifi cluster mo...

Re: Are values of stateful variables in nifi clust...

Re: How to set "Initial Max Value" for QueryDataba...

Are values of stateful variables in nifi cluster g...