Created 11-07-2017 01:32 PM
Hi All,
Thanks to this awesome community.
I am trying to understand how QueryDatabaseTable works behind the scenes. I have not scheduled the processor to run, so does it keep querying the database? Does it keep polling, or does it run on some default schedule if we do not provide one? The documentation says the processor is intended to run on the primary node only; is there a specific reason for that?
Any suggestions to clarify my doubt would be appreciated.
Thanks
Dheeru
Created 11-07-2017 05:07 PM
QueryDatabaseTable queries the database on the defined schedule. Even if you don't customize the scheduling of the processor, there is a default one in the Scheduling tab.
The processor is intended to run on the primary node only to avoid ingesting the same data several times. It doesn't accept an incoming connection, so you cannot customize it dynamically from earlier parts of the flow. If you deploy it on all nodes, each node will ingest the exact same data (data duplication).
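To make the incremental behavior concrete, here is a rough sketch of the idea QueryDatabaseTable implements: remember the largest value seen in a "maximum-value column" and only fetch rows beyond it on the next scheduled run. The table name, column name, and in-memory state below are illustrative assumptions; the real processor persists its state in NiFi's state manager.

```python
# Sketch of the incremental-fetch idea behind QueryDatabaseTable.
# Table/column names and the in-memory state are assumptions for
# illustration only; NiFi stores the real state in its state manager.

def build_incremental_query(table, max_value_column, last_max):
    """Build a SELECT that fetches only rows newer than the stored maximum."""
    if last_max is None:
        # First run: no state yet, so fetch the whole table.
        return f"SELECT * FROM {table}"
    return f"SELECT * FROM {table} WHERE {max_value_column} > {last_max}"

# First scheduled run: no state, full table scan.
print(build_incremental_query("orders", "order_id", None))
# A later run: only rows beyond the remembered maximum are fetched.
print(build_incremental_query("orders", "order_id", 1042))
```

This is also why running it on every node would duplicate data: each node would keep its own state and issue the same queries independently.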
Created 11-07-2017 06:15 PM
@Abdelkrim Hadjidj Thanks a lot, that solidifies my understanding. Appreciate it!
Created 11-07-2017 06:36 PM
These comments are spot-on, thanks! I'd also mention that if you want to customize it dynamically with incoming flow files, an alternative is to send your flow into GenerateTableFetch (on the primary node only, so your most upstream processor(s) will need to run on the primary node only). GenerateTableFetch (GTF) is like QueryDatabaseTable (QDT), with two big differences: 1) GTF accepts incoming flow files, and 2) QDT executes the SQL it generates internally, whereas GTF sends the SQL out as flow files so some other processor (ExecuteSQL, for example) can execute it.
This can be used by sending the SQL output from GTF to a Remote Process Group (RPG) pointed at an Input Port on the same cluster. The RPG -> Input Port pattern distributes the flow files among the nodes in the cluster, rather than every node working on the same data (which leads to data duplication, as @Abdelkrim Hadjidj mentions above). Downstream from the Input Port, all nodes process their subset of the flow files in parallel, so you can send Input Port -> ExecuteSQL. This flow is basically a parallel, distributed version of what QueryDatabaseTable does on one node.
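To illustrate the GTF side of this, here is a rough sketch of how a table can be split into page-sized queries that downstream nodes execute in parallel. The names and the LIMIT/OFFSET paging syntax are illustrative assumptions; the real processor adapts the paging clause to the configured database dialect and tracks the maximum-value column as well.

```python
# Sketch of how GenerateTableFetch splits a table into page-sized queries.
# Table name, ordering column, and LIMIT/OFFSET syntax are illustrative
# assumptions; GTF generates dialect-appropriate paging clauses.

def generate_fetch_queries(table, row_count, partition_size):
    """Return one SQL statement per partition of the table."""
    queries = []
    for offset in range(0, row_count, partition_size):
        queries.append(
            f"SELECT * FROM {table} ORDER BY id "
            f"LIMIT {partition_size} OFFSET {offset}"
        )
    return queries

# 10 000 rows split into 4 flow files of up to 2 500 rows each, to be
# distributed across the cluster via RPG -> Input Port -> ExecuteSQL.
for q in generate_fetch_queries("orders", 10_000, 2_500):
    print(q)
```

Each generated statement becomes its own flow file, which is what lets the Input Port hand different pages to different nodes.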
Created 01-09-2018 09:41 AM
I want to parse a CSV file in NiFi that contains JSON fields, e.g. Name,age,address where the address field contains JSON ({'city':'delhi','state':'delhi','zipcode':'100398'}). I want to transform this CSV file back into CSV format so that my fields look like Name,Age,Address.City,Address.State,Address.ZipCode, with the data going into those columns.
Can we do this? Please help me with the same using NiFi.
Thanks !
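For what it's worth, here is a minimal sketch of the flattening itself, assuming the Name,age,address layout from the question (with the address field as valid double-quoted JSON). In NiFi this would typically be done with record-oriented processors or a scripted processor; this only illustrates the transformation.

```python
# Minimal sketch of flattening a JSON column inside a CSV into dotted
# columns. The sample data and column names are assumptions based on the
# question; NiFi itself would do this with record/scripted processors.
import csv
import io
import json

raw = (
    "Name,age,address\n"
    'Ravi,30,"{""city"": ""delhi"", ""state"": ""delhi"", ""zipcode"": ""100398""}"\n'
)

reader = csv.DictReader(io.StringIO(raw))
out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=["Name", "Age", "Address.City", "Address.State", "Address.ZipCode"],
)
writer.writeheader()
for row in reader:
    addr = json.loads(row["address"])  # parse the embedded JSON field
    writer.writerow({
        "Name": row["Name"],
        "Age": row["age"],
        "Address.City": addr["city"],
        "Address.State": addr["state"],
        "Address.ZipCode": addr["zipcode"],
    })
print(out.getvalue())
```

Note that the single-quoted JSON shown in the question ({'city':...}) is not strictly valid JSON, so it may need normalizing before parsing.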
Created 01-09-2018 08:48 PM
Hi @Surendra Shringi, please create a new question for this.