About Raj_B

Raj_B · ‎02-17-2017

Hi All, how can an external system check the health of the NiFi nodes in a cluster? For example, a load balancer that pushes data to NiFi cluster nodes needs to know which nodes have NiFi up and running. What property/state/flag on a NiFi node can be used to verify that the NiFi instance is running on that node and can accept data from the load balancer? Thanks
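For illustration only, here is a minimal Python sketch of the kind of external check the question describes. It assumes that each node's NiFi REST API is reachable over plain HTTP and that a 200 response from `/nifi-api/system-diagnostics` is an acceptable "up" signal; the port, the path, and the host names are assumptions and would differ on a secured or customized cluster. It does not confirm that a specific listener port on the node is accepting data.

```python
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET on `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def healthy_nodes(hosts, probe=http_probe, port=8080,
                  path="/nifi-api/system-diagnostics"):
    """Return the subset of `hosts` whose probe succeeds.

    `port` and `path` are assumed defaults, not values taken from this thread.
    `probe` is injectable so the logic can be exercised without a live cluster.
    """
    healthy = []
    for host in hosts:
        url = f"http://{host}:{port}{path}"
        if probe(url):
            healthy.append(host)
    return healthy
```

A load balancer script could call `healthy_nodes(["nifi-1", "nifi-2"])` (hypothetical host names) on a schedule and route traffic only to the hosts returned.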

Raj_B · ‎02-16-2017

Hi guys, I would appreciate your input on the following NiFi Provenance questions: 1) What are the other uses of Provenance data (if any) besides metadata, lineage, etc.? 2) What is the best practice regarding which NiFi provenance data to keep: store all provenance data (with possible future use cases in mind), or extract and store only what is needed for our current use case (currently metadata and lineage)? I'm asking because we're building the first iteration of our Data Lake and want to follow best practices for the NiFi Provenance data that we store. 3) Provenance API versus SiteToSiteProvenanceReportingTask: from my understanding these are two ways of getting Provenance data out for storage and further processing; is one preferable over the other for extracting metadata and lineage? Thank you.

Raj_B · ‎02-16-2017

Hello, I'm looking for info/ideas/examples on how to integrate NiFi with Atlas, to store metadata and lineage information from NiFi in Atlas. I found a couple of custom processor examples (https://community.hortonworks.com/repos/39432/nifi-atlas-lineage-reporter.html, https://community.hortonworks.com/repos/66014/nifi-atlas-bridge.html) here in HCC, but they may not be production-ready, so I'm trying to find solutions that can be implemented in a production environment. Thanks in advance.

Raj_B · ‎02-13-2017

Thank you @Pierre Villard I think I'm getting there... 🙂

Raj_B · ‎02-10-2017

Awesome, thanks for clarifying @Bryan Bende

Raj_B · ‎02-10-2017

@Bryan Bende great article, it is very helpful for understanding how things work in a cluster environment. Do you have plans to update the article for the new Zero-Master Clustering paradigm? For newbies like me, it would be helpful to know how what you described in the article changes in the new zero-master cluster environment.

Raj_B · ‎02-09-2017

Thank you @kkawamura for clarifying 🙂 I was not aware of the site-to-site alternative for ListenX; I'll try to read more about it.

Raj_B · ‎02-09-2017

@Pierre Villard thank you for your continued engagement in this conversation. 1) It sounds like ExecuteSQL, by default, does not coordinate between nodes to ensure that a query is not duplicated across them, so I, as the developer, have to ensure that no duplication occurs (but I don't know what I need to do to make ExecuteSQL work correctly in a cluster). An alternative is the combination of the GenerateTableFetch and QueryDatabaseTable processors, which works properly by running GenerateTableFetch as an "Isolated Processor" on the primary node and QueryDatabaseTable on all nodes. Is that a correct summary of what we have discussed so far? 2) You also noted in your last comment: "ExecuteSQL can achieve the same kind of things, it really depends of your use case and how you define your queries." My use case is to query Oracle and SQL Server databases, both of which support query parallelization. How do I need to define my queries differently for ExecuteSQL to work in a cluster when querying a relational database that supports query parallelization? 3) Is there any documentation on which NiFi processors work differently in a cluster versus a single-node environment, and what modifications are needed for those processors to work correctly in a cluster? So far my understanding is that the ListenX, GetX, and ExecuteSQL processors work differently in a cluster. Again, thank you for your time and patience.
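To make the pattern in point 1 concrete, here is a minimal Python sketch (not NiFi code) of the general idea: a single coordinator splits one table fetch into non-overlapping page queries, and the pages are then spread across the cluster nodes so that no row range is queried twice. The function names, the round-robin assignment, and the `OFFSET ... FETCH NEXT` paging syntax (which assumes Oracle 12c+ or SQL Server 2012+) are assumptions for illustration, not a description of how the NiFi processors are implemented.

```python
def generate_page_queries(table, order_column, row_count, page_size):
    """Build one paged SELECT per slice of the table (done once, on a single node)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    queries = []
    for offset in range(0, row_count, page_size):
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_column} "
            f"OFFSET {offset} ROWS FETCH NEXT {page_size} ROWS ONLY"
        )
    return queries


def distribute(queries, nodes):
    """Assign each page query to exactly one node, round-robin."""
    if not nodes:
        raise ValueError("at least one node is required")
    assignment = {node: [] for node in nodes}
    for i, query in enumerate(queries):
        assignment[nodes[i % len(nodes)]].append(query)
    return assignment


if __name__ == "__main__":
    # Hypothetical table, column, and node names.
    pages = generate_page_queries("orders", "id", row_count=25, page_size=10)
    for node, assigned in distribute(pages, ["node-a", "node-b"]).items():
        print(node, len(assigned))
```

Because every page is generated once and handed to exactly one node, the database load is shared without any two nodes issuing the same query.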

Raj_B · ‎02-08-2017

Thanks @Pierre Villard. I'm sorry, I'm not sure I understand clearly. From what you said, it seems the ExecuteSQL processor does not need any coordination between nodes because the DB that I'm querying takes care of the coordination (the DB knows that the queries from the different nodes are in fact from the same job, so it won't send the same data to each node's query even though the same query/dataflow runs on each node). So, if ExecuteSQL runs on each node and each node in turn gets different data, then the query's load is balanced across all nodes, right? Correct me if that thinking is wrong. What do the GenerateTableFetch and QueryDatabaseTable processors do that ExecuteSQL does NOT do in a cluster environment? I'm not sure I understand when you would use the ExecuteSQL processor versus the other two. Thanks again.

Raj_B · ‎02-08-2017

@Pierre Villard I don't mean to beat a dead horse, but does the ExecuteSQL processor have the same issue as the ListenX and GetX processors in a cluster environment (the processor runs on every node of the cluster, so something needs to coordinate which node reads which database records)?

Last Visited	‎08-19-2020 03:25 PM

Member Since	‎04-29-2016 04:49 PM
Posts	192
Kudos received	20
