Member since: 04-29-2016
Posts: 192
Kudos Received: 20
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1678 | 07-14-2017 05:01 PM |
 | 2845 | 06-28-2017 05:20 PM |
02-17-2017
02:48 PM
Hi All, how can an external system check the health of the NiFi nodes in a cluster? For example, a load balancer that pushes data to NiFi cluster nodes needs to know which nodes have NiFi up and running. What property, state, or flag on a NiFi node can be used to verify that the NiFi instance is running on that node and can accept data from the load balancer? Thanks
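To make the question concrete, here is a minimal sketch of the kind of probe an external system might run against each node. It assumes an unsecured NiFi instance whose REST API answers plain HTTP; the function name `is_node_up`, the host names, and the choice of the `/nifi-api/system-diagnostics` path are illustrative assumptions, not a confirmed recommendation. An HTTP 200 only shows that the NiFi web server is responding; it does not prove that a particular listener port is accepting data.

```python
import urllib.request


def is_node_up(base_url, path="/nifi-api/system-diagnostics",
               timeout=3.0, opener=urllib.request.urlopen):
    """Return True if the node answers the HTTP GET with status 200.

    base_url and path are assumptions for illustration; a secured
    cluster would also need HTTPS and credentials, which are omitted here.
    """
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and urllib's URLError/HTTPError.
        return False


def healthy_nodes(base_urls, **kwargs):
    """Filter a list of node URLs down to the ones that respond."""
    return [u for u in base_urls if is_node_up(u, **kwargs)]
```

A load balancer script could call `healthy_nodes(["http://node1:8080", "http://node2:8080"])` on a schedule and route data only to the URLs returned.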
Labels: Apache NiFi
02-16-2017
09:48 PM
Hi guys, I would appreciate your input on the following NiFi Provenance questions:
1) What are the other uses of Provenance data (if any) besides metadata, lineage, etc.?
2) What is the best practice regarding which NiFi provenance data to keep: store all of it (keeping possible future use cases in mind), or extract and store only what our current use case needs (currently metadata and lineage)? I'm asking because we're building the first iteration of our Data Lake and want to follow best practices for the NiFi Provenance data we store.
3) Provenance API versus SiteToSiteProvenanceReportingTask: from my understanding, these are two ways of getting Provenance data out for storage and further processing. Is one preferable over the other for extracting metadata and lineage?
Thank you.
Labels: Apache NiFi
02-16-2017
08:51 PM
Hello, I'm looking for info, ideas, or examples on how to integrate NiFi with Atlas, so that metadata and lineage information from NiFi is stored in Atlas. I found a couple of custom processor examples (https://community.hortonworks.com/repos/39432/nifi-atlas-lineage-reporter.html, https://community.hortonworks.com/repos/66014/nifi-atlas-bridge.html) here in HCC, but they may not be production ready, so I am trying to find solutions that can be implemented in a production environment. Thanks in advance.
Labels: Apache NiFi
02-13-2017
01:57 PM
Thank you @Pierre Villard I think I'm getting there... 🙂
02-10-2017
08:46 PM
Awesome, thanks for clarifying @Bryan Bende
02-10-2017
07:08 PM
1 Kudo
@Bryan Bende great article, it is very helpful for understanding how things work in a cluster environment. Do you have plans to update the article for the new Zero-Master Clustering paradigm? For newbies like me, it would be helpful to know how the things you described in the article change in the new zero-master cluster environment.
02-09-2017
04:43 PM
Thank you @kkawamura for clarifying 🙂 I was not aware of the site-to-site alternative for ListenX; I'll try to read more about it.
02-09-2017
04:20 PM
@Pierre Villard thank you for your continued engagement in this conversation.
1) It sounds like, by default, ExecuteSQL does not coordinate between nodes to ensure that a query is not duplicated across them, so I, as a developer, have to ensure that no duplication occurs (but I don't know what I need to do to make ExecuteSQL work correctly in a cluster). An alternative is the combination of the GenerateTableFetch and QueryDatabaseTable processors, which works properly when GenerateTableFetch runs as an "Isolated Processor" on the primary node and QueryDatabaseTable runs on all nodes. Is that a correct summary of what we have discussed so far?
2) You also noted in your last comment: "ExecuteSQL can achieve the same kind of things, it really depends of your use case and how you define your queries." My use case is to query Oracle and SQL Server databases, both of which support query parallelization. How do I need to define my queries differently for ExecuteSQL to work in a cluster when querying a relational database that supports query parallelization?
3) Is there any documentation on which NiFi processors behave differently in a cluster versus a single-node environment, and on what changes are needed for those processors to work correctly in a cluster? So far my understanding is that the ListenX, GetX, and ExecuteSQL processors work differently in a cluster.
Again, thank you for your time and patience.
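To illustrate what I mean by avoiding duplication in point 1, here is a rough sketch in plain Python (not NiFi code) of the idea as I understand it: one place generates non-overlapping page queries, and each query is handed to exactly one node. The table name, ordering column, row count, page size, and node names are all hypothetical, and the `OFFSET ... FETCH NEXT` syntax assumes Oracle 12c or later, or SQL Server 2012 or later. In NiFi the distribution would be done by the flow itself, not by code like this.

```python
def page_queries(table, order_col, total_rows, page_size):
    """Build non-overlapping page queries that together cover total_rows rows."""
    return [
        f"SELECT * FROM {table} ORDER BY {order_col} "
        f"OFFSET {offset} ROWS FETCH NEXT {page_size} ROWS ONLY"
        for offset in range(0, total_rows, page_size)
    ]


def assign_round_robin(queries, nodes):
    """Give each query to exactly one node, cycling through the node list."""
    plan = {node: [] for node in nodes}
    for i, query in enumerate(queries):
        plan[nodes[i % len(nodes)]].append(query)
    return plan


if __name__ == "__main__":
    # Hypothetical table and cluster, for illustration only.
    queries = page_queries("orders", "id", total_rows=2500, page_size=1000)
    for node, assigned in assign_round_robin(queries, ["node1", "node2"]).items():
        print(node, assigned)
```

Because every page is assigned to a single node, no two nodes run the same query, which is the property I am not sure ExecuteSQL gives me on its own.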
02-08-2017
06:02 PM
Thanks @Pierre Villard. I'm sorry, I'm not sure I understand this clearly. From what you said, it seems the ExecuteSQL processor does not need any coordination between nodes because the DB I'm querying takes care of it: the DB knows that the queries from the different nodes are in fact from the same job, so it won't send the same data to each node's query even though the same query/dataflow runs on each node. So, if ExecuteSQL runs on each node and each node gets different data in return, then the query's load is balanced across all nodes, right? Correct me if that thinking is wrong. What do the GenerateTableFetch and QueryDatabaseTable processors do that ExecuteSQL does NOT do in a cluster environment? I'm not sure I understand when you would use the ExecuteSQL processor versus the other two. Thanks again.
02-08-2017
05:04 PM
@Pierre Villard I don't mean to beat a dead horse, but does the ExecuteSQL processor also have the same issue as the ListenX and GetX processors in a cluster environment (the processor running on each node of the cluster, and the need to coordinate which node will read which database records)?