Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3452 | 12-03-2018 02:26 PM |
| | 2396 | 10-16-2018 01:37 PM |
| | 3705 | 10-03-2018 06:34 PM |
| | 2484 | 09-05-2018 07:44 PM |
| | 1882 | 09-05-2018 07:31 PM |
08-31-2016
11:42 AM
1 Kudo
If you created a NAR project you should have a project structure similar to what is shown here: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions#MavenProjectsforExtensions-ExampleProcessorBundleStructure Your processors module would have Maven dependencies on all of the modules you need to call from your processor code, and when you build your NAR they would all get packaged into it. If you need config files on the classpath, you would likely put them in src/main/resources of your processors module. If the config files need to be editable, you could have a property in your processor that specifies a directory to load config from, and then have code in your processor to load them in. In the end, you would generally put only the NAR into NiFi's lib directory.
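If you go the editable-config route, a minimal sketch of what that could look like in a custom processor is below; the class name, property name, and loading hook are just placeholders, not an established pattern:

```java
import java.io.File;
import java.util.Collections;
import java.util.List;

import org.apache.nifi.annotation.lifecycle.OnScheduled;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.util.StandardValidators;

public class ConfigAwareProcessor extends AbstractProcessor {

    // Hypothetical property pointing at an editable config directory that lives outside the NAR.
    static final PropertyDescriptor CONFIG_DIR = new PropertyDescriptor.Builder()
            .name("Configuration Directory")
            .description("Directory containing config files this processor loads when scheduled")
            .required(true)
            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
            .build();

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return Collections.singletonList(CONFIG_DIR);
    }

    @OnScheduled
    public void loadConfig(final ProcessContext context) {
        final File configDir = new File(context.getProperty(CONFIG_DIR).getValue());
        // Read whatever files you need from configDir here, before the processor starts running.
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        // Normal processor logic goes here.
    }
}
```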
08-30-2016
08:37 PM
1 Kudo
You should be able to have the NiFi on your local machine pull from the NiFi on the remote machine... The remote machine would have ListHDFS -> FetchHDFS -> Output Port, and the local machine would have a Remote Process Group pointing to the remote NiFi, with a connection from the output port to whatever you want to do locally. The remote NiFi will need site-to-site enabled by setting nifi.remote.input.socket.port, and that port will also need to be open through the firewall.
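For reference, a minimal sketch of the relevant site-to-site settings in nifi.properties on the remote instance; the port number here is only an example, use whatever your firewall allows:

```properties
# nifi.properties on the remote NiFi
nifi.remote.input.socket.port=10000
nifi.remote.input.secure=false
```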
08-30-2016
03:19 PM
2 Kudos
Can you give some background on why you want to set the classpath? NiFi has a very specific class-loader isolation model and generally you don't add JARs to NiFi's classpath like you do with other projects.
08-29-2016
05:17 PM
In 1.0.0 the difference will be that the cache server has to run on all nodes, because there is no longer a concept of choosing which node the controller service runs on (since there is no master). The cache client would be configured to point to the cache server on one of the nodes, so if that node goes down there is still no automatic failover at this point.
08-29-2016
04:42 PM
2 Kudos
1) The Remote Process Group checks periodically (I think once per minute) with the NCM to get the status of the nodes in the cluster. If the NCM is down and then one of the other nodes fails, the primary node will try to send data to that failed node, get some kind of exception, and then move on and try another node. So the primary node doesn't know the other node is dead, but it keeps trying nodes until one succeeds.
2) Yes, if you run the cache server on the NCM and the NCM fails, then the other nodes can't access the cache server. The long-term solution is to use a true distributed cache (memcached, redis, etc.) as the backing implementation that the cache client talks to; this just hasn't been implemented yet.
08-25-2016
06:20 PM
1 Kudo
You could have a RouteOnAttribute processor right before MergeContent and add a property like foo = ${header:equals("foo")}. Then everything with a header of "foo" will be routed to a relationship called "foo", and everything else will get dropped.
08-25-2016
12:37 PM
Both of those flows should have the data distributed in the cluster... Flow #1 is a pull model where all the nodes in the cluster will pull data from the standalone. Flow #2 is a push model where the standalone will push the data to all the nodes in the cluster. Either approach is correct, but I tend to lean towards the push model (#2) since it lets the source of the data (the standalone instance) decide where to send the data.
08-24-2016
01:49 PM
1 Kudo
Whenever you do site-to-site, it will automatically do load-balancing for you... So if you have your standalone doing ListFile -> FetchFile -> Output Port, then on your cluster all you need to do is have an RPG pointing back to the Output Port on the standalone instance; there will be an instance of this RPG on each node of your cluster, and each one will pull from the Output Port. No need to do another internal RPG. See the site-to-site section here: https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html You can't really distribute the fetching of the source file if it is coming from the local filesystem. The only thing that can fetch that file is something with access to the local filesystem, which is your standalone NiFi. If it were a shared filesystem then you could.
08-23-2016
04:41 PM
Glad to hear it!
08-23-2016
02:49 PM
1 Kudo
You probably have a couple of options... I don't think you want the same NiFi instance that is running your main dataflow to also use PutSplunk to monitor itself. If you had TailFile -> PutSplunk, where TailFile was tailing the same instance, it could create a cycle where the more you tailed and sent to Splunk, the more logs you produced, the more you tailed, and so on. I would suggest a second NiFi instance (maybe even the MiNiFi Java agent) to monitor the logs of the main instance. Another, possibly simpler, solution is to configure the NiFi logback.xml to add a UDP/TCP appender that can send logs to Splunk; this way anything NiFi logs to nifi-app.log will get forwarded to Splunk. A last option, slightly different from logging: NiFi has a concept called a ReportingTask that can be used to send metrics and statistics to other systems. If that is the information you are interested in, you could implement a custom ReportingTask to send data to Splunk.
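For the ReportingTask route, here is a minimal sketch, assuming a Splunk TCP data input listening on a hypothetical host and port (splunk-host:9997); the stats pulled from the controller status are just examples, and a real task would expose the host and port as properties:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.net.Socket;

import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.controller.status.ProcessGroupStatus;
import org.apache.nifi.reporting.AbstractReportingTask;
import org.apache.nifi.reporting.ReportingContext;

@Tags({"splunk", "metrics"})
@CapabilityDescription("Sketch: pushes a few flow-level stats to a Splunk TCP data input.")
public class SplunkStatsReportingTask extends AbstractReportingTask {

    // Hypothetical Splunk TCP input; in a real task these would be PropertyDescriptors.
    private static final String SPLUNK_HOST = "splunk-host";
    private static final int SPLUNK_PORT = 9997;

    @Override
    public void onTrigger(final ReportingContext context) {
        // Aggregate status for the root process group (the whole flow).
        final ProcessGroupStatus status = context.getEventAccess().getControllerStatus();

        final String event = String.format(
                "{\"queuedCount\":%d,\"queuedBytes\":%d,\"activeThreads\":%d}",
                status.getQueuedCount(),
                status.getQueuedContentSize(),
                status.getActiveThreadCount());

        try (Socket socket = new Socket(SPLUNK_HOST, SPLUNK_PORT);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            out.println(event);
        } catch (final IOException e) {
            getLogger().error("Failed to send stats to Splunk", e);
        }
    }
}
```

You would build this into its own NAR the same way as a custom processor, then add and schedule it from the Reporting Tasks section of the Controller Settings in the NiFi UI.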