Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3452 | 12-03-2018 02:26 PM |
| | 2396 | 10-16-2018 01:37 PM |
| | 3705 | 10-03-2018 06:34 PM |
| | 2484 | 09-05-2018 07:44 PM |
| | 1882 | 09-05-2018 07:31 PM |
08-31-2016
11:42 AM
1 Kudo
If you created a NAR project you should have a project structure similar to what is shown here: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions#MavenProjectsforExtensions-ExampleProcessorBundleStructure Your processors module would have Maven dependencies on all of the modules you need to call from your processor code, and when you build your NAR they would all get packaged into it. If you need config files on the classpath, you would likely put them in src/main/resources of your processors module. If the config files need to be editable, you could have a property in your processor that specifies a directory to load config from, and then have code in your processor to load them in. In the end, you would generally put only the NAR into NiFi's lib directory.
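If you go the editable-config route, a minimal sketch of what that could look like in a custom processor is below; the class name, property name, and loading hook are just placeholders, not an established pattern:

```java
import java.io.File;
import java.util.Collections;
import java.util.List;

import org.apache.nifi.annotation.lifecycle.OnScheduled;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.util.StandardValidators;

public class ConfigAwareProcessor extends AbstractProcessor {

    // Hypothetical property pointing at an editable config directory that lives outside the NAR.
    static final PropertyDescriptor CONFIG_DIR = new PropertyDescriptor.Builder()
            .name("Configuration Directory")
            .description("Directory containing config files this processor loads when scheduled")
            .required(true)
            .addValidator(StandardValidators.FILE_EXISTS_VALIDATOR)
            .build();

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return Collections.singletonList(CONFIG_DIR);
    }

    @OnScheduled
    public void loadConfig(final ProcessContext context) {
        final File configDir = new File(context.getProperty(CONFIG_DIR).getValue());
        // Read whatever files you need from configDir here, before the processor starts running.
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        // Normal processor logic goes here.
    }
}
```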
08-30-2016
08:37 PM
1 Kudo
You should be able to have the NiFi on your local machine pull from the NiFi on the remote machine... The remote machine would have ListHDFS -> FetchHDFS -> Output Port, and the local machine would have a Remote Process Group pointing to the remote NiFi, with a connection from the output port to whatever you want to do locally. The remote NiFi will need site-to-site enabled by setting nifi.remote.input.socket.port, and that port will also need to be open through the firewall.
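For reference, a minimal sketch of the relevant site-to-site settings in nifi.properties on the remote instance; the port number here is only an example, use whatever your firewall allows:

```properties
# nifi.properties on the remote NiFi
nifi.remote.input.socket.port=10000
nifi.remote.input.secure=false
```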
08-30-2016
03:19 PM
2 Kudos
Can you give some background on why you want to set the classpath? NiFi has a very specific class-loader isolation model and generally you don't add JARs to NiFi's classpath like you do with other projects.
08-29-2016
05:17 PM
In 1.0.0 the difference will be that the cache server has to run on all nodes, because there is no longer a concept of choosing which node the controller service runs on (since there is no master). The cache client would be configured to point to the cache server on one of the nodes, so if that node goes down there is still no automatic failover at this point.
08-29-2016
04:42 PM
2 Kudos
1) The Remote Process Group checks periodically (I think once per minute) with the NCM to get the status of the nodes in the cluster. If the NCM is down and then one of the other nodes fails, the primary node will try to send data to that failed node, get some kind of exception, and then move on and try another node. So the primary node doesn't know the other node is dead, but it keeps trying nodes until one succeeds.
2) Yes, if you run the cache server on the NCM and the NCM fails, then the other nodes can't access the cache server. The long-term solution is to use a true distributed cache (memcached, redis, etc.) as the backing implementation that the cache client talks to; this just hasn't been implemented yet.
08-25-2016
06:20 PM
1 Kudo
You could have a RouteOnAttribute processor right before MergeContent and add a property like foo = ${header:equals("foo")}. Then everything with a header of "foo" will be routed to a relationship called "foo", and everything else will get dropped.
08-25-2016
12:37 PM
Both of those flows should have the data distributed in the cluster... Flow #1 is a pull model where all the nodes in the cluster will pull data from the standalone. Flow #2 is a push model where the standalone will push the data to all the nodes in the cluster. Either approach is correct, but I tend to lean towards the push model (#2) since it lets the source of the data (the standalone instance) decide where to send the data.
08-24-2016
01:49 PM
1 Kudo
Whenever you do site-to-site, it will automatically do load-balancing for you... So if you have your standalone doing ListFile -> FetchFile -> Output Port, then on your cluster all you need to do is have an RPG pointing back to the Output Port on the standalone instance; there will be an instance of this RPG on each node of your cluster, and each one will pull from the Output Port. No need to do another internal RPG. See the site-to-site section here: https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html You can't really distribute the fetching of the source file if it is coming from the local filesystem. The only thing that can fetch that file is something with access to the local filesystem, which is your standalone NiFi. If it were a shared filesystem then you could.
08-23-2016
04:41 PM
Glad to hear it!
08-23-2016
02:49 PM
1 Kudo
You probably have a couple of options... I don't think you want the same NiFi instance that is running your main dataflow to also use PutSplunk to monitor itself. If you had TailFile -> PutSplunk, where TailFile was tailing the same instance, it could create a cycle where the more you tailed and sent to Splunk, the more logs you produced, the more you tailed, and so on. I would suggest a second NiFi instance (maybe even the MiNiFi Java agent) to monitor the logs of the main instance. Another, possibly simpler, solution is to configure the NiFi logback.xml to add a UDP/TCP appender that can send logs to Splunk; this way anything NiFi logs to nifi-app.log will get forwarded to Splunk. A last option, slightly different from logging: NiFi has a concept called a ReportingTask that can be used to send metrics and statistics to other systems. If that is the information you are interested in, you could implement a custom ReportingTask to send data to Splunk.
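For the ReportingTask route, here is a minimal sketch, assuming a Splunk TCP data input listening on a hypothetical host and port (splunk-host:9997); the stats pulled from the controller status are just examples, and a real task would expose the host and port as properties:

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.net.Socket;

import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.controller.status.ProcessGroupStatus;
import org.apache.nifi.reporting.AbstractReportingTask;
import org.apache.nifi.reporting.ReportingContext;

@Tags({"splunk", "metrics"})
@CapabilityDescription("Sketch: pushes a few flow-level stats to a Splunk TCP data input.")
public class SplunkStatsReportingTask extends AbstractReportingTask {

    // Hypothetical Splunk TCP input; in a real task these would be PropertyDescriptors.
    private static final String SPLUNK_HOST = "splunk-host";
    private static final int SPLUNK_PORT = 9997;

    @Override
    public void onTrigger(final ReportingContext context) {
        // Aggregate status for the root process group (the whole flow).
        final ProcessGroupStatus status = context.getEventAccess().getControllerStatus();

        final String event = String.format(
                "{\"queuedCount\":%d,\"queuedBytes\":%d,\"activeThreads\":%d}",
                status.getQueuedCount(),
                status.getQueuedContentSize(),
                status.getActiveThreadCount());

        try (Socket socket = new Socket(SPLUNK_HOST, SPLUNK_PORT);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            out.println(event);
        } catch (final IOException e) {
            getLogger().error("Failed to send stats to Splunk", e);
        }
    }
}
```

You would build this into its own NAR the same way as a custom processor, then add and schedule it from the Reporting Tasks section of the Controller Settings in the NiFi UI.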