Member since
09-29-2015
871
Posts
723
Kudos Received
255
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3372 | 12-03-2018 02:26 PM
 | 2318 | 10-16-2018 01:37 PM
 | 3636 | 10-03-2018 06:34 PM
 | 2411 | 09-05-2018 07:44 PM
 | 1835 | 09-05-2018 07:31 PM
08-16-2016
05:47 PM
1 Kudo
I don't have much experience with the site-to-site implementation, but it seems like it wouldn't be too difficult to support adding the transit.uri as an attribute when receiving flow files over site-to-site (if that's all we are talking about): https://github.com/apache/nifi/blob/e23b2356172e128086585fe2c425523c3628d0e7/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-site-to-site/src/main/java/org/apache/nifi/remote/protocol/AbstractFlowFileServerProtocol.java#L445 Alternatively, maybe minifi-cpp should have the ability to send metadata, since NiFi already supports receiving attributes over site-to-site.
08-16-2016
01:59 PM
2 Kudos
In general, concurrent tasks is the number of threads calling onTrigger for an instance of a processor. In a cluster, if you set concurrent tasks to 4, then it is 4 threads on each node of your cluster. I am not as familiar with all the ins and outs of the Kafka processors, but GetKafka does something like this:

```java
int concurrentTaskToUse = context.getMaxConcurrentTasks();
final Map<String, Integer> topicCountMap = new HashMap<>(1);
topicCountMap.put(topic, concurrentTaskToUse);
final Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap =
        consumer.createMessageStreams(topicCountMap);
```

The consumer is from the Kafka 0.8 client, so it is creating a message stream for each concurrent task. Then when the processor is triggered it takes one of those message streams and consumes a message, and since multiple concurrent tasks are triggering the processor, it is consuming from each of those streams in parallel. As far as rebalancing, I think the Kafka client handles that transparently to NiFi, but I am not totally sure. Messages that have already been pulled into a NiFi node will stay there until the node is back up and processing.
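The stream-per-task model above can be sketched in plain Python (a hypothetical illustration of the concurrency pattern, not NiFi or Kafka code): each concurrent task owns one stream, here a queue, and the tasks drain their streams in parallel.

```python
import queue
import threading

# Stand-in for the Kafka 0.8 message streams: one queue per concurrent
# task, mirroring createMessageStreams(topicCountMap) above.
concurrent_tasks = 4
streams = [queue.Queue() for _ in range(concurrent_tasks)]

# Pretend the broker delivered 20 messages, spread across the streams.
for i in range(20):
    streams[i % concurrent_tasks].put(f"message-{i}")

consumed = []
lock = threading.Lock()

def on_trigger(stream):
    """Each 'concurrent task' drains only its own stream, like onTrigger."""
    while True:
        try:
            msg = stream.get_nowait()
        except queue.Empty:
            return
        with lock:
            consumed.append(msg)

threads = [threading.Thread(target=on_trigger, args=(s,)) for s in streams]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(consumed))  # all 20 messages, consumed by 4 parallel tasks
```

The key point is that no two tasks share a stream, so there is no contention on the consume side; parallelism comes from the number of streams, which is why GetKafka sizes the stream map by getMaxConcurrentTasks().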
08-16-2016
01:44 PM
3 Kudos
I think it depends on what you mean by "schedule it to run every hour"... NiFi itself is always running, and individual processors are scheduled according to their needs. Every processor supports timer-based or cron-based scheduling, so using either of those you can set a source processor to run every hour. You could also use the REST API to start and stop processors as needed; anything you can do in the UI can be done through the REST API. For best practices for upgrading NiFi, see this wiki page: https://cwiki.apache.org/confluence/display/NIFI/Upgrading+NiFi For deploying changes to production there are a couple of approaches; one of them is based around templates: https://github.com/aperepel/nifi-api-deploy Some people also just move the flow.xml.gz from one environment to another, but this assumes you have parameterized everything that differs between environments.
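As a rough sketch of the REST approach, this builds the request for starting or stopping a processor. The endpoint and body shape here follow the run-status layout of more recent NiFi releases (older versions used a different entity), and the host, port, processor id, and revision are all assumptions, so check the REST API docs for your version.

```python
import json

NIFI_URL = "http://localhost:8080/nifi-api"  # assumed local NiFi instance

def run_status_request(processor_id, revision_version, state):
    """Build the URL and JSON body to start or stop a processor.

    Assumes PUT /nifi-api/processors/{id}/run-status with a body of
    {"revision": {...}, "state": "RUNNING"|"STOPPED"}; verify against
    the REST API documentation for your NiFi version.
    """
    url = f"{NIFI_URL}/processors/{processor_id}/run-status"
    body = {
        "revision": {"version": revision_version},
        "state": state,  # "RUNNING" to start, "STOPPED" to stop
    }
    return url, json.dumps(body)

# Hypothetical processor id and revision version:
url, body = run_status_request("abc-123", 0, "RUNNING")
print(url)
print(body)
```

You would send the body with an HTTP PUT (e.g. via curl or urllib); the revision version must match the processor's current revision or NiFi rejects the update.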
08-15-2016
05:56 PM
In that scenario there is always going to be something you have to set that is specific to the user. I think the best approach might be to use the REST API to change the value of GetFile's Input Directory after the user imports the template, setting it to that user's input directory.
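A sketch of what that REST call could look like, building the PUT body that updates GetFile's Input Directory. The endpoint (PUT /nifi-api/processors/{id}) and entity layout are assumptions based on the NiFi REST API, and the processor id, revision, and path are hypothetical; verify against your version's API docs before relying on this.

```python
import json

def update_directory_request(processor_id, revision_version, new_dir):
    """Build the URL and PUT body that change GetFile's Input Directory.

    Assumes the processor entity shape used by the NiFi REST API
    (component.config.properties); check the docs for your version.
    """
    url = f"http://localhost:8080/nifi-api/processors/{processor_id}"
    body = {
        "revision": {"version": revision_version},
        "component": {
            "id": processor_id,
            "config": {"properties": {"Input Directory": new_dir}},
        },
    }
    return url, json.dumps(body)

# Hypothetical processor id, revision, and per-user directory:
url, body = update_directory_request("abc-123", 3, "/data/user1/input")
print(url)
```

Only the properties you include are changed; everything else on the processor is left as-is.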
08-15-2016
03:21 PM
Is each user importing the template into a separate NiFi instance, or is there one NiFi instance with multiple users who all import the same template and want to retrieve files from different directories?
08-15-2016
02:43 PM
1 Kudo
The Input Directory property of GetFile supports Expression Language, so you can reference a system property like ${my.directory} and define my.directory in bootstrap.conf by adding another Java arg, like: java.arg.15=-Dmy.directory=/foo Then you can have a different bootstrap.conf per environment.
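For example, each environment's bootstrap.conf would carry its own value (the arg index 15 is arbitrary, just pick one that is not already in use, and the paths here are hypothetical):

```
# conf/bootstrap.conf on the dev box
java.arg.15=-Dmy.directory=/data/dev/input

# conf/bootstrap.conf on the prod box
java.arg.15=-Dmy.directory=/data/prod/input
```

With GetFile's Input Directory set to ${my.directory}, the same flow picks up the right directory on each environment after a restart (bootstrap.conf changes require restarting NiFi).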
08-15-2016
02:15 PM
2 Kudos
The ListFile processor keeps track of files it has already seen and only picks up files where the modified date is newer than the last time the processor ran. ListFile produces a flow file for each path to fetch and is used with FetchFile to actually retrieve the file.
08-13-2016
06:14 PM
1 Kudo
You can provide additional dependencies to the ExecuteScript processor by using the "Module Directory" property, as described here: http://funnifi.blogspot.com/2016/02/executescript-using-modules.html You generally shouldn't put jars into NiFi's lib directory, because that can impact all other NARs.
08-11-2016
05:45 PM
1 Kudo
Yes, that's what I was trying to say about it being the name of an attribute, and not the attribute value itself. When you put ${correlation.id}, the framework evaluates that first (in your case it ends up being something like 20121021), and then MergeContent goes looking for an attribute called "20121021", which doesn't exist.
08-11-2016
01:21 PM
4 Kudos
This answer is correct, I just wanted to add some clarification... The "Correlation Attribute Name" is not the actual value to correlate on; it's the name of an attribute that holds the value to correlate on. So as suggested, you could use an UpdateAttribute processor to create an attribute like: correlation.id = ${filename:substring(5,13)} Then in MergeContent put correlation.id as the value of Correlation Attribute Name.
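To see what that expression extracts, NiFi's substring(start, end) follows Java semantics (start inclusive, end exclusive), which maps directly to a Python slice. The filename below is hypothetical, chosen so the date sits at characters 5 through 12:

```python
# Hypothetical filename with the correlation date at characters 5..12:
filename = "file_20121021.csv"

# ${filename:substring(5,13)} in NiFi Expression Language behaves like
# Java's substring: start index inclusive, end index exclusive.
correlation_id = filename[5:13]
print(correlation_id)  # 20121021
```

Every flow file whose filename yields the same correlation.id ends up in the same merged bundle.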