Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
08-11-2016
01:13 PM
1 Kudo
It should be in nifi-app.log... in the code it does:

context.getBulletinRepository().addBulletin(bulletin);
logger.warn(message);

The logger is a standard SLF4J logger, which ends up being handled by logback and controlled by the logback.xml in the conf directory:

Logger logger = LoggerFactory.getLogger(MonitorDiskUsage.class);
08-10-2016
01:19 PM
1 Kudo
In MergeContent there is a Delimiter Strategy; choose "Text", which means it uses the values typed into the Header, Demarcator, and Footer properties. The Demarcator is what gets put between each FlowFile that is merged together, so for example a newline demarcator joins three single-line FlowFiles into one FlowFile with three lines. You can enter a new line with shift+enter.
08-10-2016
01:15 PM
It is still considered an unstable beta release, so it is not recommended for production, but it is stable enough to run in a test/dev environment. I can't really give a specific timeline, but it shouldn't be too far away. The community is already working on the remaining issues and anything found from testing the beta.
08-09-2016
09:12 PM
4 Kudos
In the case of SplitText, the approach when splitting large files is to use two instances of SplitText, where the first one might split to 10-20k lines per flow file, and the second splits down to 1 line. This avoids producing millions of flow files in one execution of the processor.

For some other processors it is common for their description to include a warning if the processor reads the whole flow file into memory, so the user is aware that if they send in 2GB of data, it's going to use 2GB of the heap, or hit an OutOfMemoryError if that much isn't available. Whenever possible, processors should perform their processing in a streaming fashion to avoid taking up large chunks of memory.

As far as sharing the cluster among teams, NiFi doesn't really have resource isolation, but NiFi 1.0.0 (initial BETA released yesterday) is going to introduce a fine-grained security model so that different teams and people can be granted access to different parts of the flow. Team 1 might only have access to Process Group 1, and Team 2 might only have access to Process Group 2, so each team can't see what the other team is doing or change their flow.
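As a rough sketch of that streaming style (my own illustration, not code from any particular processor; only the ProcessSession and InputStreamCallback APIs are NiFi's, the class and per-line handling are hypothetical):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.io.InputStreamCallback;

// Hypothetical helper showing a streaming read of flow file content.
public class StreamingReadExample {
    void readLineByLine(final ProcessSession session, final FlowFile flowFile) {
        session.read(flowFile, new InputStreamCallback() {
            @Override
            public void process(final InputStream in) throws IOException {
                final BufferedReader reader =
                        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    // handle one line at a time, so a 2GB flow file
                    // never needs 2GB of heap
                }
            }
        });
    }
}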
08-09-2016
08:21 PM
By "reset on restart" I meant that they are held in memory so if the NiFi Java process restarts the counters are reset. Starting/stopping components on the graph do not impact the counters. We don't really do windowing operations... Counters are usually just some processor specific count that could be helpful for debugging/monitoring purposes. It is really just meant for someone to look at in the UI to figure out how something is working, but not really for the processor to retrieve the value later. In fact, I don't think there is any other processor API call besides adjustCounter, so all you can really do is increment. In a cluster I believe you should see the aggregated value of X for the whole cluster, it doesn't break it out for each node. One other point I forgot, is that behind the scenes it automatically keeps track of the aggregate count across instances of the same type of processor, and also for each instance. So if you had two ListenSyslog processors, you should see Messages Received for All Listen Syslog Processors, Messages Received for ListenSyslog #1, and Messages Received for ListenSyslog #2.
08-09-2016
06:11 PM
2 Kudos
Counters are a way for a processor to track how many times some event occurred, mostly for monitoring purposes. There is a method in the ProcessSession:

void adjustCounter(String name, long delta, boolean immediate);

So calling this method with ("myCounter", 1, true) would increment the count of "myCounter" by 1, or create the counter if it didn't exist. Counters are not persistent and will be reset on restart. An example is in the syslog processors, which increment a counter for each syslog message received.
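As a small illustration, a processor's onTrigger() might call it like this (a hypothetical fragment of my own; the counter name and REL_SUCCESS relationship are made up, only adjustCounter itself is the real API):

import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;

// Hypothetical fragment; assumes it sits inside a processor class
// that defines a REL_SUCCESS relationship.
public void onTrigger(final ProcessContext context, final ProcessSession session) {
    final FlowFile flowFile = session.get();
    if (flowFile == null) {
        return;
    }
    // Increment (or create) "messages.received" by 1; immediate=false
    // defers the adjustment until the session is committed.
    session.adjustCounter("messages.received", 1, false);
    session.transfer(flowFile, REL_SUCCESS);
}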
08-09-2016
05:23 PM
2 Kudos
The MergeContent processor can be used to merge JSON together and has a property called "Correlation Attribute Name" which, when specified, will merge together flow files that have the same value for the specified attribute. In your scenario you first need to use EvaluateJsonPath to extract "service" and "eventName" from the JSON document. Based on your sample JSON it seems like they are at the root level of the document, so I believe something like:

service = $.service
eventName = $.eventName

Then you need to get these two values into a single attribute, so you can use UpdateAttribute with something like:

serviceEventName = ${service}/${eventName}

Then in MergeContent set the "Correlation Attribute Name" to "serviceEventName". You can also specify the minimum group size and age so that you can merge together either 100MB or 1 hour worth of data.
08-04-2016
11:14 PM
Hi Stephanie, I'm actually not sure about that one, but I think it has more to do with the Kafka client and the number of partitions. You should be able to have a ConsumeKafka processor running on each node of your NiFi cluster and each pulling data without doing anything special. It might be good to start a new question about this specific problem with ConsumeKafka only consuming data on one node.
08-04-2016
07:01 PM
A common way to do this is to have the file written to ".filename" first and renamed to "filename" when done. This is why the GetFile processor's File Filter property defaults to:

[^\.].*

That regular expression matches any filename that doesn't start with a period. I realize you may not have control over how the files are being written to the directory, so this may not be an option if you can't control that.
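As a quick standalone check of how that filter behaves (plain Java, outside NiFi; the filenames are just examples of my own):

import java.util.regex.Pattern;

// Tests the default GetFile File Filter against example filenames.
public class FileFilterDemo {
    public static void main(String[] args) {
        final Pattern filter = Pattern.compile("[^\\.].*");
        System.out.println(filter.matcher(".data.csv").matches()); // false: still being written
        System.out.println(filter.matcher("data.csv").matches());  // true:  ready to pick up
    }
}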
08-03-2016
08:59 PM
Just to clarify, what Haimo mentioned is that the ConsumeKafka processor does not use any of NiFi's state management capabilities because the Kafka client maintains the offsets. Regarding the Kafka client: as of 0.9.0 I believe the new consumer no longer stores offsets in ZooKeeper and instead stores them in an internal Kafka topic on the brokers, which is why you see it connecting directly to the broker and not using ZooKeeper.