Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4058 | 12-03-2018 02:26 PM
 | 3040 | 10-16-2018 01:37 PM
 | 4176 | 10-03-2018 06:34 PM
 | 3021 | 09-05-2018 07:44 PM
 | 2288 | 09-05-2018 07:31 PM
02-12-2018 04:54 PM · 2 Kudos
OK, how about ExecuteStreamCommand, which accepts incoming flow files?
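For context, a minimal sketch of an ExecuteStreamCommand configuration; the script path and argument here are hypothetical, just to show the shape:

```
Command Path: /opt/scripts/transform.sh
Command Arguments: -v
Working Directory: /opt/scripts
Ignore STDIN: false
```

With Ignore STDIN set to false, each incoming flow file's content is streamed to the command's standard input, and the command's standard output becomes the content of the flow file routed to the output stream relationship.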
01-19-2018 08:18 PM
Based on all this info, it sounds like you have an identity mapping set up that maps your certificate identity like "CN=myuser, OU=xyz" to just "myuser". You can set up another identity mapping to handle Kerberos identities. Something like this would map "myuser@myrealm" to "myuser":

nifi.security.identity.mapping.pattern.kerb=^(.*?)@(.*?)$
nifi.security.identity.mapping.value.kerb=$1
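Putting the two mappings together, a sketch of how this might look in nifi.properties; the DN pattern follows the stock example shipped in the config template and would need to match your actual certificate DN layout:

```
# Map certificate DNs like "CN=myuser, OU=xyz" to "myuser"
nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$
nifi.security.identity.mapping.value.dn=$1

# Map Kerberos principals like "myuser@myrealm" to "myuser"
nifi.security.identity.mapping.pattern.kerb=^(.*?)@(.*?)$
nifi.security.identity.mapping.value.kerb=$1
```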
01-09-2018 07:23 PM · 1 Kudo
Bin packing is the standard strategy used when merging data together: it simply writes the bytes of each flow file one after another, inserting the optional header, footer, and demarcator.

The defragment strategy is for when you have previously used one of the "split" processors and want to undo the split, reassembling the pieces back into a single flow file. This mode requires that all of the incoming flow files have the standard "fragment" attributes (fragment.identifier, fragment.index, and fragment.count), which are created by the split processors.
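For illustration, the fragment attributes on a single split might look like this (the values below are made up):

```
# same value on every split produced from one parent flow file (made-up UUID)
fragment.identifier=b77a9a01-1c2e-4a6f-9e3d-0f5c2d8e4a10
# this split's position within the parent
fragment.index=3
# total number of splits Defragment must collect
fragment.count=10
```

Defragment holds fragments until all fragment.count flow files with the same fragment.identifier have arrived, then reassembles them in fragment.index order.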
12-01-2017 04:59 PM
All - just an update. I was able to get help resolving this on StackOverflow. See the post here: https://stackoverflow.com/questions/47399391/using-nifi-to-pull-elasticsearch-indexes?noredirect=1#comment82139433_47399391
11-16-2017 02:58 PM · 1 Kudo
You could probably implement a custom processor like "JsonToAttributes" very easily: iterate through a JSON document and add each field/value as an attribute to the flow file. You could also do it with a Groovy script in ExecuteScript. The reason it doesn't exist is that, in the general case, you should be careful about adding a significant number of attributes, because attributes are held in memory. If we provided this processor, people would start using it to add hundreds of attributes to their flow files, which could lead to poor performance and memory issues.
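For what it's worth, a rough sketch of the ExecuteScript (Groovy) route; this assumes a flat JSON object in the flow file content and skips nested values:

```groovy
import groovy.json.JsonSlurper
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if (flowFile == null) return

// Read the flow file content into a string
def text = ''
session.read(flowFile, { inputStream ->
    text = inputStream.getText(StandardCharsets.UTF_8.name())
} as InputStreamCallback)

// Add each top-level scalar field as an attribute, skipping nested
// objects/arrays to keep the attribute map small
def json = new JsonSlurper().parseText(text)
json.each { key, value ->
    if (!(value instanceof Map) && !(value instanceof List)) {
        flowFile = session.putAttribute(flowFile, key.toString(), String.valueOf(value))
    }
}

session.transfer(flowFile, REL_SUCCESS)
```

The caveat above still applies: every attribute this script adds is held in memory.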
06-27-2019 10:46 AM
Hi Bryan, Thanks for your inputs; that helped me understand the properties to set up with SASL_PLAINTEXT. I'm currently working on a project that uses the NiFi PublishKafka_0_10 processor with Event Hubs. From the Microsoft doc ( https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-quickstart-kafka-enabled-event-hubs#send-and-receive-messages-with-kafka-in-event-hubs ) we need to map the configuration below to the properties in the PublishKafka_0_10 processor:

bootstrap.servers={YOUR.EVENTHUBS.FQDN}:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";

I've tried to use SASL_PLAINTEXT (as SSL is not an option in our test environment) and configured it as below. However, it still cannot connect to Event Hubs and keeps giving the error "TimeoutException: Failed to update metadata after 5000 ms". Can you please review the properties I set up? Perhaps there is something wrong in them; I've struggled with this for a few days. Looking forward to your response. Thanks!
11-06-2018 06:09 PM
A couple things:

1. I have no idea what Kafka is.
2. 30 minutes for 7 million records is great, as my flow is 40 minutes for a meager 70k records.

In regards to the above multi-SplitText usage, my question is about the Settings tab for the SplitText. How should it be set? My flow still does not execute PutFile until everything has gone through. I have 2 SplitTexts currently and am about to put in a 3rd to see if that helps, but it is just slow processing of the data. The overall flow I have is:

Source -> SplitText (5000) -> SplitText (250) -> Processing -> PutFile

Any tips greatly appreciated.
10-26-2017 02:53 PM
Here is the description of the Kafka properties from their source code:

max.request.size: The maximum size of a request. This is also effectively a cap on the maximum record size. Note that the server has its own cap on record size which may be different from this. This setting will limit the number of record batches the producer will send in a single request, to avoid sending huge requests.

buffer.memory: The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server, the producer will either block or throw an exception, based on the preference specified by block.on.buffer.full. This setting should correspond roughly to the total memory the producer will use, but is not a hard bound, since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.

For your case I don't think you really need to change either of these values from the defaults, since you are sending 4 KB messages. Usually you would increase max.request.size if you have a single message that is larger than 1 MB.
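For reference, a sketch of those two settings with their stock defaults (as I recall them for the 0.9/0.10-era producer; worth double-checking against your client version):

```
# default: 1 MB cap on a single request (and effectively on record size)
max.request.size=1048576
# default: 32 MB of memory for buffering records awaiting send
buffer.memory=33554432
```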
10-04-2017 01:57 PM
The ConvertAvroToORC processor was using only 2 Concurrent Tasks although it was configured to use 4. After restarting the cluster, ConvertAvroToORC started using 4 Concurrent Tasks, and throughput is now 14600 msg/sec on the cluster (7300 msg/sec on each machine).