Member since: 09-29-2015
Posts: 871
Kudos Received: 723
Solutions: 255
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4058 | 12-03-2018 02:26 PM
 | 3040 | 10-16-2018 01:37 PM
 | 4176 | 10-03-2018 06:34 PM
 | 3021 | 09-05-2018 07:44 PM
 | 2288 | 09-05-2018 07:31 PM
02-12-2018 04:54 PM · 2 Kudos
OK, how about ExecuteStreamCommand, which accepts incoming flow files?
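For context, a minimal sketch of an ExecuteStreamCommand configuration; the script path and argument here are hypothetical, just to show the shape:

```
Command Path: /opt/scripts/transform.sh
Command Arguments: -v
Working Directory: /opt/scripts
Ignore STDIN: false
```

With Ignore STDIN set to false, each incoming flow file's content is streamed to the command's standard input, and the command's standard output becomes the content of the flow file routed to the output stream relationship.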
01-19-2018 08:18 PM
Based on all this info, it sounds like you have an identity mapping set up that maps your certificate identity like "CN=myuser, OU=xyz" to just "myuser". You can set up another identity mapping to handle Kerberos identities. Something like this would map "myuser@myrealm" to "myuser":

nifi.security.identity.mapping.pattern.kerb=^(.*?)@(.*?)$
nifi.security.identity.mapping.value.kerb=$1
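Putting the two mappings together, a sketch of how this might look in nifi.properties; the DN pattern follows the stock example shipped in the config template and would need to match your actual certificate DN layout:

```
# Map certificate DNs like "CN=myuser, OU=xyz" to "myuser"
nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?)$
nifi.security.identity.mapping.value.dn=$1

# Map Kerberos principals like "myuser@myrealm" to "myuser"
nifi.security.identity.mapping.pattern.kerb=^(.*?)@(.*?)$
nifi.security.identity.mapping.value.kerb=$1
```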
01-09-2018 07:23 PM · 1 Kudo
Bin packing is the standard strategy used when merging data together: it simply writes the bytes of each flow file one after another, inserting the optional header, footer, and demarcator.

The defragment strategy is for when you have previously used one of the "split" processors and want to undo the split, reassembling the pieces back into a single flow file. This mode requires that all of the incoming flow files have the standard "fragment" attributes (fragment.identifier, fragment.index, and fragment.count), which are created by the split processors.
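For illustration, the fragment attributes on a single split might look like this (the values below are made up):

```
# same value on every split produced from one parent flow file (made-up UUID)
fragment.identifier=b77a9a01-1c2e-4a6f-9e3d-0f5c2d8e4a10
# this split's position within the parent
fragment.index=3
# total number of splits Defragment must collect
fragment.count=10
```

Defragment holds fragments until all fragment.count flow files with the same fragment.identifier have arrived, then reassembles them in fragment.index order.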
12-01-2017 04:59 PM
All - just an update. I was able to get help resolving this on StackOverflow. See the post here: https://stackoverflow.com/questions/47399391/using-nifi-to-pull-elasticsearch-indexes?noredirect=1#comment82139433_47399391
11-16-2017 02:58 PM · 1 Kudo
You could probably implement a custom processor like "JsonToAttributes" very easily: iterate through a JSON document and add each field/value as an attribute to the flow file. You could also do it with a Groovy script in ExecuteScript. The reason it doesn't exist is that, in the general case, you should be careful about adding a significant number of attributes, because attributes are held in memory. If we provided this processor, people would start using it to add hundreds of attributes to their flow files, which could lead to poor performance and memory issues.
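For what it's worth, a rough sketch of the ExecuteScript (Groovy) route; this assumes a flat JSON object in the flow file content and skips nested values:

```groovy
import groovy.json.JsonSlurper
import java.nio.charset.StandardCharsets
import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if (flowFile == null) return

// Read the flow file content into a string
def text = ''
session.read(flowFile, { inputStream ->
    text = inputStream.getText(StandardCharsets.UTF_8.name())
} as InputStreamCallback)

// Add each top-level scalar field as an attribute, skipping nested
// objects/arrays to keep the attribute map small
def json = new JsonSlurper().parseText(text)
json.each { key, value ->
    if (!(value instanceof Map) && !(value instanceof List)) {
        flowFile = session.putAttribute(flowFile, key.toString(), String.valueOf(value))
    }
}

session.transfer(flowFile, REL_SUCCESS)
```

The caveat above still applies: every attribute this script adds is held in memory.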
06-27-2019 10:46 AM
Hi Bryan, Thanks for your inputs; that helped me understand the properties to set up with SASL_PLAINTEXT. I'm currently working on a project that uses the NiFi PublishKafka_0_10 processor with Event Hubs. From the Microsoft doc ( https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-quickstart-kafka-enabled-event-hubs#send-and-receive-messages-with-kafka-in-event-hubs ) we need to map the configuration below to the properties in the PublishKafka_0_10 processor:

bootstrap.servers={YOUR.EVENTHUBS.FQDN}:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";

I've tried to use SASL_PLAINTEXT (as SSL is not an option in our test environment) and configured it as below. However, it still cannot connect to Event Hubs and keeps giving the error "TimeoutException: Failed to update metadata after 5000 ms". Can you please review the properties I set up? Perhaps there is something wrong in them; I've struggled with this for a few days. Looking forward to your response. Thanks!
11-06-2018 06:09 PM
A couple things:

1. I have no idea what Kafka is.
2. 30 minutes for 7 million records is great, as my flow is 40 minutes for a meager 70k records.

In regards to the above multi-SplitText usage, my question is about the Settings tab for the SplitText. How should it be set? My flow still does not execute PutFile until everything has gone through. I have 2 SplitTexts currently and am about to put in a 3rd to see if that helps, but it is just slow processing of the data. The overall flow I have is:

Source -> SplitText (5000) -> SplitText (250) -> Processing -> PutFile

Any tips greatly appreciated.
10-26-2017 02:53 PM
Here is the description of the Kafka properties from their source code:

max.request.size: The maximum size of a request. This is also effectively a cap on the maximum record size. Note that the server has its own cap on record size which may be different from this. This setting will limit the number of record batches the producer will send in a single request, to avoid sending huge requests.

buffer.memory: The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server, the producer will either block or throw an exception, based on the preference specified by block.on.buffer.full. This setting should correspond roughly to the total memory the producer will use, but is not a hard bound, since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.

For your case I don't think you really need to change either of these values from the defaults, since you are sending 4 KB messages. Usually you would increase max.request.size if you have a single message that is larger than 1 MB.
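For reference, a sketch of those two settings with their stock defaults (as I recall them for the 0.9/0.10-era producer; worth double-checking against your client version):

```
# default: 1 MB cap on a single request (and effectively on record size)
max.request.size=1048576
# default: 32 MB of memory for buffering records awaiting send
buffer.memory=33554432
```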
10-04-2017 01:57 PM
The ConvertAvroToORC processor was using only 2 Concurrent Tasks although it was configured to use 4. After restarting the cluster, ConvertAvroToORC started using 4 Concurrent Tasks, and throughput is now 14600 msg/sec on the cluster (7300 msg/sec on each machine).