Member since: 09-29-2015
Posts: 871
Kudos Received: 721
Solutions: 255
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2635 | 12-03-2018 02:26 PM
 | 1728 | 10-16-2018 01:37 PM
 | 3121 | 10-03-2018 06:34 PM
 | 1861 | 09-05-2018 07:44 PM
 | 1468 | 09-05-2018 07:31 PM
12-01-2017
04:59 PM
All - just an update. I was able to get help resolving this on StackOverflow. See the post here: https://stackoverflow.com/questions/47399391/using-nifi-to-pull-elasticsearch-indexes?noredirect=1#comment82139433_47399391
11-16-2017
02:58 PM
1 Kudo
You could probably implement a custom processor like "JsonToAttributes" very easily: iterate through a JSON document and add each field/value pair as an attribute to the flow file, or do it with a Groovy script in ExecuteScript. The reason it doesn't exist is that, in the general case, you should be careful about adding a significant number of attributes, because attributes are held in memory. If we provided this processor, people would start using it to add hundreds of attributes to their flow files, which could lead to poor performance and memory issues.
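The flattening logic such a processor or ExecuteScript script would implement can be sketched in plain Python (the "JsonToAttributes" name, the `json.` prefix, and the path syntax are illustrative assumptions, not NiFi APIs):

```python
import json

def json_to_attributes(json_text, prefix="json"):
    """Flatten a JSON document into a flat dict of string values,
    the way a hypothetical JsonToAttributes processor might turn
    fields into flow file attributes."""
    attributes = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for i, value in enumerate(node):
                walk(value, f"{path}[{i}]")
        else:
            # Flow file attributes are strings, so stringify leaf values.
            attributes[path] = str(node)

    walk(json.loads(json_text), prefix)
    return attributes

attrs = json_to_attributes('{"user": {"id": 42, "tags": ["a", "b"]}}')
# attrs == {"json.user.id": "42", "json.user.tags[0]": "a", "json.user.tags[1]": "b"}
```

Note how even this small document produces three attributes; a deeply nested document multiplies that quickly, which is the memory concern described above.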
06-27-2019
10:46 AM
Hi Bryan, thanks for your input; that helped me understand the properties to set up with SASL_PLAINTEXT. I'm currently working on a project that uses the NiFi PublishKafka_0_10 processor with Event Hubs. From the Microsoft doc ( https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-quickstart-kafka-enabled-event-hubs#send-and-receive-messages-with-kafka-in-event-hubs ), we need to map the following configuration to the properties in the PublishKafka_0_10 processor:

bootstrap.servers={YOUR.EVENTHUBS.FQDN}:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";

I've tried to use SASL_PLAINTEXT (as SSL is not an option in our test environment) and configured it as below. However, it still cannot connect to Event Hubs, and it keeps giving me the error "TimeoutException: Failed to update metadata after 5000 ms". Can you please review the properties I set up? Perhaps there is something wrong in them; I've struggled with this for a few days. Looking forward to your response. Thanks!
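A likely cause of the timeout above: the Event Hubs Kafka endpoint listens on port 9093 over TLS, so SASL_SSL is required and SASL_PLAINTEXT will not connect. A minimal sketch of the expected client-side property set (values are the placeholders from the Microsoft doc, not real endpoints):

```python
# Kafka client properties for an Event Hubs Kafka endpoint, per the
# Microsoft quickstart. Placeholders ({YOUR.EVENTHUBS.FQDN}, etc.) must
# be replaced with real values; SASL_SSL is mandatory for Event Hubs.
eventhub_kafka_config = {
    "bootstrap.servers": "{YOUR.EVENTHUBS.FQDN}:9093",  # 9093 is the TLS port
    "security.protocol": "SASL_SSL",  # SASL_PLAINTEXT is not accepted
    "sasl.mechanism": "PLAIN",
    "sasl.jaas.config": (
        'org.apache.kafka.common.security.plain.PlainLoginModule required '
        'username="$ConnectionString" '
        'password="{YOUR.EVENTHUBS.CONNECTION.STRING}";'
    ),
}
```

With SASL_PLAINTEXT the broker never completes the handshake, so the producer cannot fetch metadata and fails with exactly the "Failed to update metadata" timeout described.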
11-06-2018
06:09 PM
A couple of things. 1. I have no idea what Kafka is. 2. 30 minutes for 7 million records is great, as my flow takes 40 minutes for a meager 70k records. In regards to the multi-SplitText usage above, my question is about the Settings tab for SplitText: how should it be set? My flow still does not execute PutFile until everything has gone through. I currently have 2 SplitTexts and am about to put in a 3rd to see if that helps, but processing of the data is just slow. The overall flow I have is Source -> SplitText (5000) -> SplitText (250) -> Processing -> PutFile. Any tips greatly appreciated.
10-26-2017
02:53 PM
Here is the description of the Kafka properties from their source code...

max.request.size: The maximum size of a request. This is also effectively a cap on the maximum record size. Note that the server has its own cap on record size which may be different from this. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests.

buffer.memory: The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server, the producer will either block or throw an exception based on the preference specified by block.on.buffer.full. This setting should correspond roughly to the total memory the producer will use, but is not a hard bound, since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.

For your case I don't think you really need to change either of these values from the defaults, since you are sending 4 KB messages. Usually you would increase max.request.size if you have a single message that is larger than 1 MB.
10-04-2017
01:57 PM
Processor ConvertAvroToORC was using only 2 Concurrent Tasks although it was configured to use 4. After restarting the cluster ConvertAvroToORC started using 4 Concurrent Tasks and the throughput is now 14600 msg/sec on the cluster (7300 msg/sec on each machine).
09-19-2017
02:09 PM
@sally sally By setting your minimums (Min Num Entries and Min Group Size) to some large value, FlowFiles that are added to a bin will not qualify for merging right away. You should then set "Max Bin Age" to the amount of time you are willing to allow a bin to hang around before it is merged, regardless of the number of entries in that bin or that bin's size. As far as the number of bins goes, a new bin will be created for each unique filename found in the incoming queue. Should the MergeContent processor encounter more unique filenames than there are bins, the MergeContent processor will force merging of the oldest bin to free a bin for the new filename. So it is important to have enough bins to accommodate the number of unique filenames you expect to pass through this processor during the configured "Max Bin Age" duration; otherwise, you could still end up with 1 FlowFile per merge. Thanks, Matt
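The oldest-bin eviction described above can be modeled with a toy sketch (this is not NiFi code; `MAX_BINS`, `offer`, and the filenames are illustrative, and insertion order stands in for bin age):

```python
from collections import OrderedDict

MAX_BINS = 2                  # stands in for MergeContent's bin count
bins = OrderedDict()          # filename -> list of queued flowfiles
merged = []                   # (filename, entries) pairs that got merged

def offer(filename, flowfile):
    """Route a flowfile to its bin; if all bins are taken by other
    filenames, force-merge the oldest bin to free a slot."""
    if filename not in bins and len(bins) == MAX_BINS:
        oldest, entries = bins.popitem(last=False)  # evict oldest bin
        merged.append((oldest, entries))            # possibly a tiny merge
    bins.setdefault(filename, []).append(flowfile)

for name in ["a.log", "b.log", "a.log", "c.log"]:
    offer(name, name)

# Arrival of "c.log" forced the oldest bin ("a.log") to merge early,
# even though it held only 2 entries - the "1 FlowFile per merge" risk.
```

This is why the bin count must cover the number of distinct filenames expected within one "Max Bin Age" window.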
09-20-2017
01:33 PM
If you want to have a default value of null, then the type of your field needs to be a union of null and the real type, with "null" listed first (an Avro union's default value must match the first type in the union). For example, for timestamp you would need: "type": ["null", "long"], "default": null
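A complete field definition of that shape, under the assumption that the field is named "ts" (the name is illustrative), parsed here just to show it is well-formed JSON:

```python
import json

# Illustrative Avro field definition: for a null default, "null" must be
# the first branch of the union, since a union's default is validated
# against its first type.
field = json.loads("""
{
  "name": "ts",
  "type": ["null", "long"],
  "default": null
}
""")
```

With the union reversed (["long", "null"]), an Avro schema parser will reject the null default because it does not match "long".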
09-05-2017
04:36 AM
I found my problem: I was using ${event_type} in the Correlation Attribute Name, where it should just be event_type. All is sorted now; thanks a lot for the help!
09-01-2017
04:56 PM
Issue was browser version related. Switching to a newer version of the browser resolved this issue.