Member since
07-30-2019
3406
Posts
1622
Kudos Received
1008
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 113 | 12-17-2025 05:55 AM |
| | 174 | 12-15-2025 01:29 PM |
| | 118 | 12-15-2025 06:50 AM |
| | 244 | 12-05-2025 08:25 AM |
| | 405 | 12-03-2025 10:21 AM |
09-18-2018
12:51 PM
1 Kudo
@Wojtek I believe you are misunderstanding how the UpdateAttribute processor functions.

Each new property you add expects a property name (this becomes the name of the attribute being created or updated) and a value. The value can be a string or a NiFi Expression Language (EL) statement.

In your screenshot above you have created EL statements. For example:
Property = bytes
Value = ${bytes}

What the above will actually do is this: the EL statement "${bytes}" tells NiFi to try to locate a NiFi attribute named "bytes" and return its assigned value. (At no time does the UpdateAttribute processor read the content of the FlowFile.) That returned value is then used to change the existing value of the FlowFile attribute "bytes", or to create a new FlowFile attribute named "bytes" and assign the value to it.

NiFi searches for attributes in the following hierarchy:
1. NiFi checks all existing attributes already assigned to the FlowFile being processed.
2. NiFi checks all in-scope process group variables.
3. NiFi checks the NiFi variable registry file.
4. NiFi checks the NiFi JVM properties.
5. NiFi checks the NiFi user's system environment variables.

Since the attribute "bytes" does not exist in any of these places, you end up with no value or empty string values being set on all these new properties.

Since you are trying to extract values from the content and assign them to FlowFile attributes, you will want to use a different processor, perhaps ExtractText.

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
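To illustrate the lookup order described in the answer above, here is a minimal Python sketch. It is not NiFi code: the function name `resolve_attribute` and all of the sample dictionaries are hypothetical, and the sketch only models "first scope that defines the name wins, otherwise an empty value".

```python
from typing import Mapping, Sequence


def resolve_attribute(name: str, scopes: Sequence[Mapping[str, str]]) -> str:
    """Return the value from the first scope that defines `name`, else ''."""
    for scope in scopes:
        if name in scope:
            return scope[name]
    # Nothing defines the name: model the empty value described in the answer.
    return ""


# Hypothetical sample data, listed in the order given in the answer.
flowfile_attributes = {"filename": "data.csv"}
process_group_variables = {}
variable_registry = {}
jvm_properties = {}
environment_variables = {"HOME": "/home/nifi"}

scopes = [
    flowfile_attributes,
    process_group_variables,
    variable_registry,
    jvm_properties,
    environment_variables,
]

bytes_value = resolve_attribute("bytes", scopes)        # not defined anywhere
filename_value = resolve_attribute("filename", scopes)  # found on the FlowFile

print(repr(bytes_value))
print(repr(filename_value))
```

Because "bytes" is not defined in any scope, the property ends up empty, which matches the behavior described for the UpdateAttribute configuration in the question.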
09-14-2018
01:47 PM
@sri chaturvedi Yes, potentially, if there are enough inbound FlowFiles to trigger the processor to run 4 times concurrently.
09-12-2018
12:53 PM
2 Kudos
@sri chaturvedi

You are only going to benefit from setting the run duration to 50 ms if processing each incoming FlowFile in the PutSQL processor takes only a fraction of that 50 ms. Details on "Run duration" and how it works can be found here: https://community.hortonworks.com/articles/221807/understanding-nifi-processors-run-duration-functio.html

----------

When you set a run duration on a lot of processors, those threads, when executed, may hold a CPU thread longer than needed. This means that other processors may end up waiting longer for a thread.

Consider this example: your PutSQL happens to take 10 ms to execute the put of a FlowFile. That means that with a 50 ms run duration it would put 5 FlowFiles within a single thread execution. What happens if the incoming connection queue only has 1 FlowFile at the time of execution? The processor holds that thread for 40 ms longer than needed. That is 40 ms of CPU time not available to another processor.

Since there is some time overhead in starting and stopping threads, run duration is very useful when you have a high sustained dataflow. It can actually decrease performance when used in a dataflow where there is not a high volume of FlowFiles. (High volume is relative to the processor's designed task.)

----------

When it comes to concurrent tasks, this setting dictates parallel processor execution.

Since your canvas has 2000 processors, you need to understand that all these processors cannot execute at the exact same time. There is only so much CPU available, and NiFi has a configurable thread pool size. This means that many processors may just be waiting in line for their chance to get time on the CPU.

Details on processor Concurrent Tasks setting recommendations can be found here: https://community.hortonworks.com/articles/221808/understanding-nifi-max-thread-pools-and-processor.html

----------

You also mentioned NiFi UI slowness. It may be unrelated to anything above: https://community.hortonworks.com/articles/184786/hdfnifi-improving-the-performance-of-your-ui.html

Thank you, Matt

If you found this answer addressed your question, please take a moment to log in and click the "ACCEPT" link.
09-12-2018
12:50 PM
4 Kudos
# Max Timer Driven Thread Count and Max Event Driven Thread Count:

Out of the box, NiFi sets the Max Timer Driven Thread Count relatively low to support operating on the simplest of hardware. This default setting can limit the performance of very large, high-volume dataflows that must perform a lot of concurrent processing. General guidance for setting this value is 2 - 4 times the number of cores available to the hardware on which the NiFi service is running. With a NiFi cluster where each server has different hardware (not recommended), this would be set to the highest value possible based on the server with the fewest cores.

NOTE: Remember that all configurations you apply within the NiFi UI are applied to every node in a NiFi cluster. None of the settings apply as a total to the cluster itself.

NOTE: The cluster UI can be used to see how the total active threads are being used per node.

Closely monitoring system CPU usage over time on each of your cluster nodes will help you identify regular or routine spikes in usage. This information will help you identify whether you can increase the “Maximum Timer Driven Thread Count” setting even higher. Just arbitrarily setting this value higher can lead to threads spending excessive time in CPU wait and not really doing any work. This can show up as long task times reported in processors.

*** The Event Driven scheduling strategy is considered experimental, and thus we do not recommend using it at all. Users should only configure their NiFi processors to use one of the Timer Driven scheduling strategies (Timer Driven or CRON Driven).

# Assigning Concurrent Tasks to processor components:

Concurrent task settings on processors should always start at the default of 1 and only be incremented slowly as needed. Assigning too many concurrent tasks to every processor can have an effect on other dataflows/processors.

Because of how the above works, it may appear to a user that they get better performance out of a processor by simply setting a high number of concurrent tasks. What they are really doing is stacking up more requests in that large queue so the processor gets more opportunities to grab one of the available threads from the resource pool. What often happens is that users with a processor running only 1 concurrent task are affected (simply because of the size of the request queue), so those users increase their concurrent tasks as well. Before you know it, the request queue is so large that no one is benefiting from assigning additional concurrent tasks.

In addition, you may have processors that inherently have long-running tasks. Assigning these processors lots of concurrent tasks can mean a substantial chunk of the thread pool is in use for an extended amount of time. This then limits the number of threads from the pool available to work through the remaining tasks in the queue.
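As a rough illustration of the sizing guidance above (2 - 4 times the core count, based on the node with the fewest cores), here is a small Python sketch. The function name `timer_driven_thread_range` and the sample core counts are hypothetical; the result is only a starting range and should still be validated against observed CPU usage on each node.

```python
from typing import Sequence, Tuple


def timer_driven_thread_range(cores_per_node: Sequence[int]) -> Tuple[int, int]:
    """Return (low, high) = (2x, 4x) the core count of the smallest node."""
    if not cores_per_node:
        raise ValueError("at least one node is required")
    fewest_cores = min(cores_per_node)
    return 2 * fewest_cores, 4 * fewest_cores


# Hypothetical three-node cluster with mismatched hardware (not recommended).
low, high = timer_driven_thread_range([8, 16, 16])
print(f"Suggested starting range per node: {low} to {high} threads")
```

Since the setting is applied to every node rather than to the cluster as a total, the range is driven by the 8-core node in this example.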
09-12-2018
12:30 PM
6 Kudos
# Processor Run Duration:

Some processors support configuring a run duration. This setting tells a processor to keep using the same task to work on as many FlowFiles (or batches of FlowFiles) from an incoming queue as it can. This is ideal for processors where the individual tasks complete very quickly and the volume of FlowFiles is large.

In the above example, the exact same feed of FlowFiles was passed to both of these processors, which are configured to perform the same attribute update. Both processed the same number of FlowFiles in the past 5 minutes; however, the processor configured with a run duration consumed less overall CPU time to do so.

Not all processors support setting a run duration. The nature of the processor function, the methods being used, and/or the client library used may not support this capability. You will not be able to set a run duration on such processors.

How this works:
1. The processor has a thread assigned to its task.
2. The processor grabs the highest-priority FlowFile (or batch of FlowFiles) from the “active queue” of the incoming connection.
3. If processing of the FlowFile(s) does not exceed the configured run duration, another FlowFile (or FlowFile batch) is pulled from the active queue.
4. This process continues, all under that same thread, until the run duration has been reached or the “active queue” is empty.
5. At that time the session is completed and all outbound FlowFiles are committed at once to the appropriate relationship.

Since no FlowFiles are committed until the entire run completes, some latency is introduced on the FlowFiles. Your configured run duration dictates how much latency will occur at a minimum.

If the execution of the processor against a FlowFile takes longer than the configured run duration, there is no added benefit to adjusting this configuration.

What this means for heap usage:

Since the processor is only working on incoming FlowFiles in the “active queue”, there is no added heap pressure here. (FlowFiles in the “active queue” are already in heap space.)

The FlowFiles being generated (if any, depending on processor function) are all held in heap until the final commit. This may introduce some additional heap pressure versus not using a run duration, since all those new FlowFiles will be held in heap until they are committed to an output relationship at the end of the run duration.
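The loop described under "How this works" can be sketched in a few lines of Python. This is a simulation only, not NiFi code: `run_once`, the fixed per-FlowFile cost, and the simulated clock are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List


def run_once(active_queue: Deque[str], run_duration_ms: float,
             cost_per_flowfile_ms: float) -> List[str]:
    """Simulate one task execution and return the FlowFiles committed together.

    FlowFiles are pulled until the simulated elapsed time reaches the run
    duration or the active queue is empty; nothing is committed before that.
    """
    elapsed_ms = 0.0
    session: List[str] = []
    while active_queue:
        session.append(active_queue.popleft())  # pull next FlowFile
        elapsed_ms += cost_per_flowfile_ms      # simulated processing time
        if elapsed_ms >= run_duration_ms:
            break                               # run duration reached
    return session                              # single commit at the end


# Hypothetical numbers: 10 ms per FlowFile, 50 ms run duration.
busy_queue = deque(f"ff-{i}" for i in range(8))
committed_busy = run_once(busy_queue, run_duration_ms=50, cost_per_flowfile_ms=10)

short_queue = deque(["ff-a", "ff-b"])
committed_short = run_once(short_queue, run_duration_ms=50, cost_per_flowfile_ms=10)

# A FlowFile that takes longer than the run duration gains nothing from it.
slow_queue = deque(["ff-x", "ff-y"])
committed_slow = run_once(slow_queue, run_duration_ms=50, cost_per_flowfile_ms=80)

print(len(committed_busy), len(committed_short), len(committed_slow))
```

In this model the first FlowFile pulled waits for the whole run before being committed, which is the added latency mentioned above, and every FlowFile in `session` stays in memory until the single commit.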
09-05-2018
01:05 PM
1 Kudo
You are also going to want to make sure you have set the following property in the nifi.properties file:

nifi.web.http(s).host=

Leaving this blank means Java will try to determine the hostname, which may result in an incorrect hostname being determined or in Java just using localhost instead of the actual server hostname.

Then make sure whatever hostname you enter here properly resolves to a valid IP for your server.
09-05-2018
12:57 PM
1 Kudo
@Hariprasanth Madhavan

There are two separate processes associated with NiFi on startup. The bootstrap service starts and listens for requests from the second NiFi process (the main application). The exception above leads me to believe that the second process is unable to communicate with the bootstrap process listening on port 24264. This is likely because it is trying to communicate with localhost:24264 and you do not have an entry in your server's /etc/hosts file resolving localhost to any address (for example: 127.0.0.1 localhost).

Thank you, Matt

If you found this Answer addressed your original question, please take a moment to login and click "Accept" below the answer.
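One quick way to check whether a name such as localhost resolves on the server is a short Python script using the standard `socket` module. This is only a diagnostic sketch; the helper name `resolve_host` is hypothetical, and it checks IPv4 resolution only.

```python
import socket
from typing import Optional


def resolve_host(hostname: str) -> Optional[str]:
    """Return the IPv4 address the OS resolves `hostname` to, or None."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when resolution fails.
        return None


localhost_ip = resolve_host("localhost")
if localhost_ip is None:
    print("localhost does not resolve; check /etc/hosts")
else:
    print(f"localhost resolves to {localhost_ip}")
```

The same helper can be pointed at whatever hostname is configured in nifi.properties to confirm that it resolves to the server's real address.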
09-05-2018
12:43 PM
@Surendra Shringi I am really not sure what NiFi version you are running. There is no version 16.6. I apologize if I am not following your questions completely. If you shut down a system, nothing will be running on that system.

Are you asking if NiFi can be set up so that the NiFi service is auto-started on system reboot/startup? If that is the case, then the answer is yes. https://nifi.apache.org/docs/nifi-docs/html/getting-started.html#installing-as-a-service

Thank you, Matt

If you found this Answer addressed your original question, please take a moment to login and click "Accept" below the answer.
09-04-2018
08:36 PM
@Josh Nicholson Your regex looks correct. The question is what is actually coming back and being passed to that regex. Have you looked at the authentication output logged in nifi-user.log? What is logged when one of these users logs in to NiFi?
09-04-2018
03:27 PM
@Josh Nicholson
I am assuming you are using the ldap-provider for user authentication? If so, what value do you have assigned to the following property in your login-identity-providers.xml file:

<property name="Identity Strategy"></property>

I suspect you may have this set to USE_USERNAME?

If so, upon successful authentication of the user, the username the user entered on the login screen is what gets passed through the mapping patterns, and the result is sent to Ranger for authorization verification, rather than the LDAP entry DN.

Thanks, Matt

If you found this Answer addressed your original question, please take a moment to login and click "Accept" below the answer.
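To show why the choice of identity matters, here is a small Python sketch of a DN-oriented mapping pattern applied to both a DN and a bare username. Everything here is hypothetical: the pattern, the replacement template, the sample identities, and the `map_identity` helper. It also assumes that an identity which matches no pattern is passed along unchanged, and it uses Python's `\1` group syntax rather than the `$1` syntax used in NiFi mapping values.

```python
import re


def map_identity(identity: str, pattern: str, template: str) -> str:
    """Apply one mapping pattern; leave the identity unchanged if it does not match."""
    match = re.fullmatch(pattern, identity)
    if match is None:
        return identity  # assumption: unmatched identities pass through as-is
    return match.expand(template)


# Hypothetical pattern written with full LDAP DNs in mind.
dn_pattern = r"CN=(.*?),.*"
template = r"\1@EXAMPLE.COM"

from_dn = map_identity("CN=jsmith,OU=people,DC=example,DC=com", dn_pattern, template)
from_username = map_identity("jsmith", dn_pattern, template)

print(from_dn)
print(from_username)
```

In this sketch the two inputs produce different strings, so a policy defined for one form would not match the other. Comparing the identity logged in nifi-user.log against the mapping pattern shows which form is actually being evaluated.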