Member since
06-06-2016
38
Posts
14
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
| 2700 | 11-14-2016 09:08 AM
| 1359 | 10-31-2016 09:20 AM
08-20-2019
11:29 AM
As a general best practice, I suggest sending those metrics to an altogether separate monitoring system (something like InfluxDB). You can't effectively monitor a system with the system itself; if it fails, you risk losing visibility. #JustSayin
06-02-2017
02:16 PM
Thanks @Matt Clarke. Will downgrade ASAP.
05-12-2017
01:21 PM
Thanks! Will try. It's still in the early stages, so load is not a huge concern right now.
05-15-2017
03:07 PM
I'm not moving all directories to new places, but consolidating 8 locations into 3. I wasn't sure how all the metadata and splits would copy over, given that some of the filenames are the same in each directory.
04-27-2017
02:06 PM
1 Kudo
@Sebastian Carroll The intent of the automated SSL setup is to help users who do not have an external CA quickly build and deploy (in the case of Ambari) a valid keystore and truststore to their NiFi instance(s). Ambari simply uses the tls-toolkit as well, but with some pre-defined parameters, to automate the creation of the keystores and truststore for your Ambari-deployed NiFi cluster.

It is really not recommended to use the NiFi CA in Ambari in a production environment. Users are encouraged to use a legitimate CA to sign certificates in production. The reason for this is that there is no inherent trust of any certs signed by the NiFi CA, and every install of an HDF NiFi cluster will have its own CA. So using things like NiFi's S2S protocol between systems deployed using different NiFi CAs adds a lot of additional work/overhead, since you must constantly update the CAs in every system's truststore.

If I am understanding you correctly, you are asking for a way to tell Ambari to generate certificates and pass them to an external company CA to get them signed? Since Ambari has no control over an external CA, and the credentials needed to sign certificate requests should be highly protected, I don't see a way to securely automate this entire process. The best that could be done is to automate the creation of the self-signed certificate and certificate signing request. The user would still need to send that request to their company CA to be signed and then, once the signed response is received, import it back into their keystore. Users would also still need to manually obtain the public key for their company CA in order to create or add it to a truststore.

The problem with having Ambari auto-generate a certificate is that many companies have specific requirements for what exactly must be defined in a server's certificate. Having Ambari provide all possible options sounds like overkill. I don't see why you could not use the NiFi tls-toolkit to generate a certificate that you could then get signed by your own CA. Again, I don't really see how NiFi could automate beyond creating the cert and signing request. If I am missing something here, please let me know.

In Ambari-based installs you do not need to use the NiFi CA to create certificates. Simply make sure the NiFi certificate authority is not installed. Then, in the NiFi configs within Ambari, configure the SSL settings to point at the PKCS12 or JKS keystore and truststore you manually obtained via your company. The configs by default in Ambari expect that every node uses a keystore with the same filename (the content of the keystore should be unique on each node) and that the keystores all use the same password.

Thank you, Matt
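For reference, a minimal sketch of that manual process using Java's keytool (the hostnames, aliases, file names and passwords below are placeholders, not values Ambari or NiFi produces for you):

# 1. Generate a key pair in a new keystore and produce a signing request for the company CA
keytool -genkeypair -alias nifi-key -keyalg RSA -keysize 2048 \
        -dname "CN=nifi-node1.example.com, OU=NIFI" \
        -keystore keystore.jks -storepass changeit -keypass changeit
keytool -certreq -alias nifi-key -file nifi-node1.csr \
        -keystore keystore.jks -storepass changeit

# 2. After the CA returns the signed certificate: trust the CA, then import the signed response
keytool -importcert -alias company-ca -file company-ca.crt \
        -keystore keystore.jks -storepass changeit -noprompt
keytool -importcert -alias nifi-key -file nifi-node1-signed.crt \
        -keystore keystore.jks -storepass changeit

# 3. Build the truststore from the company CA's public certificate
keytool -importcert -alias company-ca -file company-ca.crt \
        -keystore truststore.jks -storepass changeit -noprompt

The resulting keystore and truststore are what the NiFi SSL settings in Ambari would then point at.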
01-25-2017
01:43 PM
1 Kudo
# Ruby Scripting in NiFi

Today I was asked about an issue that I didn't know how to solve using NiFi. On the surface it sounded simple: just map an attribute to another value. The attribute to map was 'port', and based on the port number, add an attribute to more easily identify the system downstream. E.g. for port 10003, syslog:local; for 10004, syslog:db1; etc. After a bit of digging I found a few different options to solve this.

## Many UpdateAttribute Processors

The first is to create a new UpdateAttribute processor for each incoming stream. This places a label (an attribute) on all files that come in from that listener. It looked like this:

![Multiple UpdateAttribute Processors]({{ site.baseurl }}/images/multiple_attribute_updates.png)

This looks a bit confusing and tedious but is very precise and arguably easier to read, especially when we label the processors. It also has the added advantage of not having to change the port in more than one place. If, for example, the local logs coming in over port 10002 need to change to port 10003, then we just make that change in the ListenSyslog processor and the rest remains unchanged.

## One UpdateAttribute Using the Advanced Option

The advanced option allowed me to keep all the configuration in one place, easily mapping port numbers to tags. The two disadvantages I ran into were:

1. A fairly tedious process to get each mapping. It involved:
   * Create a new rule
   * Add a name
   * Search for an existing rule to import
   * Change the port number and associated label
2. The port must now be changed in two different places if it were to change.

It would look like:

![Single UpdateAttribute with Advanced Features]({{ site.baseurl }}/images/single-updateattribute-with-advanced-features.png)

## ExecuteScript Processor

This again allows me to keep all the configuration in one place and makes it much easier to make changes. I created a processor that stores the mappings in a hash and adds the correct attribute appropriately. It looks like so:

![ExecuteScript Processor for Mapping]({{ site.baseurl }}/images/executescript-processor-for-mapping.png)

From the UI perspective it looks very similar to the single UpdateAttribute solution. This requires the addition of the script:

{% highlight ruby %}
map = {
  10 => "system1",
  9  => "system2",
  8  => "system3",
}
map.default = "unknown"

flowFile = session.get()
unless flowFile.nil?
  # FlowFile attributes are Strings, so convert 'port' to an Integer before the hash lookup
  label = map[flowFile.getAttribute("port").to_i]
  flowFile = session.putAttribute(flowFile, "system", label)
  session.transfer(flowFile, REL_SUCCESS)
end
session.commit()
{% endhighlight %}

It is more complex because it adds the need to understand a scripting language, and it also doesn't remove the requirement of changing the port number in more than one place. The script can add more complexity if it becomes necessary to reference it as a file rather than using the 'Script Body' option in the processor. The main advantage is that it makes it easier to change the mapping - just copy and paste one of the lines of the hash and make the change all in one place. Given NiFi's goal of minimising the need for data flow managers to know how to code, it's unlikely this is the best approach.

# Conclusion

The first option is quite foreign to programmers, who may feel like it isn't generic. This is understandable given that it does feel a bit like copy and paste. I would say it is the most NiFi way of achieving the mapping, as it is the solution which is most self-describing and resistant to change.
04-23-2019
03:39 PM
@Sebastian Carroll There are other things that also exist in heap memory space within the NiFi JVM:

Component Status History: NiFi will store status history data points for all processors on the NiFi canvas (including those that are stopped). You can see this stored status history by right-clicking on a component and selecting "view status history". Each component has numerous stats for which these data points are retained, and all of these component status points are stored in heap memory. The number of points per stat that is held in heap is controlled in the nifi.properties file:

nifi.components.status.repository.buffer.size --> Specifies the buffer size for the Component Status Repository. The default value is 1440.

nifi.components.status.snapshot.frequency --> This value indicates how often to present a snapshot of the components' status history. The default value is 1 min.

So on every restart of NiFi these stats are gone, since they are in heap only. Then, over the course of the default 24 hours (1440 minutes in 24 hours), the heap usage grows. You can reduce the heap usage of this status history by adjusting the above properties (take snapshots less frequently, perhaps every 5 minutes, and reduce the number of data points retained from 1440 to 380 or lower).

Templates: All uploaded templates (whether they are instantiated to the canvas or not) are held in heap memory. You can reduce heap memory usage by deleting uploaded templates you will no longer be instantiating to the canvas.

Dataflow: Your entire flow is held in heap memory. The more components you have on the canvas, the larger the heap footprint.

Queued FlowFiles: Even if no processors are running, the FlowFile attributes for FlowFiles loaded into each connection between processors are held in heap memory. (There is a swap threshold configurable in the nifi.properties file which triggers a connection to start swapping to disk if the number of queued FlowFiles exceeds the configured swap threshold.)

Thank you, Matt
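As a rough illustration, the adjusted entries in nifi.properties might look something like this (the values are just the example numbers mentioned above, not tuned recommendations):

# take a status snapshot every 5 minutes instead of every 1 minute
nifi.components.status.snapshot.frequency=5 min
# retain fewer data points per stat (default is 1440)
nifi.components.status.repository.buffer.size=380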
01-12-2017
02:52 PM
Specifically, in your processor POM (which you list above), Bryan is saying that under that dependency you should have a <scope>provided</scope> line, and in your NAR POM you should include:

<dependency>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-standard-services-api-nar</artifactId>
    <type>nar</type>
</dependency>
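In the processor POM the result would then look roughly like the snippet below; the artifactId shown is only a placeholder for whichever controller service API module your processor actually depends on:

<dependency>
    <groupId>org.apache.nifi</groupId>
    <!-- placeholder artifactId: substitute the service API your processor uses -->
    <artifactId>nifi-ssl-context-service-api</artifactId>
    <scope>provided</scope>
</dependency>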
01-03-2017
12:37 PM
@Pierre Villard Thanks! I was hoping I had missed something in the API docs