Member since
07-30-2019
3471
Posts
1642
Kudos Received
1020
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 150 | 06-03-2026 06:06 PM |
| | 460 | 05-06-2026 09:16 AM |
| | 827 | 05-04-2026 05:20 AM |
| | 496 | 05-01-2026 10:15 AM |
| | 622 | 03-23-2026 05:44 AM |
08-06-2018
03:39 PM
@mojgan ghasemi - I recommend starting a new question for this. This thread was originally about tailFile and splitting files, and it is best to keep one question per HCC post. - Thank you, Matt
08-02-2018
06:05 PM
3 Kudos
Have you ever noticed lingering old rolled log files in your NiFi logs directory that never seem to get deleted? This is a by-product of how logback works, depending on how you have it configured.

Let's take a look at a default logback.xml configuration from NiFi:

<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<!--
For daily rollover, use 'app_%d.log'.
For hourly rollover, use 'app_%d{yyyy-MM-dd_HH}.log'.
To GZIP rolled files, replace '.log' with '.log.gz'.
To ZIP rolled files, replace '.log' with '.log.zip'.
-->
<fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
<maxFileSize>100MB</maxFileSize>
<!-- Control the maximum number of log archive files kept and asynchronously delete older files -->
<maxHistory>30</maxHistory>
<!-- optional setting for keeping 10GB total of log files
<totalSizeCap>10GB</totalSizeCap>
-->
</rollingPolicy>
<immediateFlush>true</immediateFlush>
<encoder>
<pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
</encoder>
</appender>

The above app log configuration logs to a file named nifi-app.log. Once that file reaches 100 MB in size or the top of the hour passes, it is rolled. You may end up with numerous log files within a single hour if your NiFi is logging excessively.

A "maxHistory" of 30 means that the logger will keep only 30 hours (HH) of rolled logs. But that is not the full story of how logback works. "maxHistory" controls not only the number of hours to keep but also the maximum age of the logs that are evaluated for deletion. So any leftover log files that are more than 30 hours old are ignored when the deletion thread runs.

This naturally raises the question of how these files got left behind in the first place. Typically this occurs when a file passes, say, 30 hours in age while the application is stopped. When the application is restarted, those older files end up being ignored. While the application runs continuously, cleanup works as one would normally expect.

To simply clean up these older rolled log files, you could run a touch command on them so that their file system timestamp updates and they are no longer more than 30 hours old. They will then be considered within the 30 hour window and be deleted once the "maxHistory" count reaches 30.

However, the above is not a permanent solution. I recommend instead controlling file deletion with the "totalSizeCap" setting (commented out by default in the NiFi logback.xml). It offers a couple of advantages:

1. The "%i" option in the fileNamePattern says to create sequentially numbered log files every "maxFileSize" (100 MB) within each hour. This helps prevent any one log from getting too large, but it has the downside that "maxHistory" does not count these files individually. So "maxHistory" set to 15 is 15 hours of logs, even if each hour contains 2,000 100 MB log files. Under heavy logging you can end up using a lot of log space.
2. "totalSizeCap" will delete the oldest rolled log files once the cap is exceeded, as long as the log file date is less than the "maxHistory" age. So let's say we want to retain up to 100 GB of log history. We would set "maxHistory" to some very large value like 8760 (~1 year of hours) and set "totalSizeCap" to 100 GB. Provided you hit 100 GB before you hit 8,760 hours, the size cap is what controls deletion.

Here is an example configuration:

<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<!--
For daily rollover, use 'app_%d.log'.
For hourly rollover, use 'app_%d{yyyy-MM-dd_HH}.log'.
To GZIP rolled files, replace '.log' with '.log.gz'.
To ZIP rolled files, replace '.log' with '.log.zip'.
-->
<fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
<maxFileSize>10MB</maxFileSize>
<!-- keep up to 8,760 hours worth of log files -->
<maxHistory>8760</maxHistory>
<!-- optional setting for keeping 100 GB total of log files -->
<totalSizeCap>100GB</totalSizeCap>
<!-- archive removal will be executed on appender start up -->
<cleanHistoryOnStart>true</cleanHistoryOnStart>
</rollingPolicy>
<immediateFlush>true</immediateFlush>
<encoder>
<pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
</encoder>
</appender>

Of course, there is always a chance you could hit 8,760 hours' worth of logs before reaching 100 GB of generated app logs, so you may need to tailor these settings to the app log sizes generated by your particular NiFi.
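To see which rolled files have already aged out of the "maxHistory" window, a small script can help. The following is only a sketch, not anything logback or NiFi ships: the function name `stale_rolled_logs` is made up for illustration, and it assumes the rolled files follow the `nifi-app_%d{yyyy-MM-dd_HH}.%i.log` pattern shown above. It judges age from the hour stamp in the file name; if you would rather go by the file system timestamp (which is what the touch workaround changes), swap in `os.path.getmtime`.

```python
import re
from datetime import datetime, timedelta

# Assumed naming, taken from the fileNamePattern above,
# e.g. nifi-app_2018-08-01_13.0.log
ROLLED_LOG = re.compile(r"^nifi-app_(\d{4}-\d{2}-\d{2}_\d{2})\.(\d+)\.log$")


def stale_rolled_logs(filenames, now, max_history_hours):
    """Return rolled log names whose hour stamp is older than the maxHistory window."""
    cutoff = now - timedelta(hours=max_history_hours)
    stale = []
    for name in filenames:
        match = ROLLED_LOG.match(name)
        if match is None:
            continue  # the active nifi-app.log or an unrelated file
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d_%H")
        if stamp < cutoff:
            stale.append(name)
    return sorted(stale)
```

To check a real directory, something like `stale_rolled_logs(os.listdir(log_dir), datetime.now(), 30)` would list the candidates, where `log_dir` is wherever your NiFi writes its logs.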
08-01-2018
04:54 PM
@Romain Guay

I am not sure I am following your comments completely. Keep in mind that this article was written against Apache NiFi 0.x versions. The look of the UI and some of the configuration and capabilities relevant to RPGs changed as of Apache NiFi 1.x.

When you say "source NiFi", are you referring to the NiFi instance with the RPG or the NiFi instance with an input or output port?

Keep in mind the following:
1. The NiFi with the RPG on the canvas always acts as the client. It establishes the connection to the target instance/cluster.
2. An RPG added to the canvas of a NiFi cluster runs on every node in that cluster with no regard for any other node in the cluster.
3. An RPG regularly connects to the target NiFi cluster to retrieve S2S details, which include the number of nodes, load on nodes, available remote input/output ports, etc. (Even if the URL provided in the RPG is of a single node in the target cluster, the details collected will be for all nodes in the target cluster.)
4. A node distribution strategy is calculated based on the details collected.

When actually sending FlowFiles to a remote input port on a target NiFi instance/cluster, the number of FlowFiles sent is based on the port properties configured in the RPG. So it may be the case that those settings are at their defaults and FlowFiles are not load-balanced very well.

When actually retrieving FlowFiles from a remote output port on a target NiFi instance/cluster, the RPG round-robins the nodes in the target NiFi, pulling FlowFiles from the remote output port based on the port configuration properties in the RPG. So it may be that one source node has an RPG that runs before the others, connects, and is allocated all FlowFiles on the target remote output port before any other node in the source NiFi cluster runs. There are some limitations in load-balancing using such a get/pull setup.

For more info on configuring your remote ports via the RPG, see the following article: https://community.hortonworks.com/content/kbentry/109629/how-to-achieve-better-load-balancing-using-nifis-s.html
*** The above article is based on Apache NiFi 1.2+ versions of the RPG.

Thanks, Matt
07-25-2018
01:20 PM
@Mohammad Soori

*** Forum tip: Please try to avoid responding to an answer by starting a new answer. Instead, use "add comment" to respond to an existing answer. There is no guaranteed order to different answers, which can make a response thread difficult to follow, especially when multiple people are trying to assist you.

Based on what you are showing me, your flow is working as designed. Since you have added all three outgoing relationships to the same outgoing connection of the SplitText processor, you end up with duplication.

The "original" relationship is basically a passthrough for the FlowFiles coming in to SplitText. This relationship is often auto-terminated unless you need to keep the original un-split FlowFiles for something else in your flow. In that case, the "original" relationship would be routed to its own outbound connection and not to the same connection as "splits".

The fact that SplitText is not really splitting your source FlowFiles (4 in and 4 out) tells me that the 4 source FlowFiles do not contain any line returns on which to split the text. So the question is: what does the content of one of these ~115 byte FlowFiles look like?

I also do not recommend routing the "failure" relationship along with "success" or "original" in the same connection. Should a failure occur, how would you easily separate what failed from what was successful?

Thank you, Matt
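To illustrate why content without line returns comes out un-split, here is a rough Python sketch of line-based splitting. It is not the SplitText implementation; `split_by_line_count` is a hypothetical helper that only mimics the idea of a "Line Split Count" of 1.

```python
def split_by_line_count(content, line_split_count=1):
    """Split text into chunks of line_split_count lines, keeping line endings."""
    lines = content.splitlines(keepends=True)
    return [
        "".join(lines[i:i + line_split_count])
        for i in range(0, len(lines), line_split_count)
    ]
```

Content with no line returns yields a single chunk (1 in, 1 out), while content with three line-terminated records yields three chunks.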
07-17-2018
06:29 PM
@Saikrishna Tarapareddy

The only NiFi configuration file you can edit that will take effect without requiring a NiFi restart is the logback.xml file.

As far as what is an acceptable search base, it is best to test your search base on the command line using ldapsearch. If it doesn't work there, it will not work in NiFi either.

Thank you, Matt

If you found this Answer addressed your original question, please take a moment to login and click "Accept" below the answer.
07-16-2018
08:11 PM
@Tommy

*** Forum tip: Please try to avoid responding to an answer by starting a new answer. Instead, use "add comment" to respond to an existing answer. There is no guaranteed order to different answers, which can make a response thread difficult to follow, especially when multiple people are trying to assist you. Also use the @username when replying to make sure the user gets notified about your response.

You need to answer the question: what links these two FlowFiles to one another?

You are evaluating FlowFiles in pairs. What if you get two files of type A? How do you want the processor to know which of the two type A files that already arrived a file of type B belongs with?

If this is not a concern, you could use a simple wait/notify flow as described here to accomplish this: https://gist.github.com/ijokarumawak/375915c45071c7cbfddd34d5032c8e90

Thanks, Matt
07-16-2018
07:59 PM
@mark juchems

The ConsumeAzureEventHub processor was developed in the Apache community. From your description I did not realize the queue was growing non-stop. It sounds like the processor was written in such a way that it gets a thread upon initial execution and never releases that thread. If that is the case, it will continue to produce FlowFiles to the output queue regardless of the configured back pressure thresholds.

My suggestion would be to open an Apache Jira against that processor explaining the issue it is having and sharing your processor configuration.

Thank you, Matt
07-16-2018
07:37 PM
3 Kudos
@Mark Lin @mark juchems

The configurable back pressure thresholds (object and size) on a connection are soft limits. So a back pressure object threshold of 10,000 (default) means that the NiFi controller will not schedule the feeding processor to run if the object count has reached or exceeded 10,000 queued FlowFiles.

So let's say there are 9,999 queued objects. NiFi would allow the preceding processor to be scheduled. When that processor executes its code, it executes with no regard for destination queue sizes. That means that if the execution of that processor thread results in 1,000,000 FlowFiles being processed in a single execution, all 1,000,000 FlowFiles will be added to the downstream connection queue. Now that the queue has 1,009,999 FlowFiles queued, the preceding processor will not be scheduled again until the queue drops below 10,000 again.

The same soft limit concept applies to the back pressure size threshold setting on a connection.

Thank you, Matt

When an "Answer" addresses/solves your question, please select "Accept" beneath that answer. This encourages user participation in this forum.
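The soft limit behavior described in this answer can be modeled in a few lines of Python. This is only a toy model of the scheduling rule as explained above, not NiFi code; `can_schedule` and `run_once` are names invented for the illustration.

```python
def can_schedule(queued, object_threshold=10_000):
    """The feeding processor is scheduled only while the queue is below the threshold."""
    return queued < object_threshold


def run_once(queued, produced, object_threshold=10_000):
    """One scheduling attempt: the threshold is checked before the run, never during it."""
    if can_schedule(queued, object_threshold):
        # The whole batch lands in the queue, however large it is.
        return queued + produced
    return queued
```

Starting from 9,999 queued FlowFiles, `run_once(9_999, 1_000_000)` gives 1,009,999, and any further call returns the queue size unchanged until it drains below 10,000.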
07-13-2018
01:23 PM
@Nikhil

What I was getting at was that the authentication methods are different here.

I am assuming your users who access the NiFi UI via the load balancer are using a user/password authentication method? That method results in a token being issued to the authenticated user, which the client then passes in every subsequent request to the NiFi API.

With Site-To-Site, there are no tokens involved in the authentication process, since certificate authentication occurs via two-way TLS in every single REST API call.

Admittedly, I know nothing about your specific LB or how it is configured, so these are just suggested things to consider.

I also want to let you know that you must be running an older HDF version. Newer versions support editing the URL string without needing to recreate the RPG.

Thank you, Matt
07-12-2018
12:46 PM
@Nikhil

*** Forum tip: Please try to avoid responding to an answer by starting a new answer. Instead, use "add comment" to respond to an existing answer. There is no guaranteed order to different answers, which can make a response thread difficult to follow, especially when multiple people are trying to assist you.

You can get verbose output from your keystore using the keytool command:

keytool -v -list -keystore <keystore.jks file>

Look to see if your PrivateKeyEntry has any "ExtendedKeyUsages" listed. It would look something like this:

#3: ObjectId: 2.5.29.37 Criticality=false
ExtendedKeyUsages [
clientAuth
serverAuth
]

Since you commented that the RPG works correctly when you use the URLs for the nodes directly, the certificates must support clientAuth. This sounds more like an LB configuration issue. The certificate is being sent to the LB, but the LB is not forwarding that client cert on to the target end-point.

It is also not clear to me why you would configure your RPG to point at your LB instead of at one or more of the NiFi nodes directly. The RPG retrieves details about the entire target NiFi cluster when it connects and stores/updates them locally. So there really is no need for an LB in front of the RPG.

Thank you, Matt
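If you have many keystores to check, the ExtendedKeyUsages lookup described above can be scripted. The sketch below is an assumption-laden helper, not part of keytool or NiFi: it takes the text printed by `keytool -v -list` (captured however you like) and assumes the extension is printed in the bracketed form shown in this answer. The function names are made up for the example.

```python
import re


def extended_key_usages(keytool_output):
    """Collect the usages listed inside every 'ExtendedKeyUsages [ ... ]' block."""
    usages = []
    blocks = re.findall(r"ExtendedKeyUsages \[(.*?)\]", keytool_output, flags=re.DOTALL)
    for block in blocks:
        usages.extend(block.split())
    return usages


def supports_client_auth(keytool_output):
    """True when clientAuth appears among the listed extended key usages."""
    return "clientAuth" in extended_key_usages(keytool_output)
```

Note that a certificate with no ExtendedKeyUsages section at all returns an empty list here, so treat a `False` result as "clientAuth is not explicitly listed" and inspect the output by hand.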