Member since: 07-30-2019
Posts: 3406
Kudos Received: 1622
Solutions: 1008
08-07-2018
11:03 AM
@yong lau

If you don't want to use a dataflow design that redistributes your to-be-merged FlowFiles to a single node, the only other option you have is to control the delivery of the source data so it arrives on a single node. You'll need to ask yourself: how are the files you want to merge getting to your NiFi? Can that delivery be controlled so this particular flow of data goes to only one node in your cluster?

Thanks, Matt
08-06-2018
06:56 PM
@Harish Vaibhav Kali

Once a processor is assigned a UUID, that processor keeps that UUID unless you delete and re-add it. All processors within a single NiFi cluster run an identical flow.xml.gz.

I am guessing you are creating templates and then importing them into different NiFi instances? Templates do not preserve UUIDs; each time you instantiate a template, all of its components get new UUIDs.

If you are trying to version-control flows across multiple independent NiFi installations, you will want to take a look at nifi-registry. Version-controlled flows and nifi-registry were introduced with Apache NiFi 1.5.0, and with nifi-registry 0.2 these version-controlled flows can even be pushed to Git.

Using nifi-registry allows independent NiFi instances to push and pull version-controlled flows.

https://nifi.apache.org/registry.html

Thank you, Matt
08-06-2018
03:39 PM
@mojgan ghasemi

I recommend starting a new HCC question for this. This thread was originally about TailFile and splitting files, and it is best to keep one question per HCC post.

Thank you, Matt
08-06-2018
01:29 PM
@yong lau

The "Execution" processor configuration has nothing to do with FlowFile distribution at all. It simply controls whether the configured processor will be scheduled to run on every node or only on the currently elected primary node. When a processor is scheduled to run, it works on the FlowFiles in its incoming connection queues on that specific node. So if you have a processor configured for "Primary node" execution and there are FlowFiles queued on every node, only the FlowFiles on the primary node get processed.

It is the role of the dataflow designer to construct a dataflow that routes all data to one node if creating a single FlowFile via merge is needed. Currently this can be accomplished using the PostHTTP and ListenHTTP processors, which support sending FlowFiles (content plus FlowFile attributes) between NiFi instances. The PostHTTP processor can be configured to send to one specific node in your cluster, so ideally you would build into your flow a path that routes the FlowFiles you need merged to a PostHTTP configured to "send as FlowFile" to that node. On that node, a ListenHTTP processor acts as the target of the PostHTTP processor and routes the received FlowFiles to your MergeContent processor (a sketch of these settings is appended below).

There is work in progress to make this a lot easier: a new capability currently in development will allow redistribution of FlowFiles via a connection's configuration, with distribution strategies such as sending all FlowFiles with matching criteria (for example, a matching FlowFile attribute) to the same node.

Thank you, Matt

If you found this answer addressed your original question, please take a moment to login and click "Accept" below the answer.
08-02-2018
06:05 PM
3 Kudos
Have you ever noticed some lingering old rolled log files in your NiFi logs directory that never seem to get deleted? This is a by-product of how logback works, depending on how you have it configured.

Let's take a look at a default logback.xml configuration from NiFi:

<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<!--
For daily rollover, use 'app_%d.log'.
For hourly rollover, use 'app_%d{yyyy-MM-dd_HH}.log'.
To GZIP rolled files, replace '.log' with '.log.gz'.
To ZIP rolled files, replace '.log' with '.log.zip'.
-->
<fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
<maxFileSize>100MB</maxFileSize>
<!-- Control the maximum number of log archive files kept and asynchronously delete older files -->
<maxHistory>30</maxHistory>
<!-- optional setting for keeping 10GB total of log files
<totalSizeCap>10GB</totalSizeCap>
-->
</rollingPolicy>
<immediateFlush>true</immediateFlush>
<encoder>
<pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
</encoder>
</appender>

The above app log configuration logs to a file named nifi-app.log. Once that file reaches 100 MB in size or the top of the hour passes, it is rolled. You may end up with numerous rolled log files within a single hour if there is an excessive amount of logging occurring in your NiFi.

A "maxHistory" of 30 means the logger keeps only 30 hours (the %d{yyyy-MM-dd_HH} pattern rolls hourly) of rolled logs. But that is not the full story of how logback works here. "maxHistory" controls not only the number of hours to keep, but also the maximum age of logs that are even evaluated for deletion: rolled log files more than 30 hours old are simply ignored when the deletion thread runs.

This naturally raises the question of how these files got left behind in the first place. Typically it happens when files cross the 30-hour age threshold while the application is stopped; when the application is restarted, those older files end up being ignored. While the application is running continuously, this works as one would normally expect.

To clean up these older rolled log files, you could run a touch command on them so their filesystem timestamps update and they are no longer more than 30 hours old. They will then be considered within the 30-hour window and be deleted once the "maxHistory" count reaches 30.

However, the above is not a permanent solution. I recommend instead controlling file deletion via the "totalSizeCap" setting (commented out by default in the NiFi logback.xml). It addresses a couple of issues:

1. The "%i" in the fileNamePattern creates sequentially numbered log files every "maxFileSize" (100 MB) within each hour. This helps prevent any one log file from getting too large, but those individual files are not counted by "maxHistory". So "maxHistory" set to 15 is 15 hours of logs even if each hour contains 2,000 100 MB log files (that would be roughly 3 TB). You can see that under heavy logging you can end up using a lot of log space.

2. "totalSizeCap" will start deleting old rolled log files as long as the log file's age is less than the "maxHistory" age. So let's say we want to retain up to 100 GB of log history: we would set "maxHistory" to some very large value like 8760 (~1 year of hours) and set "totalSizeCap" to 100GB, provided you hit 100 GB before you hit 8,760 hours.

Here is an example configuration:

<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
<!--
For daily rollover, use 'app_%d.log'.
For hourly rollover, use 'app_%d{yyyy-MM-dd_HH}.log'.
To GZIP rolled files, replace '.log' with '.log.gz'.
To ZIP rolled files, replace '.log' with '.log.zip'.
-->
<fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd_HH}.%i.log</fileNamePattern>
<maxFileSize>10MB</maxFileSize>
<!-- keep up to 8,760 hours' worth of log files -->
<maxHistory>8760</maxHistory>
<!-- optional setting for keeping 100 GB total of log files -->
<totalSizeCap>100GB</totalSizeCap>
<!-- archive removal will be executed on appender start up -->
<cleanHistoryOnStart>true</cleanHistoryOnStart>
</rollingPolicy>
<immediateFlush>true</immediateFlush>
<encoder>
<pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
</encoder>
</appender>

Of course, there is always a chance you could accumulate 8,760 hours' worth of logs before reaching 100 GB of generated app logs, so you may need to tailor these settings based on the app log volume generated by your particular running NiFi.
08-02-2018
01:05 PM
@Harish Vaibhav Kali

This thread is moving off topic from the original question, which has been answered. It is probably best to start a new question.

That being said, what you are showing me looks like correct functionality, provided the following is true: when NiFi was started for the first time, there was no pre-existing flow.xml.gz. For a brand-new secure flow, providing the "Initial Admin Identity" gives that user access to the UI and the ability to manage users, groups, and policies. But if that user wants to start modifying the flow, they need to grant themselves policies on the root process group. The system cannot do this automatically because, in a new flow, the UUID of the root process group is not permanent until the flow.xml.gz is generated. If the NiFi instance was upgraded from an existing flow.xml.gz, or is a 1.x instance going from unsecured to secured, then the "Initial Admin Identity" user is automatically given the privileges to modify the flow.

Also keep in mind that if the users.xml and authorizations.xml files do not exist and you have configured both the "Initial Admin Identity" and a legacy "authorized-users.xml" file, NiFi will fail to start. The initial seeding of the users.xml and authorizations.xml files can be done via one or the other, but not both.

Thank you, Matt
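For reference, a minimal sketch of the old-style file-provider entry in authorizers.xml where both seeding options live (the admin identity shown is illustrative):

<authorizers>
    <authorizer>
        <identifier>file-provider</identifier>
        <class>org.apache.nifi.authorization.FileAuthorizer</class>
        <property name="Authorizations File">./conf/authorizations.xml</property>
        <property name="Users File">./conf/users.xml</property>
        <!-- seed users.xml/authorizations.xml with ONE of these two, never both -->
        <property name="Initial Admin Identity">CN=admin, OU=NIFI</property>
        <property name="Legacy Authorized Users File"></property>
    </authorizer>
</authorizers>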
08-02-2018
12:44 PM
@Seongmin Park

The log is telling you that authentication for your login user "admin" was successful; however, the authorization for that user was not.

Nothing stands out to me in your basic authorizers.xml configuration, so my thought is that this is not the original configuration of that file. The file-provider is used to initially generate the users.xml and authorizations.xml files. Once these files exist, they will not be regenerated or modified by later changes to the configuration XML; if users.xml and authorizations.xml already exist, the file-provider does nothing.

I suggest taking a look at what is currently in your users.xml and authorizations.xml files. My guess is that you will find no user entry for "admin" in the users.xml file.

If you remove or rename these two files and restart your NiFi instance, the authorizer will build new versions of them based on the current configuration in your authorizers.xml file.

Thank you, Matt

If you found this answer addressed your original question, please take a moment to login and click "Accept" below the answer.
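For reference, a generated users.xml contains entries along these lines (the identifier UUID shown is illustrative; the identity must exactly match the authenticated login name):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tenants>
    <groups/>
    <users>
        <!-- this is the entry that must exist for the "admin" login user -->
        <user identifier="8b80d20e-0000-1000-0000-000000000000" identity="admin"/>
    </users>
</tenants>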
08-01-2018
09:21 PM
@Harish Vaibhav Kali

The authorization policies are very granular. "View/modify the component" and "view/modify the data" policies set on a process group are inherited by all components and sub-process groups created within that process group.

Keep in mind that the root-level canvas you see when you log in to a new install is itself just another process group (the root process group).

The applicable policies must be granted to the logged-in user in order for that user to perform the desired action. The admin user does not need policies on specific flow components in order for other users to perform actions; each user acts based on the authorizations that user has been granted.
08-01-2018
04:54 PM
@Romain Guay

I am not sure I am following your comments completely. Keep in mind that this article was written against Apache NiFi 0.x versions; the look of the UI and some of the configuration/capabilities relevant to RPGs have changed as of Apache NiFi 1.x.

When you say "source NiFi", are you referring to the NiFi instance with the RPG or the NiFi instance with an input or output port?

Keep in mind the following:
1. The NiFi with the RPG on its canvas is always acting as the client; it establishes the connection to the target instance/cluster.
2. An RPG added to the canvas of a NiFi cluster runs on every node in that cluster with no regard for any other node in the cluster.
3. An RPG regularly connects to the target NiFi cluster to retrieve S2S details, which include the number of nodes, the load on each node, the available remote input/output ports, etc. (Even if the URL provided in the RPG is of a single node in the target cluster, the details collected cover all nodes in the target cluster.)
4. A node distribution strategy is calculated based on the details collected.

During the actual sending of FlowFiles to a remote input port on a target NiFi instance/cluster, the number of FlowFiles sent is based on the port properties configured in the RPG. It may be that those settings are at their defaults, so FlowFiles are not load-balanced very well (see the sketch below).

During the actual retrieving of FlowFiles from a remote output port, the RPG round-robins the nodes in the target NiFi, pulling FlowFiles based on the port configuration properties in the RPG. It may be that one source node's RPG runs before the others, connects, and is allocated all FlowFiles on the remote output port before any other node in the source NiFi cluster runs. There are some limitations to load balancing in such a get/pull setup.

For more info on configuring your remote ports via the RPG, see the following article (based on the Apache NiFi 1.2+ versions of the RPG): https://community.hortonworks.com/content/kbentry/109629/how-to-achieve-better-load-balancing-using-nifis-s.html

Thanks, Matt
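A sketch of the per-port batch settings exposed in the RPG's remote port configuration dialog (values shown are illustrative; smaller batches generally spread FlowFiles more evenly across target nodes):

Remote port configuration (via the RPG's remote ports dialog):
    Concurrent Tasks = 1
    Compressed       = false
    Batch Count      = 100     (max FlowFiles per transaction)
    Batch Size       = 10 MB   (optional max bytes per transaction)
    Batch Duration   = 500 ms  (optional max time per transaction)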
07-31-2018
02:35 PM
@Veerendra Nath Jasthi

Most likely your GetFile processor is trying to perform a listing of a very large number of source files. If that is the case, the listing may take a considerable amount of time before FlowFiles even begin to be generated.

Stopping a processor only tells the controller to stop scheduling that processor to run; any currently executing threads will continue to run until completion.