Member since: 07-30-2019
Posts: 3248
Kudos Received: 1589
Solutions: 951
09-21-2016
03:39 PM
If the node that goes down happens to be the "primary node", the cluster will automatically elect a new "primary node" from the remaining available nodes and start those "primary node only" processors.
09-21-2016
11:51 AM
1 Kudo
@Gerd Koenig After a closer look at the jaas file you posted above, I believe your issue is caused by a missing " in the following line: principal="nifi@REALM; This line should actually be: principal="nifi@REALM";
Try making the above change and restarting your NiFi. Thanks, Matt
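For reference, a complete KafkaClient entry typically looks something like the sketch below (the keytab path here is a placeholder, not taken from the original post). Note that every quoted value is closed and the final option ends with a semicolon:

KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/nifi.keytab"
    principal="nifi@REALM";
};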
09-20-2016
01:45 PM
@Gerd Koenig Can you try changing the value you have for "Message Delimiter" from the literal string "\n" to an actual new line in your PutKafka processor? You can add a new line in the property value by holding the Shift key while pressing Enter. Thanks, Matt
09-20-2016
12:35 PM
2 Kudos
@Gerd Koenig The question here is: are you running Apache NiFi 0.6 or HDF 1.2? I believe you are using Apache NiFi 0.6, which does not understand PLAINTEXTSASL as the security protocol.

The Kafka 0.8 in HDP 2.3.2 and the Kafka 0.9 in HDP 2.3.4 use a custom Hortonworks Kafka client library. Kafka 0.8 in HDP 2.3.2 introduced support for Kerberos before it was supported in the community, and that support introduced the PLAINTEXTSASL security protocol. Later, when Apache Kafka 0.9 added Kerberos support, it used a different security protocol (SASL_PLAINTEXT). In order for HDF 1.2 to work with HDP 2.3.2, the GetKafka processor was modified from the Apache GetKafka to use that modified client library. Hortonworks again modified the client library in HDP 2.3.4 for Kafka 0.9 so that it remained backwards compatible and still supported the PLAINTEXTSASL security protocol.

So the bottom line here is that HDF 1.2 NiFi can talk Kerberos to both HDP 2.3.2 (Kafka 0.8) and HDP 2.3.4 (Kafka 0.9), but Apache NiFi cannot. The consume and publish Kafka processors available in NiFi 0.7, NiFi 1.0, and HDF 2.0 do not use the Hortonworks custom Kafka client library and can be used with Kafka 0.9, but not Kafka 0.8. You will need to use the SASL_PLAINTEXT security protocol with these new processors. Thanks, Matt
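As a rough sketch of how the newer processors are configured (the broker address and topic below are placeholders, and the java.arg number in bootstrap.conf is arbitrary; pick one that is unused):

ConsumeKafka processor properties:
  Kafka Brokers            broker1.example.com:6667
  Security Protocol        SASL_PLAINTEXT
  Kerberos Service Name    kafka
  Topic Name(s)            my-topic

conf/bootstrap.conf (points the NiFi JVM at your JAAS login config):
  java.arg.15=-Djava.security.auth.login.config=/path/to/kafka-client-jaas.conf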
09-16-2016
07:53 PM
2 Kudos
@David Morris The NiFi Expression Language can be used to route your data based on file extensions as you have described. When NiFi ingests data, a NiFi FlowFile is created. That FlowFile is a combination of the original content and metadata about that content. Upon ingest, some metadata is created for every FlowFile. One of those attributes is named "filename" and contains the original filename of the ingested file.
The RouteOnAttribute processor can use the NiFi Expression Language to evaluate the FlowFile's "filename" attribute for routing purposes:
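The original answer included a screenshot; as an illustrative sketch, the dynamic properties added to RouteOnAttribute might look like this (the property names and extensions are examples, using the Expression Language endsWith function):

csv    ${filename:endsWith('.csv')}
txt    ${filename:endsWith('.txt')}
xml    ${filename:endsWith('.xml')}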
In the RouteOnAttribute processor you would need to add a new property for each file extension type you want to look for. Each one of those newly added properties becomes a new relationship for that processor, which can then be routed to follow-on processors. Thanks, Matt
09-15-2016
02:28 PM
@Saikrishna Tarapareddy The purpose of using a RAID is to protect against the loss of a disk. If the intent here is to protect against a complete catastrophic loss of the system, there are some things you can do. Keeping a backup of the conf directory will allow you to quickly restore the state of your NiFi's dataflow. Restoring the state of your dataflow does not restore any data that may have been active in the system at the time of failure. The NiFi repos contain the following information:

Database repository --> Contains change history to the graph (keeps a record of all changes made on the canvas). If NiFi is secured, this repo also contains the users db. Loss of either of these has little impact. Loss of configuration history will not impact your dataflow or data, and the users db is rebuilt from the authorized-users.xml file (located in the conf dir by default) upon NiFi start.

Provenance repository(s) --> Contains NiFi FlowFile lineage history. Loss of this repo will not affect your dataflow or data. You will simply be unable to perform queries against data that traversed the system prior to the loss.

FlowFile repository --> Loss of this repo will result in loss of data. The FlowFile repo keeps all attributes about content currently in the dataflow, including where to find the actual content in the content repository(s). The information in this repo changes rapidly, so backing it up is not really feasible. RAID offers your best protection here.

Content repository(s) --> Loss of this repo will also result in loss of data and archived data (if configured to archive). The content repository(s) contain the actual content of the data NiFi processes. The data in this repo also changes rapidly as files are processed through the NiFi dataflow(s), so backing up this repo(s) is also not feasible. RAID offers your best protection here as well.

As you can see, recovery from disk failure is possible with RAID; however, a catastrophic loss of the entire system will result in loss of the data that was in mid-processing by any of the dataflows. Your repos could be external attached storage. (There is likely to be some performance impact because of this; however, in the event of catastrophic server loss, a new server could be stood up using the backed-up conf dir and attached to the same external storage. This would help prevent data loss and allow processing to pick up where it left off.) Thanks, Matt
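For reference, all of these repository locations are set in nifi.properties. A minimal sketch using the default relative paths (in practice you would point these at your RAID-backed mounts):

nifi.database.directory=./database_repository
nifi.flowfile.repository.directory=./flowfile_repository
nifi.content.repository.directory.default=./content_repository
nifi.provenance.repository.directory.default=./provenance_repository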
09-14-2016
09:41 PM
If you later decide to add new disks, you can simply copy your content repositories to those new disks and update the nifi.properties file repo config lines to point at the new locations.
09-14-2016
08:46 PM
1 Kudo
@Saikrishna Tarapareddy The section you are referring to is an example setup for a single server:

CPU: 24 - 48 cores
Memory: 64 - 128 GB
Hard drive configuration: 1 hardware RAID 1 array and 2 or more hardware RAID 10 arrays

** What falls between each "--------------------" line below is on a single mounted RAID/disk. A RAID can be broken up into multiple logical volumes if desired; if it is, each / here represents a different logical volume. By creating logical volumes you can control how much disk space is reserved for each, which is recommended. For example, you would not want excessive logging to eat up space you want reserved for your flowfile-repo. Logical volumes let you control that by splitting a single RAID into multiple logical volumes of a defined size.

--------------------
RAID 1 array (this could also be a RAID 10) containing all of the following directories/logical volumes:
- /
- /boot
- /home
- /var
- /var/log/nifi-logs   <-- point all your NiFi logs (logback.xml) here
- /opt                 <-- install NiFi here under a sub-directory
- /database-repo       <-- point the NiFi database repository here
- /flowfile-repo       <-- point the NiFi flowfile repository here
--------------------
1st RAID 10 array, logical volume mounted as /cont-repo1:
- /cont-repo1          <-- point the NiFi content repository here
--------------------
2nd RAID 10 array, logical volume mounted as /prov-repo1:
- /prov-repo1          <-- point the NiFi provenance repository here
--------------------
3rd RAID 10 array (recommended), logical volume mounted as /cont-repo2:
- /cont-repo2          <-- point a 2nd NiFi content repository here
--------------------

In order to set up the above example you would need 14 hard disks: 2 for the RAID 1 array and 4 for each of the three RAID 10 arrays. * You would only need 10 disks if you decided to have only one RAID 10 content repo (but it would need to be 2 TB). You could also take a large RAID 10, like the one holding /prov-repo1, and split it into multiple logical volumes, giving part of that RAID's disk space to a content repo.

Not sure what you mean by "load 2TB of data for future project"? Are you saying you want NiFi to be able to handle a queue backlog of 2TB of data? If that is the case, each of your cont-repo RAID 10s would need to be at least 1TB in size.

*** While the nifi.properties file ships with a single line for the content and provenance repo paths, multiple repos can be added with additional lines as follows:

nifi.content.repository.directory.default=/cont-repo1/content_repository
nifi.content.repository.directory.cont-repo2=/cont-repo2/content_repository
nifi.content.repository.directory.cont-repo3=/cont-repo3/content_repository
etc...
nifi.provenance.repository.directory.default=./provenance_repository
nifi.provenance.repository.directory.prov-repo1=/prov-repo1/provenance_repository
nifi.provenance.repository.directory.prov-repo2=/prov-repo2/provenance_repository
etc....

When more than one repo is defined in the nifi.properties file, NiFi will perform file-based striping across them. This allows NiFi to spread the I/O across multiple disks, helping improve overall performance. Thanks, Matt
09-14-2016
12:55 PM
5 Kudos
@Pravin Battula As a third option you could also build a flow to create the delay you are looking for. This can be done using the UpdateAttribute and RouteOnAttribute processors. Here is an example that causes a 5 minute delay to all FlowFiles that pass through these two processors (see the sketch below). The value returned by the now() function is the current epoch time in milliseconds. To add 5 minutes, we add 300,000 milliseconds to the current time and store that as a new attribute on the FlowFile. We then check that new attribute against the current time in the RouteOnAttribute processor. If the current time is not greater than the delayed time, the FlowFile is routed to unmatched. So here the FlowFile would be stuck in this loop for ~5 minutes. You can adjust the run schedule on the RouteOnAttribute processor to the desired interval at which you want to re-check the file(s). 0 sec is the default, but I would recommend changing that to at least 1 sec. Thanks, Matt
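Since the original screenshot is not reproduced here, an illustrative sketch of the two processors' configuration (the attribute name "delayUntil" and the property name "delayElapsed" are examples):

UpdateAttribute - add one dynamic property:
  delayUntil      ${now():toNumber():plus(300000)}

RouteOnAttribute - set Routing Strategy to "Route to Property name" and add one dynamic property:
  delayElapsed    ${now():toNumber():gt(${delayUntil})}

Loop the unmatched relationship back to the RouteOnAttribute processor itself, and route the delayElapsed relationship on to the rest of the flow.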