03-15-2016
07:33 PM
1 Kudo
For your scenario with 12 disks (assuming all disks are 200 GB):
You can specify/define multiple Content repos and multiple Provenance repos; however, you can only define one FlowFile repository and one database repository.
- 8 disks for Content repos:
- /cont_repo1 <-- 200 GB
- /cont_repo2 <-- 200 GB
- /cont_repo3 <-- 200 GB
- /cont_repo4 <-- 200 GB
- /cont_repo5 <-- 200 GB
- /cont_repo6 <-- 200 GB
- /cont_repo7 <-- 200 GB
- /cont_repo8 <-- 200 GB
- 2 disks for Provenance repos:
- /prov_repo1 <-- 200 GB
- /prov_repo2 <-- 200 GB
- 1 disk split into multiple partitions for:
- /var/log/nifi-logs/ <-- 100 GB
- OS partitions <-- split amongst the other standard OS mount points (/tmp, /, etc.)
- 1 disk split into multiple partitions for:
- /opt/nifi <-- 50 GB
- /flowfile_repo/ <-- 50 GB
- /database_repo/ <-- 25 GB
- /opt/configuration-resources <-- 25 GB (this will hold any certs, config files, and extras your NiFi processors/dataflows may need).
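As a minimal sketch of how the layout above might translate into nifi.properties, the function below emits one directory property per content and provenance disk. It assumes the multi-directory form `nifi.content.repository.directory.<suffix>` / `nifi.provenance.repository.directory.<suffix>` replaces the stock `.default` entries; the suffixes `content1`, `provenance1`, and so on are hypothetical names chosen for illustration.

```python
def repository_properties(content_dirs, provenance_dirs, flowfile_dir, database_dir):
    """Build nifi.properties lines for a multi-disk repository layout."""
    lines = []
    # Content and provenance repos accept any number of uniquely suffixed
    # directory properties; the suffixes used here are hypothetical.
    for i, path in enumerate(content_dirs, start=1):
        lines.append(f"nifi.content.repository.directory.content{i}={path}")
    for i, path in enumerate(provenance_dirs, start=1):
        lines.append(f"nifi.provenance.repository.directory.provenance{i}={path}")
    # Only one FlowFile repository and one database repository can be defined.
    lines.append(f"nifi.flowfile.repository.directory={flowfile_dir}")
    lines.append(f"nifi.database.directory={database_dir}")
    return lines


if __name__ == "__main__":
    layout = repository_properties(
        content_dirs=[f"/cont_repo{i}" for i in range(1, 9)],
        provenance_dirs=["/prov_repo1", "/prov_repo2"],
        flowfile_dir="/flowfile_repo",
        database_dir="/database_repo",
    )
    print("\n".join(layout))
```

The generated lines would be pasted over the corresponding repository entries; the logs, OS, and /opt partitions need no repository property.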
03-15-2016
07:23 PM
6 Kudos
There is no direct correlation between the size of the content repository and the provenance repository. The size the content repository will grow to is directly tied to the amount of unique content currently queued on the NiFi canvas. If archiving is enabled, the amount of content repository space consumed will also depend on the archive settings in the nifi.properties file:
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=75%
nifi.content.repository.archive.enabled=true
With the above settings, NiFi will try to retain 12 hours of archived content (archived content being content that is no longer associated with any queued FlowFile in any dataflow on the graph). This does not guarantee that any archive will exist or that the content repository will not grow beyond 75% disk utilization. Content still actively associated with queued FlowFiles will remain in the content repository, so it is important to build back pressure into dataflows where there is concern that large backlogs could fill the disk to 100%. If the content repository does fill to 100%, corruption will not occur, but new FlowFiles cannot be created until free space is available. This is likely to produce a lot of errors in the flow (anywhere content is modified or written).
Provenance repository size is directly related to the number of FlowFiles and the number of event-generating processors those FlowFiles pass through on the NiFi canvas. Its disk utilization is tightly controlled by the following settings in the nifi.properties file:
nifi.provenance.repository.max.storage.time=7 days
nifi.provenance.repository.max.storage.size=50 GB
With the above settings, NiFi will try to retain 7 days of provenance events for every FlowFile it processes, but will start rolling off the oldest events once storage exceeds 50 GB. It is important to understand that the 75% and 50 GB values are soft limits and should never be set to 100% or to the exact size of the disk.
The FlowFile repository and database repository each remain relatively small. The FlowFile repository is the most important repo of all. It should be isolated on a separate disk/partition that is not shared with any other process that may fill it. Allowing the FlowFile repository disk to fill to 100% can lead to corruption and lost data. For a 200 GB content repository, a ~25 GB FlowFile repository should be enough.
The database repository contains the user and change history DBs. The user DB will remain 0 bytes in size for NiFi instances running HTTP (non-secure). For instances running HTTPS (secure), the user DB will track all users who log in to the UI. The change history DB is tied to the little clock icon in the upper-right corner of the NiFi toolbar; it keeps track of all changes made on the NiFi graph/canvas. It also stays relatively small: a few GB of space should be plenty to store a considerable number of changes.
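Because the 75% and 50 GB values are soft limits, a quick sanity check before deploying a nifi.properties file can catch settings that leave no headroom. This is a hypothetical helper sketched under simple assumptions: the parser only handles plain `key=value` lines (no Java properties escaping), sizes are written as "<number> <MB|GB|TB>", and the disk size is supplied by you.

```python
def parse_properties(text):
    """Parse simple key=value lines; ignores comments, blanks, and Java escape rules."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def size_to_gb(value):
    """Convert a size such as '50 GB' to gigabytes (only MB/GB/TB handled here)."""
    number, unit = value.split()
    factors = {"MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}
    return float(number) * factors[unit.upper()]


def check_soft_limits(props, provenance_disk_gb):
    """Flag archive/provenance limits that leave no headroom on the disk."""
    problems = []
    usage = props["nifi.content.repository.archive.max.usage.percentage"]
    if int(usage.rstrip("%")) >= 100:
        problems.append("content archive max usage percentage should stay below 100%")
    prov_gb = size_to_gb(props["nifi.provenance.repository.max.storage.size"])
    if prov_gb >= provenance_disk_gb:
        problems.append("provenance max storage size should be smaller than its disk")
    return problems


SAMPLE = """
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=75%
nifi.content.repository.archive.enabled=true
nifi.provenance.repository.max.storage.time=7 days
nifi.provenance.repository.max.storage.size=50 GB
"""

if __name__ == "__main__":
    # With the settings from this post and a 200 GB provenance disk, this prints [].
    print(check_soft_limits(parse_properties(SAMPLE), provenance_disk_gb=200))
```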
03-15-2016
12:05 PM
1 Kudo
@Lubin Lemarchand you are correct. Thank you for filling in the details.
03-14-2016
03:52 PM
3 Kudos
Here is a basic sizing chart for HDF:
*** Keep in mind that these requirements may grow depending on which processors you use in your dataflow. Memory needs often grow more quickly than CPU needs.
*** Also understand that these sizing scenarios assume your NiFi instance(s) are set up per the best-practice documentation provided.
03-14-2016
01:28 PM
1 Kudo
Shishir, I agree that you should carefully review all the documentation links provided by Artem Ervits, but you also need to understand that the load on any given NiFi instance is directly tied to which processors are being used. While some processors have little impact on CPU and/or memory, others can impact them significantly. Capacity planning needs to take into consideration the dataflows you want to run: what kind of content manipulation you want to do (MergeContent, SplitContent, ReplaceContent, etc.), data sizes and volumes, how many NiFi nodes you will have, how you plan to distribute the data load, etc.
03-09-2016
12:42 PM
5 Kudos
I am assuming you are using the InvokeHTTP processor and that you want to take the response body InvokeHTTP writes to a FlowFile attribute and append it to the content of that same FlowFile. First, make sure the "Put Response Body in Attribute" property is configured in the InvokeHTTP processor. You can then use the ReplaceText processor with an Evaluation Mode of "Entire text" and a Replacement Strategy of "Append". This lets you write a NiFi Expression Language statement that references the attribute you named for the response body and appends its value to your original JSON content.
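To make the ReplaceText step concrete, here is a small Python model of what "Entire text" plus "Append" does to a FlowFile. It is only an illustration, not NiFi code: the `evaluate` helper resolves plain `${name}` references and nothing else from the Expression Language, and the attribute name `http.response` is a hypothetical value for "Put Response Body in Attribute".

```python
import re


def evaluate(expression, attributes):
    """Tiny stand-in for NiFi Expression Language: only resolves ${name} references.

    Unknown attributes resolve to an empty string.
    """
    return re.sub(r"\$\{([^}]+)\}", lambda m: attributes.get(m.group(1), ""), expression)


def replace_text_append(content, replacement_value, attributes):
    """Model ReplaceText with Evaluation Mode 'Entire text' and Replacement Strategy 'Append'."""
    return content + evaluate(replacement_value, attributes)


if __name__ == "__main__":
    flowfile_content = '{"id": 1}'
    # Hypothetical attribute holding the InvokeHTTP response body.
    flowfile_attributes = {"http.response": '{"status": "ok"}'}
    print(replace_text_append(flowfile_content, "\n${http.response}", flowfile_attributes))
```

Note that a plain append places the response after the original JSON document; if a single valid JSON object is required, the replacement value has to be written to merge the two.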
02-16-2016
04:27 PM
1 Kudo
@cokorda putra susila NiFi already includes the HDFS core libraries, so there is no need to install Hadoop on the NiFi server. You just need the config files (e.g., core-site.xml), as Artem suggests.
02-16-2016
03:41 PM
9 Kudos
The purpose of this article is to provide the steps needed to create your own certificates for securing your NiFi instance(s). The article will also cover creating your own Certificate Authority (CA) that you can use to sign all the certificates you create. This article is not intended to be a best-practices guide to creating secure keys. While we will provide tips, users should carefully research the various security options available when creating keys. This procedure assumes you have Java Keytool and OpenSSL installed on your system.
HDF 1.x or Apache NiFi 0.x Secured UI:
HDF 2.x or Apache NiFi 1.x Secured UI:
Creating your Certificate Authority:
You only need to create one CA, which you will use to sign the keys for every one of your servers/VMs and users (you only need to create keys for users if your NiFi has not been configured to use LDAP authentication).
What is a CA? The CA acts as a trusted entity for validating the authenticity of certificates. The CA is used to certify the authenticity of the keys (server and user) you create and should be carefully protected. Users should read the following wiki on CAs for a more detailed description: https://en.wikipedia.org/wiki/Certificate_authority
Commands for creating a CA:
*** Users should use strong passwords whenever prompted. When working with Java keystores, it is recommended that both the key password and the keystore password match.
*** NOTE: Security requirements have become more stringent with the newer browser and NiFi versions released since this article was originally written. The below command should be changed to use "-aes256".
*** You must type 'yes' to trust this certificate.
The following command can be used to do a verbose listing of the contents of the above created keystore:
keytool -v -list -keystore truststore.jks
At the end of the above you will have your "truststore" file (truststore.jks) that you will use in your nifi.properties file. Use this same "truststore" file on every one of your servers/VMs. You may also choose to load the rootCA.der or rootCA.pem certificate into your browser as another authority. This is not required, but without this authority loaded you will need to add a certificate exception when you try to access the NiFi https URL.
Edit the following lines in your nifi.properties file:
nifi.security.truststore=/<path to certs>/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=<MyTruststorePassword>
nifi.security.needClientAuth=true
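The CA commands themselves are not reproduced in this post, so here is one possible sequence, sketched as Python that only builds the argument lists. The file names rootCA.key, rootCA.pem, rootCA.der, and truststore.jks come from the text above; the 4096-bit key size, the 1095-day validity, and the alias `nifi-ca` are assumptions, not the author's original values.

```python
import shlex


def ca_commands(ca_key="rootCA.key", days=1095):
    """Return one possible command sequence for a self-signed CA and a truststore."""
    return [
        # CA private key, encrypted with AES-256 (openssl prompts for a passphrase).
        ["openssl", "genrsa", "-aes256", "-out", ca_key, "4096"],
        # Self-signed CA certificate in PEM form; openssl prompts for the subject fields.
        ["openssl", "req", "-x509", "-new", "-key", ca_key, "-sha256",
         "-days", str(days), "-out", "rootCA.pem"],
        # DER copy of the same certificate (handy for browser import).
        ["openssl", "x509", "-outform", "der", "-in", "rootCA.pem", "-out", "rootCA.der"],
        # Truststore holding only the CA certificate; keytool asks you to type 'yes'.
        ["keytool", "-import", "-keystore", "truststore.jks", "-file", "rootCA.der",
         "-alias", "nifi-ca"],
    ]


if __name__ == "__main__":
    for cmd in ca_commands():
        print(shlex.join(cmd))
```

Each list can be run interactively with `subprocess.run(cmd, check=True)`, or the printed lines can be pasted into a shell.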
Creating your Server Keystore:
Now let's create a server/VM key and get it signed by that CA:
*** Users should use strong passwords whenever prompted. When working with Java keystores, it is recommended that both the key password and the keystore password match.
The following procedure will [1] create your server/VM's private key, [2] generate a Certificate Signing Request (.csr), [3] use the CSR to get your key signed by your CA using the CA's private key, [4] import the public key for your CA into your keystore, and [5] import your signed certificate (.crt) into your keystore to form the complete trusted chain.
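The five steps map onto keytool and openssl roughly as follows. This is a hedged sketch that only builds the argument lists: the FQDN `nifi-server1.example.com`, the DN suffix, the 2048-bit key size, the 730-day validity, and the CA alias `nifi-ca` are illustrative assumptions, while the short-name alias and FQDN-as-CN convention follow the advice in this article.

```python
import shlex

DN_SUFFIX = "OU=NiFi, O=someplace, L=Baltimore, ST=Maryland, C=US"  # hypothetical


def server_keystore_commands(alias, fqdn, dn_suffix=DN_SUFFIX, days=730):
    """Return steps [1]-[5] for one server keystore, signed by the CA created earlier."""
    ks, csr, crt = f"{alias}.jks", f"{alias}.csr", f"{alias}.crt"
    return [
        # [1] Private key for the server; CN is the FQDN, alias is the short name.
        ["keytool", "-genkeypair", "-alias", alias, "-keyalg", "RSA", "-keysize", "2048",
         "-dname", f"CN={fqdn}, {dn_suffix}", "-keystore", ks],
        # [2] Certificate Signing Request for that key.
        ["keytool", "-certreq", "-alias", alias, "-keystore", ks, "-file", csr],
        # [3] Sign the CSR with the CA's private key.
        ["openssl", "x509", "-req", "-in", csr, "-CA", "rootCA.pem", "-CAkey", "rootCA.key",
         "-CAcreateserial", "-sha256", "-days", str(days), "-out", crt],
        # [4] Import the CA certificate so keytool can build the chain.
        ["keytool", "-import", "-alias", "nifi-ca", "-keystore", ks, "-file", "rootCA.pem"],
        # [5] Import the signed certificate under the original alias.
        ["keytool", "-import", "-alias", alias, "-keystore", ks, "-file", crt],
    ]


if __name__ == "__main__":
    for cmd in server_keystore_commands("nifi-server1", "nifi-server1.example.com"):
        print(shlex.join(cmd))
```

Step [5] must reuse the alias from step [1] so the signed certificate replaces the self-signed one attached to the private key.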
At the end of the above you will have your "keystore" file (nifi-server1.jks) that you will use in your nifi.properties file for one of your servers/VMs. You will need to repeat the above steps for each of your other servers/VMs so they each use their own keys. Keep in mind that I am using "nifi-server1" in this example, but you will most likely use your systems'/VMs' hostnames (short name as alias and FQDN as CN). I also highly recommend that you use the same key and keystore password for every key you create if creating keys for multiple nodes in a NiFi cluster.
The following lines need to be edited in the nifi.properties file:
nifi.security.keystore=/<path to your certs>/nifi-server1.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=<yourKeystorePassword>
nifi.security.keyPasswd=<yourKeyPassword>
Also make sure that you set the following property in the nifi.properties file to true:
nifi.security.needClientAuth=true
Additional configurations for NiFi clusters only:
When working with a NiFi cluster, it is recommended that you change the default NiFi user authority provider. The default is file-provider. On your NCM you should change file-provider to cluster-ncm-provider, and on your nodes file-provider should be changed to cluster-node-provider.
nifi.security.user.authority.provider=
You will also need to edit the authority-providers.xml file to configure both of these new providers. Remove the comments ("<!--" and "-->") surrounding the section of XML associated with the provider you are enabling:
Example NCM provider configuration:
Example Node provider configuration:
Creating User Keys for key-based authentication:
Now that you have all the keys you need for the systems in your cluster, you will need to create some keys for your users to load into their web browsers in order to securely access your NiFi. This step is not necessary if you have set up your NiFi to use LDAP for user authentication. This is done in much the same way as you created your server keys:
*** Users should use strong passwords whenever prompted.
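As with the server keys, the user commands are not preserved here. One possible OpenSSL-only route to a browser-loadable .p12 is sketched below as argument lists; the subject string, the 730-day validity, and the file naming are assumptions. openssl prompts for the key passphrase and for the .p12 export password.

```python
def user_p12_commands(user, subject, days=730):
    """Return one possible command sequence producing <user>.p12 signed by the CA."""
    key, csr, crt, p12 = (f"{user}.{ext}" for ext in ("key", "csr", "crt", "p12"))
    return [
        # Encrypted private key for the user.
        ["openssl", "genrsa", "-aes256", "-out", key, "2048"],
        # CSR with the subject that becomes the user's DN.
        ["openssl", "req", "-new", "-key", key, "-subj", subject, "-out", csr],
        # Sign the CSR with the CA created earlier.
        ["openssl", "x509", "-req", "-in", csr, "-CA", "rootCA.pem", "-CAkey", "rootCA.key",
         "-CAcreateserial", "-sha256", "-days", str(days), "-out", crt],
        # Bundle key, certificate, and CA certificate into a PKCS#12 file for the browser.
        ["openssl", "pkcs12", "-export", "-in", crt, "-inkey", key,
         "-certfile", "rootCA.pem", "-out", p12],
    ]


if __name__ == "__main__":
    # Hypothetical subject; it appears later as "CN=user1, OU=NiFi, O=someplace, ..."
    subject = "/C=US/ST=Maryland/L=Baltimore/O=someplace/OU=NiFi/CN=user1"
    for cmd in user_p12_commands("user1", subject):
        print(" ".join(cmd))
```

The subject chosen here determines the DN that must later match authorized-users.xml or the Initial Admin Identity exactly, so it is worth recording it when the key is created.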
Now that you have a .p12 file for user1, they can load it into their browser's certificates and use it to authenticate against your secure NiFi. Import your <user1>.p12 file into the certificates for your preferred browser.
---------
HDF 1.x or Apache NiFi 0.x only: Remember that you must manually add that first "ROLE_ADMIN" user to the authorized-users.xml file. You will need the DN from the user key you created for this Admin user, and you must add it to your authorized-users.xml file.
---------
HDF 2.x or Apache NiFi 1.x only: You must configure your "Initial Admin Identity" in the authorizers.xml file. That Initial Admin Identity value must exactly match the user's DN from the .p12 file.
---------
Here is an example of what it may look like:
dn="EMAILADDRESS=none@none.com, CN=<user1>, OU=NiFi, O=someplace, L=Baltimore, ST=Maryland, C=US"
Troubleshooting authentication issues:
If you have the DN format wrong in your authorized-users.xml file, rather than gaining access to NiFi you will be prompted to "request access". Do not click the request access link. You must instead go fix the DN in the authorized-users.xml file. You need to create that first admin account that can approve those requests. If you click request access, you will need to stop your NiFi and delete the nifi-users.h2.db file (located inside the database_repository directory); otherwise, even fixing your authorized-users.xml file will not gain you access, because your account will be stuck in a pending auth state.
You can look at the request logged in nifi-users.log to get the exact DN pattern to fix your authorized-users.xml file entry. You should see something that looks like this:
INFO [NiFi Web Server-58023] o.a.n.w.s.x509.X509AuthenticationFilter Attempting request for (<CN=JohnDoe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US>) GET...
That log line gives you the exact format of the DN that needs to be updated/added in the authorized-users.xml file. Example below:
<user dn="CN=JohnDoe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US">
    <role name="ROLE_DFM"/>
    <role name="ROLE_ADMIN"/>
    <role name="ROLE_PROVENANCE"/>
</user>
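Copying the DN by hand is where typos creep in, so the log-to-XML step can be scripted. The following is a minimal sketch for the HDF 1.x / NiFi 0.x authorized-users.xml format shown above; it assumes the log line follows the "Attempting request for (<...>)" pattern quoted in this post, and the default role list simply mirrors the example entry.

```python
import re
import xml.etree.ElementTree as ET

# Matches the DN inside "Attempting request for (<...>)" in nifi-users.log.
LOG_PATTERN = re.compile(r"Attempting request for \(<(?P<dn>[^>]+)>\)")


def dn_from_log_line(line):
    """Return the DN exactly as NiFi logged it, or None if the line has no request."""
    match = LOG_PATTERN.search(line)
    return match.group("dn") if match else None


def authorized_user_entry(dn, roles=("ROLE_DFM", "ROLE_ADMIN", "ROLE_PROVENANCE")):
    """Render a <user> element for authorized-users.xml; ElementTree escapes the DN."""
    user = ET.Element("user", dn=dn)
    for role in roles:
        ET.SubElement(user, "role", name=role)
    return ET.tostring(user, encoding="unicode")


if __name__ == "__main__":
    log_line = ("INFO [NiFi Web Server-58023] o.a.n.w.s.x509.X509AuthenticationFilter "
                "Attempting request for (<CN=JohnDoe, OU=MyBusiness, O=MyOrg, "
                "L=Baltimore, ST=MD, C=US>) GET...")
    print(authorized_user_entry(dn_from_log_line(log_line)))
```

The same extracted string is what the HDF 2.x / NiFi 1.x "Initial Admin Identity" value must match.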
02-11-2016
09:36 PM
19 Kudos
The purpose of this article is to explain what Process Groups and Remote Process Groups (RPGs) are and how input and output ports are used to move FlowFiles between them. Process groups are a valuable addition to any complex dataflow. They give DataFlow Managers (DFMs) the ability to group a set of processors onto their own embedded canvas. Remote Process Groups allow a DFM to treat another NiFi instance or cluster as just another process group in the larger dataflow picture. Simply being able to build flows on different canvases is nice, but what if I need to move NiFi FlowFiles between these canvases? This is where input and output ports come into play. They allow you to move FlowFiles between canvases, whether those canvases are local to a single NiFi or belong to completely different NiFi instances/clusters.
Embedded Process Groups:
Let's start by talking about the simplest use of multiple embedded canvases: process groups. When you start NiFi for the very first time, you are given a blank canvas. This blank canvas is nothing more than a process group itself, referred to as the root process group.
From there you are able to add additional process groups to that top-level canvas. You can drill down into these added process groups, each of which gives you an additional blank canvas to build dataflows on. When you enter a process group, you will see the hierarchy represented just above the canvas in the UI ( NiFi Flow >> Process Group 1 ). NiFi does not restrict the number of process groups you can create or how deep you can nest them. You could compare the process group hierarchy to a Windows directory structure. So if you added another process group inside one that you had already created, you would now have gone two layers deep ( NiFi Flow >> Process Group 1 >> Process Group 2 ).
The hierarchy shown above your canvas allows you to quickly jump up one or more layers, all the way to the root level, by simply clicking on the name of the process group. While you can add any number of process groups at the same embedded level, the hierarchy only shows the path from root down to the process group you are currently in.
Now that we understand how to add embedded process groups, let's talk about how we move data in and out of them. This is where input and output ports come into play. Input and output ports exist to move FlowFiles between a process group and ONE LEVEL UP from that process group. Input ports accept FlowFiles coming from one level up, and output ports allow FlowFiles to be sent one level up. If I have a process group added to my canvas, I cannot drag a connection to it until at least one input port exists inside that process group. I also cannot drag a connection off of that process group until at least one output port exists inside the process group. You can only move FlowFiles up or down one level at a time. Given the example of a process group within another process group, FlowFiles would need to be moved from the deepest level up to the middle layer before finally being moved to the root canvas.
In the above example I have a small flow pushing FlowFiles into an embedded process group (Process Group 1) and also pulling data from the same embedded process group. As you can see, I have created an input and an output port inside Process Group 1. This allowed me to draw a connection to and from the process group on the root canvas layer. You can have as many different input and output ports inside any process group as you like. When you draw a connection to a process group, you will be able to select which input port to send the FlowFiles to. When you draw a connection from a process group to another processor, you will be able to pick which output port to pull FlowFiles from. Every input and output port within a single process group must have a unique name; NiFi validates port names to enforce this.
Remote Process Groups:
We refer to the ability to send FlowFiles between different NiFi instances as Site-to-Site. Site-to-Site is configured very much the same way we just configured moving files between embedded process groups on a single NiFi instance. Instead of moving FlowFiles between different process groups (layers) within the same NiFi, we are moving FlowFiles between different NiFi instances or clusters. If a DFM reaches a point in their dataflow where they want to send data to another NiFi instance or cluster, they would add a Remote Process Group (RPG). These Remote Process Groups are not configured with unique system port numbers; instead, they all use the same Site-to-Site port number configured in your nifi.properties files. I will not be covering the specific NiFi configuration needed to enable Site-to-Site in this article. For information on how to enable and configure Site-to-Site on a NiFi instance, see the Site-to-Site Properties section of the Admin Guide.
Let's take a quick look at how these two components differ:
As I explained earlier, input and output ports are used to move FlowFiles
one level up from the process group they are created in. At the top level of your canvas (the root process group level), adding input or output ports gives that NiFi the ability to receive FlowFiles from another NiFi instance (input port) or to have another NiFi pull FlowFiles from it (output port). We refer to input and output ports added at the top level as remote input or output ports. While the same input and output icons in the UI are used to add both remote and embedded input and output ports, you will notice that they are rendered differently when added to the canvas. If your NiFi has been configured to be secure (HTTPS) using server certificates, the remote input/output port's configuration window will have an "Access Control" tab where you must authorize which remote NiFi systems are allowed to see and access these ports. If not running secure, all remote ports are exposed and accessible to any other NiFi instance.
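The one-level-up rule for input and output ports described earlier can be modeled as a small tree. The `ProcessGroup` class and `can_connect` function below are hypothetical illustrations, not NiFi's API: a connection is only possible between a group and its direct parent, through an input port (going down) or an output port (going up), and port names must be unique within a group.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProcessGroup:
    name: str
    parent: Optional["ProcessGroup"] = None
    input_ports: set = field(default_factory=set)
    output_ports: set = field(default_factory=set)

    def child(self, name):
        """Create a process group nested one level inside this one."""
        return ProcessGroup(name, parent=self)

    def add_port(self, name, kind):
        """Add an 'input' or 'output' port; names must be unique within the group."""
        if name in self.input_ports | self.output_ports:
            raise ValueError(f"port name {name!r} already used in {self.name}")
        (self.input_ports if kind == "input" else self.output_ports).add(name)


def can_connect(source_group, target_group, port_name):
    """True if FlowFiles can move directly from source_group to target_group via port_name."""
    if target_group.parent is source_group:
        # Moving down one level requires an input port inside the target group.
        return port_name in target_group.input_ports
    if source_group.parent is target_group:
        # Moving up one level requires an output port inside the source group.
        return port_name in source_group.output_ports
    # Anything else, such as skipping a level, needs an intermediate hop.
    return False


if __name__ == "__main__":
    root = ProcessGroup("NiFi Flow")
    pg1 = root.child("Process Group 1")
    pg2 = pg1.child("Process Group 2")
    pg2.add_port("deep-out", "output")
    print(can_connect(pg2, pg1, "deep-out"), can_connect(pg2, root, "deep-out"))
```

In the usage example, FlowFiles leaving Process Group 2 can reach Process Group 1 but not the root canvas directly, matching the two-hop path described above.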
In a single instance you can send data to an input port inside a process group by dragging a connection to the process group and selecting the name of the input port from the selection menu provided. Provided that the remote NiFi instance has input ports exposed to your NiFi instance, you can drag a connection to the RPG much the same way you previously dragged a connection to the embedded process groups within a single instance of NiFi. You can also hover over the RPG and drag a connection off of it, which will allow you to pull data from an available output port on the target NiFi. The source NiFi (standalone or cluster) can have as many RPGs as a DFM would like. You can have multiple RPGs in different areas of your dataflows that all connect to the same remote instance. The target NiFi, meanwhile, contains the input and output ports (only input and output ports added to the root-level process group can be used for Site-to-Site FlowFile transfers).
When sending data between two standalone NiFi instances, the setup of your RPG is fairly straightforward. When adding the RPG, simply provide the URL for the target instance. The source RPG will communicate with that URL to get the Site-to-Site port to use for FlowFile transfer. When sending FlowFiles via Site-to-Site to a NiFi cluster, we want the data going to every node in the cluster. The Site-to-Site protocol handles this for you, with some additional load-balancing benefits built in. The RPG is added and configured to point at the URL of the NCM. (1) The NCM will respond with its Site-to-Site port. (2) The source will connect to the Site-to-Site port of the NCM, which will respond to the source NiFi with the URLs, Site-to-Site port numbers, and current loads of every connected node. (3) The source NiFi will then load-balance FlowFile delivery to each of those nodes, giving fewer FlowFiles to nodes that are under heavier load. The following diagram illustrates these three steps:
A DFM may choose to use Site-to-Site to redistribute data arriving on a single node in a cluster to every node in that same cluster by adding an RPG that points back at the NCM for that cluster. In this case the source NiFi instance is also one of the target NiFi instances.
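Step (3) can be illustrated with a small weighting function. This is a sketch of the idea only: the 1 / (1 + load) weighting and treating "load" as a count of queued FlowFiles are assumptions for illustration, not the formula NiFi's Site-to-Site client actually uses.

```python
def distribute(flowfile_count, node_loads):
    """Split flowfile_count across nodes, favoring lightly loaded nodes.

    node_loads maps a node name to its current load (hypothetically, queued FlowFiles).
    The 1 / (1 + load) weighting is an illustrative assumption, not NiFi's formula.
    """
    weights = {node: 1.0 / (1 + load) for node, load in node_loads.items()}
    total = sum(weights.values())
    shares = {node: int(flowfile_count * w / total) for node, w in weights.items()}
    # Integer truncation leaves a few FlowFiles over; give them to the least loaded nodes.
    remainder = flowfile_count - sum(shares.values())
    for node in sorted(node_loads, key=node_loads.get)[:remainder]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    print(distribute(100, {"node1": 10, "node2": 50, "node3": 200}))
```

Every FlowFile is assigned to some node, and a node reporting a heavier load always receives no more than a lighter one.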
02-09-2016
01:05 PM
4 Kudos
NiFi supports compression, which can decrease the size of files being transferred across the network. NiFi can split large files into smaller files, which can be reassembled back into the original larger files by a NiFi on the other side of the transfer. Those split files could be sent via multiple concurrent threads. If a network issue occurs, the entire file transfer does not start over; only that one small piece is resent. NiFi could also be used to remove unneeded portions of the content that do not need to be transferred (think of system logs where some log lines have no value; those lines could be removed from the larger log file, reducing its size before it is transferred).
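The three techniques above (filtering, splitting, and compressing, then reassembling on the receiving side) can be illustrated outside NiFi with the Python standard library. This sketch is only a model of the idea, not how NiFi implements it; the 1024-byte chunk size and the "DEBUG" filter marker are arbitrary choices.

```python
import hashlib
import zlib


def split_and_compress(data: bytes, chunk_size: int) -> list:
    """Cut data into chunk_size pieces and compress each piece independently."""
    return [zlib.compress(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def reassemble(pieces: list) -> bytes:
    """Decompress the pieces in order and join them back into the original bytes."""
    return b"".join(zlib.decompress(piece) for piece in pieces)


def drop_lines(log_text: str, marker: str) -> str:
    """Remove log lines containing marker before the file is transferred."""
    return "\n".join(line for line in log_text.splitlines() if marker not in line)


if __name__ == "__main__":
    original = ("INFO useful event\n" + "DEBUG noisy event\n") * 500
    trimmed = drop_lines(original, "DEBUG").encode()
    pieces = split_and_compress(trimmed, chunk_size=1024)
    restored = reassemble(pieces)
    # Receiver-side integrity check: the digests must match.
    assert hashlib.sha256(restored).digest() == hashlib.sha256(trimmed).digest()
    print(len(original), len(trimmed), sum(len(p) for p in pieces))
```

Because each piece is compressed independently, a failed piece can be re-sent on its own without restarting the whole transfer.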