Member since: 07-30-2019
Posts: 3373
Kudos Received: 1616
Solutions: 998
03-22-2016
02:25 PM
5 Kudos
This ERROR message is informing you that the configured buffer in your PutKafka processor was not large enough to accommodate the batch of files it wanted to transfer to Kafka. The log above shows that a batch of 3 files was created, 2 of the files from that batch transferred successfully, and 1 file was routed to PutKafka's failure relationship. The total size of the batch was recorded as 4294967296 bytes (4 GB). These are very large files for Kafka... The failure relationship should be looped back onto the PutKafka processor so that, after a short penalization, the failed file is re-transmitted. There are 4 settings at play here in the PutKafka processor that you will want to tune:
Max Buffer Size: <-- max amount of reserved buffer space
Max Record Size: <-- max size of any one record
Batch Size: <-- max number of records to batch
Queue Buffering Max Time: <-- max amount of time spent on batching before transmitting
*** The batch will be transmitted when either the Batch Size is satisfied or the Queue Buffering Max Time is reached. Considering the size of the messages you are trying to send to your Kafka topic, I would recommend the following settings:
Max Buffer Size: 2 GB
Max Record Size: 2 GB
Batch Size: 1
Queue Buffering Max Time: 100 ms
Since you will be sending one file at a time, you may want to increase the number of Concurrent Tasks configured on the "Scheduling" tab of the PutKafka processor. Only do this if the processor cannot keep up with the flow of data: start with the default of 1 and increase by only 1 at a time if needed. Keep in mind that the buffered records live in your JVM heap, so the more concurrent tasks and the larger the Max Buffer Size configuration, the more heap this processor will use.
Thanks, Matt
03-16-2016
10:13 PM
The NCM in a NiFi cluster typically needs more heap memory. The number of components (processors, input ports, output ports, and relationships) on the graph multiplied by the number of nodes in the NiFi cluster will drive how much memory your NCM needs. For ~300 - 400 components and a 3 - 4 node cluster, the NCM does pretty well with 8 GB of heap. If you still encounter heap issues, you will need to increase the heap size and/or reduce the stats buffer size and/or snapshot frequency in the nifi.properties files (NCM and nodes):
nifi.components.status.repository.buffer.size=360 (default is 1440)
nifi.components.status.snapshot.frequency=5 min (default is 1 min)
This information is accurate as of NiFi 0.5.1 and HDF 1.1.2.
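For reference, the heap size itself is set in NiFi's bootstrap.conf via the JVM arguments; the 8 GB values below are illustrative for the scenario above, not a universal recommendation:

# bootstrap.conf -- initial and maximum JVM heap for this NiFi instance
java.arg.2=-Xms8g
java.arg.3=-Xmx8g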
03-15-2016
07:33 PM
1 Kudo
For your scenario with 12 disks (assuming all disks are 200 GB):
You can specify/define multiple Content repos and multiple Provenance repos; however, you can only define one FlowFile repository and one database repository.
- 8 disks for Content repos:
- /cont_repo1 <-- 200 GB
- /cont_repo2 <-- 200 GB
- /cont_repo3 <-- 200 GB
- /cont_repo4 <-- 200 GB
- /cont_repo5 <-- 200 GB
- /cont_repo6 <-- 200 GB
- /cont_repo7 <-- 200 GB
- /cont_repo8 <-- 200 GB
- 2 disks for Provenance repos:
- /prov_repo1 <-- 200 GB
- /prov_repo2 <-- 200 GB
- 1 disk split into multiple partitions for:
- /var/log/nifi-logs/ <-- 100 GB
- OS partitions <-- split amongst other standard OS partitions (/tmp, /, etc...)
- 1 disk split into multiple partitions for:
- /opt/nifi <-- 50 GB
- /flowfile_repo/ <-- 50 GB
- /database_repo/ <-- 25 GB
- /opt/configuration-resources <-- 25 GB (this will hold any certs, config files, extras your NiFi processors/ dataflows may need).
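As a sketch of how those mounts might be wired up in nifi.properties (the repo suffix names below are assumptions chosen to match the mount points above; any unique suffix works for the content and provenance repo properties):

nifi.content.repository.directory.cont_repo1=/cont_repo1
nifi.content.repository.directory.cont_repo2=/cont_repo2
# ...repeat for /cont_repo3 through /cont_repo8
nifi.provenance.repository.directory.prov_repo1=/prov_repo1
nifi.provenance.repository.directory.prov_repo2=/prov_repo2
nifi.flowfile.repository.directory=/flowfile_repo
nifi.database.directory=/database_repo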
03-15-2016
07:23 PM
6 Kudos
There is no direct correlation between the size of the content repository and the provenance repository. The size the content repository will grow to is directly tied to the amount of unique content that is currently queued on the NiFi canvas. If archiving is enabled, the amount of content repository space consumed will also depend on the archive configuration settings in the nifi.properties file:
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=75%
nifi.content.repository.archive.enabled=true
As you can see from the above, archiving will try to retain 12 hours of archived content (archived content being content that is no longer associated with any existing queued FlowFile within any dataflow on the graph). This does not guarantee that there will be any archive or that the content repository will not grow beyond 75% disk utilization. Content still actively associated with queued FlowFiles will remain in the content repository, so it is important to build back pressure into dataflows where there is concern that large backlogs could fill the disk to 100%. Should the content repo fill to 100%, corruption will not occur, but new FlowFiles will not be able to be created until free space is available. This is likely to produce a lot of errors in the flow (anywhere content is modified/written).
The provenance repository's size is directly related to the number of FlowFiles and the number of event-generating processors those FlowFiles pass through on the NiFi canvas. Disk utilization here is controlled by settings in the nifi.properties file:
nifi.provenance.repository.max.storage.time=7 days
nifi.provenance.repository.max.storage.size=50 GB
With the above settings, NiFi will try to retain 7 days of provenance events for every FlowFile that it processes, but will start rolling off the oldest events once the max storage exceeds 50 GB. It is important to understand that the 75% and 50 GB values are soft limits and should never be set to 100% or the exact size of the disk.
The FlowFile repository and database repository each remain relatively small. The FlowFile repository is the most important repo of all. It should be isolated on a separate disk/partition that is not shared with any other process that may fill it; allowing the FlowFile repository disk to fill to 100% can lead to database corruption and lost data. For a 200 GB content repository, a ~25 GB FlowFile repo should be enough.
The database repository contains the user and change history DBs. The user DB will remain 0 bytes in size for NiFi instances running HTTP (non-secure); for instances running HTTPS (secure), the user DB tracks all users who log in to the UI. The change history DB is tied to the little clock icon in the upper right corner of the NiFi toolbar; it keeps track of all changes made on the NiFi graph/canvas and also stays relatively small. A few GB of space should be plenty to store a considerable number of changes.
03-15-2016
12:05 PM
1 Kudo
@Lubin Lemarchand you are correct. Thank you for filling in the details.
03-14-2016
03:52 PM
3 Kudos
Here is a basic sizing chart for HDF: [sizing chart attachment not preserved] *** But you must keep in mind that these requirements may grow depending on what processors you use in your dataflow. Memory need often grows quicker than CPU need. *** Also understand that these sizing scenarios assume your NiFi instance(s) are set up per the best practice documentation provided.
03-14-2016
01:28 PM
1 Kudo
Shishir, I agree that you should carefully review all the documented links provided by Artem Ervits, but you also need to understand that the loading behavior of any given NiFi instance is directly tied to what processors are being used. While some processors have little impact on CPU and/or memory, others can impact them significantly. Capacity planning needs to take into consideration the dataflows you want to run: what kind of data content manipulation you want to do (MergeContent, SplitContent, ReplaceText, etc...), data sizes and volumes, how many NiFi nodes you have and how you plan to distribute the data load, etc...
03-09-2016
12:42 PM
5 Kudos
I am assuming you are using the InvokeHTTP processor and that you want to take one of the new attributes created on your FlowFile in response to the request and add it to the content of that same FlowFile. You will want to make sure you have the "Put Response Body in Attribute" property configured in the InvokeHTTP processor. You can then use the ReplaceText processor with an Evaluation Mode of "Entire text" and a Replacement Strategy of "Append". This allows you to write a NiFi Expression Language statement that reads the attribute you specified for the response body and appends it to your original JSON content.
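As a hedged illustration (the attribute name http.response.body below is an arbitrary choice, not from the original question), the two processors might be configured like this:

InvokeHTTP
    Put Response Body in Attribute = http.response.body

ReplaceText
    Evaluation Mode      = Entire text
    Replacement Strategy = Append
    Replacement Value    = ${http.response.body}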
02-16-2016
04:27 PM
1 Kudo
@cokorda putra susila NiFi already includes the HDFS core libraries, so there is no need to install Hadoop on the NiFi server. You just need the config files (i.e. core-site.xml) as Artem suggests.
02-16-2016
03:41 PM
9 Kudos
The purpose of this article is to provide the steps needed to create your own certificates for securing your NiFi instance(s). The article will also cover creating your own Certificate Authority (CA) that you can use to sign all the certificates you create. This article is not intended to be a best practices guide to creating secure keys; while we will provide tips, users should carefully research the various security options available when creating keys. This procedure assumes you have Java Keytool and OpenSSL installed on your system.
[Screenshot: HDF 1.x / Apache NiFi 0.x secured UI]
[Screenshot: HDF 2.x / Apache NiFi 1.x secured UI]
Creating your Certificate Authority:
You only need to create one CA, which you will use to sign the keys for every one of your servers/VMs and users (you only need to create keys for users if your NiFi has not been configured to use LDAP authentication).
What is a CA? The CA acts as a trusted entity for validating the authenticity of certificates. The CA is used to certify the authenticity of the keys (server and user) you create and should be carefully protected. Users should read the following wiki on CAs for a more detailed description: https://en.wikipedia.org/wiki/Certificate_authority
Commands for creating a CA:
*** Users should use strong passwords whenever prompted. When working with Java keystores, it is recommended that both the key password and the keystore password match.
*** NOTE: Security requirements have become more stringent in newer versions of browsers and NiFi since this article was originally written. The key-generation command below should be changed to use "-aes256".
*** You must type 'yes' to trust this certificate when importing it into the truststore.
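The original command listing did not survive extraction; the following is a minimal reconstruction based on the files named in this article (rootCA.pem, rootCA.der, truststore.jks), assuming OpenSSL for the CA key and keytool for the truststore:

# Create the CA private key (per the note above, use -aes256; you will be prompted for a passphrase)
openssl genrsa -aes256 -out rootCA.key 4096
# Create a self-signed CA certificate (you will be prompted for the CA's DN fields)
openssl req -x509 -new -key rootCA.key -days 3650 -out rootCA.pem
# Optionally convert the certificate to DER format for loading into browsers
openssl x509 -outform der -in rootCA.pem -out rootCA.der
# Import the CA certificate into a Java truststore; answer 'yes' when asked to trust it
keytool -import -alias rootCA -file rootCA.pem -keystore truststore.jks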
"truststore" file (truststore.jks) that you will use in your
nifi.properties file. Use this same "truststore" file on every one of
your servers/VMs. You may also choose to load the rootCA.der or rootCA.pem key into
your browser as another authority. This is not required, but without this
authority loaded you will need to add a certificate exception when you try to
access the NiFi https URL. Edit the following lines in your nifi.properties file: nifi.security.truststore=/<path to certs>/truststore.jks nifi.security.truststoreType=JKS nifi.security.truststorePasswd=<MyTruststorePassord> nifi.security.needClientAuth=true
Creating your Server Keystore:
Now let's create a server/VM key and get it signed by that CA.
*** Users should use strong passwords whenever prompted. When working with Java keystores, it is recommended that both the key password and the keystore password match.
The following procedure will [1] create your server/VM's private key, [2] generate a Certificate Signing Request (.csr), [3] use the CSR to get your key signed by your CA using the CA's private key, [4] import the public key for your CA into your keystore, and [5] import your signed certificate (.crt) into your keystore to form the complete trusted chain.
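Here too the original commands were lost; a minimal sketch following steps [1] - [5] above might look like this (the example.com FQDN, key size, and validity period are assumptions):

# [1] Generate the server's private key in a new keystore (shortname as alias, FQDN as CN)
keytool -genkeypair -alias nifi-server1 -keyalg RSA -keysize 2048 -keystore nifi-server1.jks -dname "CN=nifi-server1.example.com, OU=NiFi"
# [2] Generate a Certificate Signing Request
keytool -certreq -alias nifi-server1 -keystore nifi-server1.jks -file nifi-server1.csr
# [3] Sign the CSR with your CA's private key
openssl x509 -req -in nifi-server1.csr -CA rootCA.pem -CAkey rootCA.key -CAcreateserial -out nifi-server1.crt -days 730
# [4] Import the CA's public certificate into the keystore
keytool -import -alias rootCA -file rootCA.pem -keystore nifi-server1.jks
# [5] Import the signed server certificate to complete the trusted chain
keytool -import -alias nifi-server1 -file nifi-server1.crt -keystore nifi-server1.jks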
At the end of the above you will have your "keystore" file (nifi-server1.jks) that you will use in the nifi.properties file for one of your servers/VMs. You will need to repeat the above steps for each of your other servers/VMs so they each use their own keys. Keep in mind that I am using "nifi-server1" in this example, but you will most likely use your systems'/VMs' hostnames (shortname as alias and FQDN as CN). I also highly recommend that you use the same key and keystore password for every key you create if creating keys for multiple nodes in a NiFi cluster.
The following lines need to be edited in the nifi.properties file:
nifi.security.keystore=/<path to your certs>/nifi-server1.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=<yourkeystorePassword>
nifi.security.keyPasswd=<yourKeyPassword>
Also make sure that you set the following property in the nifi.properties file to true:
nifi.security.needClientAuth=true
Additional configurations for NiFi clusters only:
When working with a NiFi cluster, it is recommended that you change the default NiFi user authority provider. The default is file-provider. On your NCM you should change file-provider to cluster-ncm-provider, and on your nodes file-provider should be changed to cluster-node-provider:
nifi.security.user.authority.provider=cluster-ncm-provider   (on the NCM)
nifi.security.user.authority.provider=cluster-node-provider  (on each node)
You will also need to edit the authority-providers.xml file to configure both of these new providers. Remove the comments ( "<!--" and "-->" ) surrounding the section of XML associated with the provider you are enabling.
[Example NCM provider configuration and example node provider configuration not preserved]
Creating User Keys for key-based authentication:
Now that you have all the keys you need for the systems in your cluster, you will need to create some keys for your users to load into their web browsers in order to securely access your NiFi. This step is not necessary if you have set up your NiFi to use LDAP for user authentication. This is done in much the same way as you created your server keys.
*** Users should use strong passwords whenever prompted.
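The original user-key commands were also lost; as a sketch, assuming the user's key was created and signed in a JKS keystore exactly as the server keys were (the user1 file names are placeholders), the browser-loadable PKCS12 file could be produced with:

# Convert the user's JKS keystore (private key + signed cert) into a PKCS12 file for the browser
keytool -importkeystore -srckeystore user1.jks -destkeystore user1.p12 -deststoretype PKCS12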
Now that you have a p12 file for user1, they can load it into their browser certificates and use it to authenticate against your secured NiFi. Import your <user1>.p12 file into the certificate store of your preferred browser.
--------- HDF 1.x or Apache NiFi 0.x only: Remember that you must manually add that first "ROLE_ADMIN" user to the authorized-users.xml file. So you will need the DN from the user key you created for this admin user, and you must add it to your authorized-users.xml file.
--------- HDF 2.x or Apache NiFi 1.x only: You must configure your "Initial Admin Identity" in the authorizers.xml file. That Initial Admin Identity value must match the user's DN from the .p12 file exactly.
--------- Here is an example of what it may look like:
dn="EMAILADDRESS=none@none.com, CN=<user1>, OU=NiFi, O=someplace, L=Baltimore, ST=Maryland, C=US"
Troubleshooting authentication issues:
If you have the DN format wrong in your authorized-users.xml file, rather than gaining access to the NiFi you will get prompted to "request access". Do not click the request access link; you must instead go fix the DN in the authorized-users.xml file. You need to create that first admin account that can approve those requests. If you do click request access, you will need to stop your NiFi and delete the nifi-users.h2.db file (located inside the database_repository directory); otherwise, even fixing your authorized-users.xml file will not gain you access, because your account will be stuck in a pending auth state.
You can look at the request that came in, in the nifi-users.log, to get the exact DN pattern for fixing your authorized-users.xml file entry. You should see something that looks like this:
INFO [NiFi Web Server-58023] o.a.n.w.s.x509.X509AuthenticationFilter Attempting request for (<CN=JohnDoe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US>) GET...
That log line gives you the exact format of the DN that needs to be updated/added to the authorized-users.xml file. Example below:
below: <user dn="CN=John Doe, OU=MyBusiness, O=MyOrg, L=Baltimore,
ST=MD, C=US">
<role name="ROLE_DFM"/>
<role name="ROLE_ADMIN"/>
<role
name="ROLE_PROVENANCE"/>
</user>