Member since 07-30-2019 · 1949 Posts · 1161 Kudos Received · 536 Solutions
06-06-2016
11:59 AM
Shashi, You have run.as=ec2-user. What user are you logged in as when you execute the nifi.sh script? (ec2-user, root, shashi, etc.)
Thanks, Matt
06-03-2016
05:33 PM
You can edit files as root; editing files does not change ownership. You just need to make sure that, at the end of editing, all files are owned by the user who will be running your NiFi instances.
Give yourself a fresh start: delete the flow.tar on your NCM and the flow.xml.gz and templates dir on your Node. So at the end of configuring your two NiFi installs (one install configured to be the NCM and one separate install configured to be a Node), did your NCM start successfully? Looking in the nifi-app.log for your NCM, do you see the following lines:

2016-06-03 ... INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2016-06-03 ... INFO [main] org.apache.nifi.web.server.JettyServer https://Bxxxxx.xxxxxx.com:8080/nifi

You then go to your other NiFi installation configured as your Node and start it. After it has started successfully, it will start attempting to send heartbeats to Bxxxxx.xxxxxxx.com on port 1xxx. You should see these incoming heartbeats logged in the nifi-app.log on your NCM. Do you see these?

INFO [Process NCM Request-1] o.a.n.c.p.impl.SocketProtocolListener Received request 411684b2-25cb-461f-978e-fb3bda6a7ef0 from Axxxxx.xxxxxx.com
INFO [Process NCM Request-1] o.a.n.c.manager.impl.WebClusterManager Node Event: (......) 'Connection requested from new node. Setting status to connecting.'

After that, the NCM will either mark the node as connected or give a reason for not allowing it to connect. If you are not seeing these heartbeats in the NCM nifi-app.log, then something is blocking the TCP traffic on the specified port. I did notice in the above example you provided 1xxx as your cluster manager port. Is that port above 1024? Ports <= 1024 are reserved and can't be used by non-root users. If you are running your NCM as a user other than root (as it sounds from the above), NiFi will fail to bind to that port for listening for these heartbeats. Matt
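A quick way to rule out network blocking is to test TCP reachability from the Node host to the NCM's cluster manager protocol port. The hostname and port below are placeholders for your environment, and this assumes netcat is installed:

# From the Node, verify the NCM's cluster protocol port is reachable
nc -zv ncm.example.com 9001

If nc reports the connection refused or timing out, look at firewalls (iptables, security groups, etc.) between the two hosts.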
06-03-2016
04:13 PM
1 Kudo
A fresh install of NiFi has no flow.xml.gz file until after it is started for the first time.
Are these fresh NiFi installs or installations that were previously run standalone? If the latter, you can't simply tell them they are Nodes and NCMs and expect it to work. Your NCM does not run with a flow.xml.gz like your Nodes and standalone instances do. The NCM uses a flow.tar file. The flow.tar would be created on startup and contain an empty flow.xml. When you started your Node (with its existing flow.xml.gz file), it would have communicated with the NCM but been rejected because the flow on the Node would not have matched what was on the NCM. If you are looking to migrate from a standalone instance to a cluster, I would suggest reading this:
https://community.hortonworks.com/content/kbentry/9203/how-to-migrate-a-standalone-nifi-into-a-nifi-clust.html

Let me make sure I understand your environment:
1. You have two different installations of NiFi.
2. One installation of NiFi is set up and configured to be a non-secure (http) NCM.
3. One instance of NiFi is set up and configured to be a non-secure (http) Node.
4. The # cluster common properties (cluster manager and nodes must have same values) # section in the nifi.properties files on both NCM and Node(s) is configured identically.
5. In that section on both, nifi.cluster.protocol.is.secure=false is configured (it cannot be true if running http).
6. The # cluster node properties (only configure for cluster nodes) # section has been configured only on your Node, with the following properties set:
   nifi.cluster.is.node=true
   nifi.cluster.node.unicast.manager.address=
   nifi.cluster.node.unicast.manager.protocol.port=
   and the port matches what you configured in the next section on your NCM.
7. The # cluster manager properties (only configure for cluster manager) # section has been configured on your NCM only, with nifi.cluster.is.manager=true.
Thanks, Matt
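For reference, a minimal sketch of the properties involved, with placeholder hostnames and ports (these are not values from your environment; adjust to match your install):

# On the Node only (cluster node properties section):
nifi.cluster.is.node=true
nifi.cluster.node.address=node1.example.com
nifi.cluster.node.protocol.port=2882
nifi.cluster.node.unicast.manager.address=ncm.example.com
nifi.cluster.node.unicast.manager.protocol.port=9001

# On the NCM only (cluster manager properties section):
nifi.cluster.is.manager=true
nifi.cluster.manager.address=ncm.example.com
nifi.cluster.manager.protocol.port=9001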
06-03-2016
03:38 PM
Are these https or http configured cluster NCM and Node(s)?
NCM needs to be able to communicate with the http(s) port and node.protocol port configured in the nifi.properties file on the Node(s).
Node needs to be able to communicate with the cluster manager protocol port configured in the nifi.properties file on the NCM.
Thanks, Matt
06-03-2016
12:03 PM
What version of NiFi are you using?
06-03-2016
12:00 PM
Hello Shashi, The ec2-user will not need sudo privileges to run NiFi, but you will need to make sure that all the NiFi directories and repositories are readable and writable by this user. The error above is most likely caused by the ec2-user being unable to read/extract the classes out of the NiFi lib directory and into a work directory it creates, by default, within the NiFi base install path.
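If ownership turns out to be the issue, something along these lines (run as root; /opt/nifi is a placeholder for your actual NiFi install path) would hand the entire install over to the ec2-user:

# Give ec2-user ownership of the NiFi install, including the lib and work directories
chown -R ec2-user:ec2-user /opt/nifi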
Thanks, Matt
06-02-2016
01:18 PM
1 Kudo
There are a few things you can do here, if I am understanding correctly what you are trying to accomplish:
1. The logback.xml can be modified so that specific processor component logs are redirected to a new log file (a sketch follows at the end of this post). You can specify where that new log is written, and you can also specify the log level of those components (WARN level would get you just WARN and ERROR messages).
2. In your dataflow you could use the TailFile processor to monitor that new log and route any generated FlowFiles to a PutEmail processor to send them to your admin. In addition to email, you can route those FlowFiles to a processor of your choice to put a copy to a specific location as well, either locally or remotely. Thanks, Matt
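For point 1, a minimal sketch of the logback.xml additions (the appender name, log file path, and processor class below are illustrative placeholders; substitute the component you care about):

<appender name="COMPONENT_ALERTS" class="ch.qos.logback.core.FileAppender">
    <file>./logs/component-alerts.log</file>
    <encoder>
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>

<!-- WARN level captures both WARN and ERROR messages from this component -->
<logger name="org.apache.nifi.processors.standard.PutSQL" level="WARN" additivity="false">
    <appender-ref ref="COMPONENT_ALERTS"/>
</logger>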
06-01-2016
02:08 PM
PJ, I assume the message you are seeing is: "The specified run.as user does not exist. Exiting." What this indicates is that the run.as= field in the bootstrap.conf file is not empty but rather contains one or more spaces ("run.as= " instead of "run.as="). Also make sure that when you configured it for user nifi, the run.as field was actually "run.as=nifi" and not "run.as=nifi ". Spaces are valid characters. There is an open Apache NiFi Jira that covers this bug: https://issues.apache.org/jira/browse/NIFI-915 Thanks, Matt
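One quick way to expose invisible trailing whitespace on that line (cat -A marks the true end of each line with a $):

grep 'run.as' conf/bootstrap.conf | cat -A
# run.as=nifi$      <-- good
# run.as=nifi $     <-- a trailing space like this will trigger the error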
06-01-2016
01:27 AM
1 Kudo
Hello PJ, What version of NiFi are you running, and did you create a local user "nifi" on each of your systems? Thanks, Matt
05-31-2016
02:58 PM
Ahmad, The line you are seeing in the nifi-bootstrap.log indicates the JVM started successfully. You need to check the nifi-app.log to make sure the application loaded successfully. In the nifi-app.log you will find the following lines if the application successfully loaded:
2016-05-31 10:46:44,347 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2016-05-31 10:46:44,347 INFO [main] org.apache.nifi.web.server.JettyServer http://<someaddress or FQDN>:8088/nifi

Verify that the hostname or IP displayed on this line is reachable/resolvable from the system you are running your web browser on.
Thanks, Matt
05-30-2016
11:07 PM
Thank you for your feedback. I will look into creating an article to cover this topic. Perhaps "How to setup my first non-secured NiFi cluster."
05-30-2016
11:04 PM
PJ, Roles only work with a secured NiFi. The intent of step 5 was to give a backdoor method for properly populating the first needed entry in the authorized-users.xml file. You can either follow step 5 (backdoor method for creating first needed Admin user for https setup) or you can get your DN from your cert or ldap to manually populate that authorized-users.xml.
Bottom line is if you are being told you cannot secure your NiFi and to leave it running unsecure with http, you cannot create user roles. User roles are only used by a secured https configured NiFi. If someone is telling you that you can setup user roles within a http configured NiFi, they are unfortunately misinformed.
An example entry is below should you decide to secure your NiFi so you can make use of this feature in the future:
<users>
  <user dn="CN=John Doe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US">
    <role name="ROLE_ADMIN"/>
  </user>
</users>

Keep in mind that in order for this to work, the user dn above has to match exactly how it is recorded in the user's certificate or within ldap for user John Doe in this example. What this does is authorize the authenticated user "John Doe" with the "Admin" role. A user can be assigned multiple roles as well. An example of how that would look is as follows:

<users>
  <user dn="CN=John Doe, OU=MyBusiness, O=MyOrg, L=Baltimore, ST=MD, C=US">
    <role name="ROLE_ADMIN"/>
    <role name="ROLE_DFM"/>
    <role name="ROLE_PROVENANCE"/>
  </user>
</users>

In that example, the authenticated user "John Doe" has been authorized with the "Admin", "Dataflow Manager", and "Provenance" user roles. Hope this adds clarity to my previous response. Thanks, Matt
05-30-2016
02:26 PM
Hello, The GetFile and PutFile processors are used for retrieving files from or putting files to local disk. If the Linux directory were mounted to a Windows drive letter, these processors could be used for what you are trying to do here. More commonly the ListSFTP, FetchSFTP, and PutSFTP processors are used for this task. Thanks, Matt
05-30-2016
02:22 PM
1 Kudo
Hello, Pierre is correct about what the swap threshold is used for. For speed and efficiency, NiFi holds all the FlowFile attributes associated with each FlowFile in JVM memory. In cases where queues develop, this can result in considerable memory usage. So NiFi has established a default swapping threshold of 20,000 FlowFiles per connection. What this means is that once a queue reaches 30,000 FlowFiles, 10,000 will be swapped out to disk. The 20,000 that are next to be worked on, based on the connection's prioritization scheme, are left in memory. NiFi will continue to swap 10,000 FlowFiles at a time to disk as the queue continues to grow. Keep in mind that FlowFiles swapped out must be swapped back in before they can be worked on by the destination processor. NiFi does not throw away any data unless expiration has been set on connections. As long as you have sufficient disk space to hold the data, it will continue to queue. I suggest reading through this article if you have not already: ( https://community.hortonworks.com/content/kbentry/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html ) That being said, there are dangers to allowing your disk to fill to 100% capacity, so as Pierre mentioned, you should be setting backpressure throughout your dataflow to trigger upstream processors to stop and eventually stop pulling in new data.
Thanks, Matt
05-30-2016
02:10 PM
1 Kudo
Hello, In order to secure access to your NiFi instance/cluster, NiFi must be configured to run securely via https. Section 6 of the guide linked above correctly states that your NiFi must be configured to run securely (HTTPS) and have an authentication mechanism (user certificates, ldap, or kerberos) in place. Without a secure setup, all users who access the NiFi UI get in with anonymous access, which gives all of them full access to all aspects of the NiFi UI.
The DN is what uniquely identifies each user and/or server that accesses your NiFi. If using ssl certificates as your authentication mechanism, the DN will be found inside the certificate and would have been generated during the certificate's creation phase. There is an article here ( https://community.hortonworks.com/articles/17293/how-to-create-user-generated-keys-for-securing-nif.html ) that walks you through creating your own keystores and truststores for securing your NiFi (https). It also covers creating user certificates, if that is the authentication mechanism you choose to use.
You can also use ldap or kerberos as your authentication mechanism, but you will need to set up or use an existing ldap or kerberos infrastructure.
Users with the "admin" role have the ability to authenticate in to the secured NiFi UI. From there they can access the user management interface via the user management icon in the UI. This interface allows users with the "admin" role to approve the access of other users who have requested it through the secured NiFi UI. The reason you need to manually add the first "admin" user is that otherwise you have no users who can access the UI to approve requests.
If you are unsure how to extract your user DN from your configured authentication mechanism, you can do the following:
1. You still need to set up your NiFi securely. You can use the procedure linked above to create the needed keystores and truststores to do so.
2. Configure your nifi.properties file for both secure (https) and non-secure (http) access. You will need to use unique ports for each (8080 for http and 8443 for https, for example).
3. Navigate to the https address for your NiFi instance. If you are using user certificates, you will need to have followed that section of the above linked article to create your user key and load it into your browser. If set up for ldap, provide your ldap username and password when the NiFi UI prompts you.
4. If you successfully authenticated, you will then be prompted to request access. This is the authorization request portion. After requesting access, the screen will say pending approval.
5. You can now navigate to the non-secure (http) address for your NiFi, which lets everyone in as anonymous with full access. Go to the user management UI via the icon mentioned above and grant your user the "admin" role. You can now go back to the secured NiFi UI address and gain controlled access. Don't forget to go back into your nifi.properties file and remove the http configuration to prevent uncontrolled anonymous access at this point.
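For step 2, a sketch of the relevant nifi.properties entries with example ports and placeholder paths (remove the http entries once your admin user has been granted access):

nifi.web.http.port=8080
nifi.web.https.port=8443
nifi.security.keystore=/path/to/keystore.jks
nifi.security.truststore=/path/to/truststore.jks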
There are other roles that authorize authenticated users to do different things within the UI:
1. Administrator: Can add or remove authorized roles for other users. Can purge flow configuration change history.
2. Data Flow Manager: Can build, manipulate, modify, start, stop, and/or delete dataflows on the NiFi canvas.
3. Read Only: Can access the UI and view the configuration of items on the canvas, but cannot build, manipulate, modify, start, stop, and/or delete any of it.
4. Provenance: Users with this added role can search any stored provenance data.
I am not sure where the confusion came from with regards to setting up controlled access via http, but if you can point me in the right direction I will do my best to get the documentation updated so it is more clear. Thank you, Matt
05-30-2016
01:18 PM
Are there any WARN/ERROR messages being produced in the nifi-app.log or nifi-bootstrap.log?
05-27-2016
12:53 PM
Keep in mind that FlowFile Attributes live in memory. Loading a FlowFile Attribute with the entire content of the file is going to have an impact on heap usage in your flow. That being said, there are two things to consider when building dataflows like this:
1. Increasing the size of the available heap for the NiFi application. Heap space thresholds for NiFi are configured in the bootstrap.conf file and are very small by default (512 MB):

# JVM memory settings
java.arg.2=-Xms512m
java.arg.3=-Xmx512m

2. You must take into consideration the data volumes you will be working with in the particular dataflow. To help prevent out-of-memory errors in NiFi, we have established a threshold on how much data can queue on a connection before FlowFiles' attributes are swapped out of heap to disk. The default configuration in the nifi.properties file is 20,000 ( nifi.queue.swap.threshold=20000 ); this is per connection, not per flow. So if the FlowFiles whose content you extracted into attributes begin to queue on numerous connections, you run the risk of hitting the out-of-memory condition quicker. You can decrease this value so swapping happens sooner, but that will in turn have an impact on performance. I would start with increasing the heap memory for your NiFi and then go from there.
05-27-2016
12:35 PM
While it is still possible to use multicast, it is very uncommon. Its original intent was to let you set up your NiFi cluster so it could auto-discover the NCM. The idea behind this was that if the NCM died, a new one could quickly be stood up and the nodes would auto-discover and join that new NCM without needing to be restarted. This multicast setup has been around since clustering in NiFi was first added, long before site-to-site capability was added. With Site-to-Site, using multicast for the intent described above is not possible without some unique setups within DNS. The RemoteProcessGroup is very dependent on a specific NCM URL, so having the URL change would break Site-to-Site.
05-25-2016
10:51 PM
The fact that it was started without any configuration modification will have only one impact. With the default configuration, the NiFi instance would have started over http as a standalone instance. As a result it would have generated a flow.xml.gz file and a templates directory inside the NiFi conf directory. If the cluster NCM you are joining this node to already has an existing flow or templates, this node will fail to join because they will not match. No need to reinstall to fix this if that is the case. Simply delete the flow.xml.gz file and the templates directory before starting it again. When it joins the cluster it will get the current flow and templates from the NCM.
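Roughly, from the node's NiFi install directory (paths assume the default conf layout; adjust to your install):

rm conf/flow.xml.gz
rm -rf conf/templates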
05-25-2016
10:17 PM
That state directory you found only exists because at some point you started your NiFi instance and it was generated by the application. Had this been a fresh install, it would not have existed and you would have needed to create it yourself to complete the zookeeper setup.
05-25-2016
10:14 PM
1 Kudo
Yes, you can use that state directory and just create the zookeeper subdirectory in which you will have the myid file. I do recommend that your state directory instead be created somewhere outside of the base NiFi install path. This can aid in simplifying future upgrades of NiFi, since a newer version will still want to reference the existing cluster-wide state created in your existing NiFi version. If you do choose to move it from the default, update the zookeeper properties file and create the new path.
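As a rough sketch of those steps, assuming a state directory outside the install path (the path and myid value below are placeholders; each node's myid must be unique):

mkdir -p /var/lib/nifi/state/zookeeper
echo 1 > /var/lib/nifi/state/zookeeper/myid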
05-17-2016
04:06 PM
2 Kudos
Is that the entire log message? Can you share the preceding lines to this stack trace?

Marco, the NoClassDefFoundError you have encountered is most likely caused by the contents of your core-site.xml file. Check to see if the following exists, and if it does, remove it from the file: "com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec" in the "io.compression.codecs" property of the "core-site.xml" file. Thanks, Matt
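For illustration, the property in core-site.xml might look something like this before the edit (the other codecs in the list will vary with your cluster; it is only the two lzo entries that need removing):

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>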
04-28-2016
09:36 PM
2 Kudos
Understanding your flow will help us understand what is going on.
1. Are you creating a zero byte file that you are using as the trigger for your InvokeHTTP processor?
2. How do you have the InvokeHTTP processor configured? (Is it set to Put Response Body In Attribute?)
If Put Response Body In Attribute is set to an attribute value, the content of the FlowFile on the "original" relationship will still have a zero-byte content size. NiFi does not support the replay of FlowFiles that are zero bytes in size. (A Jira is being entered for this, as I see replay of zero-byte files can have a valid use case at times.)
If you did not configure the "Put Response Body In Attribute" property, a new FlowFile would have been generated where the response becomes the content, and that FlowFile is routed to the "response" relationship. NiFi cannot replay files at their creation time in the flow. The way replay works, FlowFiles are reinserted on the connection feeding the processor that produced the event. In cases where the processor producing the event actually created the FlowFile, there is nowhere to reinsert that claim for replay. You should, however, be able to replay that file at the next processor that produced a provenance event.
If that replay message is generated at a later in-line processing event, it indicates that the content no longer exists in the content repository's archive. Typically this is because the retention duration configured in the nifi.properties file has been exceeded for this content, but it could also be caused by other factors, such as the content repo exceeding the configured allowable disk utilization threshold percentage (also configured in the nifi.properties file), or the content being manually deleted from the repo (less likely). Queued active data in the flow takes precedence over archive data retention, so if you have a lot of queued data in your flow, you may not have any archived data at all because of the max disk utilization percentage configured for your NiFi.
04-26-2016
09:21 PM
There are additional items that will need to be taken into consideration if you are running a NiFi cluster. See the following for more details:
https://community.hortonworks.com/content/kbentry/28180/how-to-configure-hdf-12-to-send-to-and-get-data-fr.html
04-26-2016
07:28 PM
Can you provide a little more detail on your use case? Where will the URLs you want to use originate from?
04-18-2016
09:28 PM
4 Kudos
Setting up Hortonworks Dataflow (HDF) to work with kerberized Kafka in Hortonworks Data Platform (HDP)

HDF 1.2 does not contain the same Kafka client libraries as the Apache NiFi version. HDF Kafka libraries are specifically designed to work with the Kafka versions supplied with HDP. The following Kafka support matrix breaks down what is supported in each Kafka version: [Kafka support matrix table not preserved in this archive] *** (Apache) refers to the Kafka version downloadable from the Apache website. ***

For newer versions of HDF (1.1.2+), NiFi uses zookeeper to maintain cluster-wide state, so the following only applies if this is an HDF NiFi cluster:

1. If a NiFi cluster has been set up to use a kerberized external or internal zookeeper for state, every kerberized connection to any other zookeeper requires using the same keytab and principal. For example, a kerberized embedded zookeeper in NiFi would need to be configured to use the same client keytab and principal you want to use to authenticate with, say, a Kafka zookeeper.
2. If a NiFi cluster has been set up to use a non-kerberized zookeeper for state, it cannot then talk to any other zookeeper that does use kerberos.
3. If a NiFi cluster has been set up to use a kerberized zookeeper for state, it cannot then communicate with any other non-kerberized zookeeper.

With that being said, the PutKafka and GetKafka processors do not have properties for keytab and principal like the HDFS processors do. The keytab and principal are defined in the same jaas file used if you set up HDF cluster state management. So before even trying to connect to kerberized Kafka, we need to get NiFi state management configured to use either an embedded or external kerberized zookeeper for state. Even if you are not clustered right now, you need to take the above into consideration if you plan on upgrading to a cluster later.

——————————————

NiFi Cluster Kerberized State Management: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management

Let's assume you followed the above linked procedure to set up your NiFi cluster to create an embedded zookeeper. At the end of that procedure you will have made the following config changes on each of your NiFi Nodes:

1. Created a zookeeper-jaas.conf file. On nodes with an embedded zookeeper, it will contain something like this:

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/zookeeper-server.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/myHost.example.com@EXAMPLE.COM";
};

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

On nodes without an embedded zookeeper, it will look something like this:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

2. Added a config line to the NiFi bootstrap.conf file:

java.arg.15=-Djava.security.auth.login.config=/<path>/zookeeper-jaas.conf

*** The arg number (15 in this case) must be unused by any other java.arg line in the bootstrap.conf file. ***

3. Added 3 additional properties to the bottom of the zookeeper.properties file you configured per the linked procedure above:

authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000
requireClientAuthScheme=sasl

——————————————

Scenario 1: Kerberized Kafka setup for a NiFi cluster

For scenario one, we will assume you are running a NiFi cluster that has been set up per the above to use a kerberized zookeeper for NiFi state management. With that in place, you have the foundation for adding support for connecting to kerberized Kafka brokers and Kafka zookeepers. The PutKafka processor connects to the Kafka broker, and the GetKafka processor connects to the Kafka zookeepers. In order to connect via Kerberos, we will need to do the following:

1. Modify the zookeeper-jaas.conf file created when you set up kerberized state management above. You will need to add a new section to the zookeeper-jaas.conf file for the Kafka client. If your NiFi node is running an embedded zookeeper node, your zookeeper-jaas.conf file will contain:

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/zookeeper-server.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/myHost.example.com@EXAMPLE.COM";
};

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};

*** What is important to note here is that both the "KafkaClient" and the "Client" (used for both the embedded zookeeper and the Kafka zookeeper) use the same principal and keytab. ***

*** The principal and keytab for the "Server" (used by the embedded NiFi zookeeper) do not need to be the same as those used by the "KafkaClient" and "Client". ***

If your NiFi cluster node is not running an embedded zookeeper node, your zookeeper-jaas.conf file will contain:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};

*** What is important to note here is that both the KafkaClient and the Client (used for both the embedded zookeeper and the Kafka zookeeper) use the same principal and keytab. ***

2. Add an additional property to the PutKafka and GetKafka processors. Now all the pieces are in place; we can start our NiFi(s) and add/modify the PutKafka and GetKafka processors. You will need to add one new property (security.protocol, set to PLAINTEXTSASL) in the Properties tab of each PutKafka and GetKafka processor. You will use this same security.protocol (PLAINTEXTSASL) when interacting with HDP Kafka versions 0.8.2 and 0.9.0.

——————————————

Scenario 2: Kerberized Kafka setup for a standalone NiFi instance

For scenario two, a standalone NiFi does not use zookeeper for state management, so rather than modifying an existing jaas.conf file, we will need to create one from scratch. The PutKafka processor connects to the Kafka broker, and the GetKafka processor connects to the Kafka zookeepers. In order to connect via Kerberos, we will need to do the following:

1. Create a jaas.conf file somewhere on the server running your NiFi instance. This file can be named whatever you want, but to avoid confusion later, should you turn your standalone NiFi deployment into a NiFi cluster deployment, I recommend continuing to name the file zookeeper-jaas.conf. Add the following lines to this zookeeper-jaas.conf file, which will be used to communicate with the kerberized Kafka brokers and kerberized Kafka zookeeper(s):

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};

*** What is important to note here is that both the KafkaClient and Client configs use the same principal and keytab. ***

2. Add a config line to the NiFi bootstrap.conf file:

java.arg.15=-Djava.security.auth.login.config=/<path>/zookeeper-jaas.conf

*** The arg number (15 in this case) must be unused by any other java.arg line in the bootstrap.conf file. ***

3. Add an additional property to the PutKafka and GetKafka processors. Now all the pieces are in place; we can start our NiFi(s) and add/modify the PutKafka and GetKafka processors. You will need to add one new property (security.protocol, set to PLAINTEXTSASL) in the Properties tab of each PutKafka and GetKafka processor. You will use this same security.protocol (PLAINTEXTSASL) when interacting with HDP Kafka versions 0.8.2 and 0.9.0.

————————————————————

That should be all you need to get set up and going. Let me fill you in on a few configuration recommendations for your PutKafka and GetKafka processors to achieve better throughput.

PutKafka:
1. Ignore for now what the documentation says for the Batch Size property on the PutKafka processor. It is really a measure of bytes, so jack that baby up from the default 200 to some much larger value.
2. Kafka can be configured to accept larger files but is much more efficient working with smaller files. The default max message size accepted by Kafka is 1 MB, so try to keep the individual messages smaller than that. Set the Max Record Size property to the max size a message can be, as configured on your Kafka. Changing this value will not change what your Kafka can accept, but it will prevent NiFi from trying to send something too big.
3. The Max Buffer Size property should be set to a value large enough to accommodate the FlowFiles it is being fed. A single NiFi FlowFile can contain many individual messages, and the Message Delimiter property can be used to split that large FlowFile content into its smaller messages. The delimiter could be a new line or even a specific string of characters to denote where one message ends and another begins.
4. Leave the run schedule at 0 sec, and you may even want to give the PutKafka an extra thread (concurrent tasks).

GetKafka:
1. The Batch Size property on the GetKafka processor is correct in the documentation and does refer to the number of messages to batch together when pulled from a Kafka topic. The messages will end up in a single outputted FlowFile, and the configured Message Demarcator (default new line) will be used to separate messages.
2. When pulling data from a Kafka topic that has been configured to allow messages larger than 1 MB, you must add an additional property to the GetKafka processor so it will pull those larger messages (the processor itself defaults to 1 MB). Add fetch.message.max.bytes and configure it to match the max allowed message size set on Kafka for the topic.
3. When using the GetKafka processor on a standalone instance of NiFi, the number of concurrent tasks should match the number of partitions on the Kafka topic. This is not the case (despite what the bulletin tells you when it is started) when the GetKafka processor is running on a NiFi cluster. Let's say you have a 3-node NiFi cluster. Each node in the cluster will pull from a different partition at the same time. So if the topic only has 3 partitions, you will want to leave concurrent tasks at 1 (indicating 1 thread per NiFi node). If the topic has 6 partitions, set concurrent tasks to 2. If the topic has 4 partitions, I would still use one concurrent task; NiFi will still pull from all partitions, and the additional partition will be included in a round-robin fashion. If you were to set the same number of concurrent tasks as partitions in a NiFi cluster, you will end up with only one node pulling from every partition while your other nodes sit idle.
4. Set your run schedule to 500 ms to reduce excessive CPU utilization.
04-01-2016
12:23 PM
The MergeContent processor provides the ability to create a tar file that contains two files: one is the original file's content, and the other is a file containing all of its attributes.
To accomplish this, simply send your FlowFiles to this processor configured as follows:
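(The configuration screenshot from the original post was not preserved. As a hedged sketch, the MergeContent settings that package attributes alongside content would be along these lines; verify the exact property values against your processor's Properties tab:)

Merge Strategy = Bin-Packing Algorithm
Merge Format = FlowFile Tar, v1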
One of the added benefits of this method is that you can pull these tar files back into NiFi and use the UnpackContent processor to automatically reinsert the attributes back into a FlowFile and restore the content to its original state at some later time.
Thanks, Matt
03-22-2016
03:38 PM
2 Kudos
Bulletins are intended to be short-lived within the UI. The same error messages are also reported to the nifi-app.log, where the length of time they are preserved is based on your configuration of the NiFi instance's logback.xml file. There should be no difference between the detail in the bulletin and the detail in the nifi-app.log.
03-22-2016
02:27 PM
There have also been many improvements to the underlying code for the Kafka processors in newer releases of NiFi. I recommend upgrading.