Member since: 07-30-2019
Posts: 3131
Kudos Received: 1564
Solutions: 909
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 106 | 01-09-2025 11:14 AM
 | 674 | 01-03-2025 05:59 AM
 | 396 | 12-13-2024 10:58 AM
 | 429 | 12-05-2024 06:38 AM
 | 360 | 11-22-2024 05:50 AM
08-16-2016
08:25 AM
@mclark Thanks for the response, much appreciated. Do I need to configure something on the back end as well, i.e. in nifi.properties or any other file on the cluster or node? I am facing the attached error.
12-02-2016
12:06 PM
Yeah, thanks. append works; \\n (double backslash) doesn't. I was doing this while writing the count to a file. It works now: The count is: ${executesql.row.count:append('\n')}
06-25-2018
02:29 PM
If I create a new template, it creates a flow.xml.gz and works fine. However, if I replace the flow.xml.gz with an old flow.xml.gz (a backup taken earlier on the same cluster node, HDF 2.x), the NiFi UI does not come up after login and gives the error "com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out". I have tried all the parameter tuning described by "Matt Clarke" in another post, but with no results. If I move the old file aside and put back the new flow.xml.gz, NiFi works fine again. Please let me know if anyone has faced such an issue, and the probable reason and workaround. Thanks, Suman
07-19-2016
02:23 AM
@Michael Sobelman That DNS is not resolvable by the node you are trying to access it from. You can get fancy on AWS and configure it through routing tables by setting up a proper VPN between the EMR and NiFi nodes. Another option I have used is Route 53, which will give you a publicly available DNS name. Lastly, you can put an ELB in front of your EMR HBase master node. You may have to script it up (via boot scripts) to configure your ELB to point to the new internal IP.
07-08-2016
01:14 PM
@mclark Great suggestion, thanks! Will definitely take a look at incorporating InvokeHTTP.
07-26-2017
12:39 PM
@AnjiReddy Anumolu Just to add a little more detail to the above response from @zblanco. When NiFi ingests data, that data is turned into NiFi FlowFiles. A NiFi FlowFile consists of Attributes (metadata) about the actual data, plus the physical data itself. The FlowFile metadata is stored in the FlowFile repository as well as in JVM heap memory for faster performance. The FlowFile attributes include things like filename, ingest time, lineage age, file size, which connection the FlowFile currently resides in within the dataflow, any user-defined metadata, any processor-added metadata, and so on. The physical bytes that make up the actual data content are written to claims within the NiFi content repository. A claim can contain the bytes for one to many ingested data files. For more info on the content repository and how claims work, see the following link: https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html Thanks, Matt
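To make the attribute/content split more concrete, here is a rough sketch of a single FlowFile (hypothetical values; filename, path, and uuid are standard core attributes, the rest depend on the ingest processor):
Attributes (stored in the FlowFile repository and JVM heap):
  filename = orders_2017-07-26.csv
  path = ./incoming/
  uuid = <unique id assigned by NiFi>
Content (stored in the content repository): the actual bytes of the ingested file, written into a content claim that may also hold the bytes of other FlowFiles.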
05-31-2016
08:55 PM
@mclark Hi Matt, I really appreciate your replies. It is a great idea for you to create "How to setup my first non-secured NiFi cluster."
11-10-2016
01:21 PM
@vlundberg This has nothing to do with being installed via Ambari. If the core-site.xml file being used by the HDFS processor in NiFi references a class which NiFi does not include, you will get a NoClassDefFound error. Adding a new class to NiFi's HDFS NAR bundle may be a possibility, but as I am not a developer I can't speak to that. You can always file an Apache Jira against NiFi for this change. https://issues.apache.org/jira/secure/Dashboard.jspa Thanks, Matt
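For example, a core-site.xml entry like the hypothetical one below will fail with a missing-class error if the implementation class it names is not bundled with NiFi's HDFS NAR:
<!-- hypothetical example: the named implementation class must be on NiFi's classpath -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>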
04-18-2016
09:28 PM
4 Kudos
Setting up Hortonworks Dataflow (HDF) to work with kerberized Kafka in Hortonworks Data Platform (HDP)

HDF 1.2 does not contain the same Kafka client libraries as the Apache NiFi version. The HDF Kafka libraries are specifically built to work with the Kafka versions supplied with HDP. The following Kafka support matrix breaks down what is supported in each Kafka version:
*** (Apache) refers to the Kafka version downloadable from the Apache website.

For newer versions of HDF (1.1.2+), NiFi uses ZooKeeper to maintain cluster-wide state, so the following only applies if this is an HDF NiFi cluster:
1. If a NiFi cluster has been set up to use a kerberized external or internal ZooKeeper for state, every kerberized connection to any other ZooKeeper must use the same keytab and principal. For example, a kerberized embedded ZooKeeper in NiFi would need to be configured with the same client keytab and principal you want to use to authenticate with, say, a Kafka ZooKeeper.
2. If a NiFi cluster has been set up to use a non-kerberized ZooKeeper for state, it cannot then talk to any other ZooKeeper that does use Kerberos.
3. If a NiFi cluster has been set up to use a kerberized ZooKeeper for state, it cannot then communicate with any other non-kerberized ZooKeeper.

With that being said, the PutKafka and GetKafka processors do not have keytab and principal properties like the HDFS processors do. The keytab and principal are defined in the same jaas file used when you set up HDF cluster state management. So before even trying to connect to kerberized Kafka, we need to get NiFi state management configured to use either an embedded or an external kerberized ZooKeeper for state. Even if you are not clustered right now, you need to take the above into consideration if you plan on upgrading to a cluster later.

——————————————

NiFi Cluster Kerberized State Management: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management
Let's assume you followed the above linked procedure to set up your NiFi cluster with an embedded ZooKeeper. At the end of that procedure you will have made the following config changes on each of your NiFi nodes:

1. Created a zookeeper-jaas.conf file. On nodes with an embedded ZooKeeper, it will contain something like this:

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/zookeeper-server.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/myHost.example.com@EXAMPLE.COM";
};
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

On nodes without an embedded ZooKeeper, it will look something like this:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

2. Added a config line to the NiFi bootstrap.conf file:

java.arg.15=-Djava.security.auth.login.config=/<path>/zookeeper-jaas.conf

*** The arg number (15 in this case) must be unused by any other java.arg line in the bootstrap.conf file.

3. Added 3 additional properties to the bottom of the zookeeper.properties file you configured per the linked procedure above:

authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000
requireClientAuthScheme=sasl
—————————————

Scenario 1: Kerberized Kafka setup for a NiFi cluster

For scenario one, we will assume you are running a NiFi cluster that has been set up per the above to use a kerberized ZooKeeper for NiFi state management. With that in place, you have the foundation for connecting to kerberized Kafka brokers and Kafka ZooKeepers. The PutKafka processor connects to the Kafka brokers and the GetKafka processor connects to the Kafka ZooKeepers. In order to connect via Kerberos, we will need to do the following:

1. Modify the zookeeper-jaas.conf file we created when you set up kerberized state management above. You will need to add a new KafkaClient section to the zookeeper-jaas.conf file. If your NiFi node is running an embedded ZooKeeper node, your zookeeper-jaas.conf file will contain:

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/zookeeper-server.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/myHost.example.com@EXAMPLE.COM";
};
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};
*** What is important to note here is that both the "KafkaClient" and "Client" sections (used for both the embedded ZooKeeper and the Kafka ZooKeeper) use the same principal and keytab. ***
*** The principal and keytab for the "Server" section (used by the embedded NiFi ZooKeeper) do not need to be the same as those used by the "KafkaClient" and "Client" sections. ***

If your NiFi cluster node is not running an embedded ZooKeeper node, your zookeeper-jaas.conf file will contain:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};
*** What is important to note here is that both the KafkaClient and Client sections (used for both the embedded ZooKeeper and the Kafka ZooKeeper) use the same principal and keytab. ***

2. Add an additional property to the PutKafka and GetKafka processors. Now all the pieces are in place and we can start our NiFi(s) and add/modify the PutKafka and GetKafka processors. You will need to add one new property, security.protocol, on the Properties tab of each PutKafka and GetKafka processor, as shown in the sketch below. You will use this same security.protocol value (PLAINTEXTSASL) when interacting with HDP Kafka versions 0.8.2 and 0.9.0.
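As a minimal sketch (the property is entered as a user-defined/dynamic property; the value matches the PLAINTEXTSASL protocol mentioned above):

Added on the Properties tab of each PutKafka and GetKafka processor:
  security.protocol = PLAINTEXTSASL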
————————————

Scenario 2: Kerberized Kafka setup for a standalone NiFi instance

For scenario two, a standalone NiFi does not use ZooKeeper for state management. So rather than modifying an existing jaas.conf file, we will need to create one from scratch. The PutKafka processor connects to the Kafka brokers and the GetKafka processor connects to the Kafka ZooKeepers. In order to connect via Kerberos, we will need to do the following:

1. Create a jaas.conf file somewhere on the server running your NiFi instance. This file can be named whatever you want, but to avoid confusion later, should you turn your standalone NiFi deployment into a NiFi cluster deployment, I recommend continuing to name the file zookeeper-jaas.conf. You will need to add the following lines to this zookeeper-jaas.conf file, which will be used to communicate with the kerberized Kafka brokers and kerberized Kafka ZooKeeper(s):

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};
*** What is important to note here is that both the KafkaClient and Client configs use the same principal and keytab. ***

2. Add a config line to the NiFi bootstrap.conf file:

java.arg.15=-Djava.security.auth.login.config=/<path>/zookeeper-jaas.conf

*** The arg number (15 in this case) must be unused by any other java.arg line in the bootstrap.conf file.

3. Add the additional property to the PutKafka and GetKafka processors. Now all the pieces are in place and we can start our NiFi(s) and add/modify the PutKafka and GetKafka processors. You will need to add the same new property, security.protocol, on the Properties tab of each PutKafka and GetKafka processor (see the sketch in Scenario 1). You will use this same security.protocol value (PLAINTEXTSASL) when interacting with HDP Kafka versions 0.8.2 and 0.9.0.

————————————————————

That should be all you need to get set up and going. Let me fill you in on a few configuration recommendations for your PutKafka and GetKafka processors to achieve better throughput:
PutKafka:
1. Ignore for now what the documentation says about the Batch Size property on the PutKafka processor. It is really a measure of bytes, so jack that baby up from the default 200 to some much larger value.
2. Kafka can be configured to accept larger files, but it is much more efficient working with smaller ones. The default max message size accepted by Kafka is 1 MB, so try to keep individual messages smaller than that. Set the Max Record Size property to the max size a message can be, as configured on your Kafka. Changing this value will not change what your Kafka can accept, but it will prevent NiFi from trying to send something too big.
3. The Max Buffer Size property should be set to a value large enough to accommodate the FlowFiles it is being fed. A single NiFi FlowFile can contain many individual messages, and the Message Delimiter property can be used to split that large FlowFile's content into its smaller messages. The delimiter could be a new line or even a specific string of characters that denotes where one message ends and another begins.
4. Leave the run schedule at 0 sec, and you may even want to give PutKafka an extra thread (Concurrent Tasks). A rough sketch of these PutKafka settings follows below.
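As a rough sketch only (illustrative values, not tuned for any particular workload; Known Brokers and Topic Name are assumed placeholders):

PutKafka property sketch:
  Known Brokers       = broker1.example.com:6667 (placeholder)
  Topic Name          = my_topic (placeholder)
  Batch Size          = 100000 (bytes; raised well above the default of 200)
  Max Record Size     = 1 MB (match your Kafka's max message size)
  Max Buffer Size     = 5 MB (large enough for the incoming FlowFiles)
  Message Delimiter   = \n (splits FlowFile content into individual messages)
  security.protocol   = PLAINTEXTSASL (added dynamic property, from the Kerberos setup above)
  Run Schedule        = 0 sec, Concurrent Tasks = 2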
GetKafka:
1. The Batch Size property on the GetKafka processor is correct in the documentation and does refer to the number of messages to batch together when pulled from a Kafka topic. The messages will end up in a single outputted FlowFile, and the configured Message Demarcator (default: new line) will be used to separate messages.
2. When pulling data from a Kafka topic that has been configured to allow messages larger than 1 MB, you must add an additional property to the GetKafka processor so it will pull those larger messages (the processor itself defaults to 1 MB). Add fetch.message.max.bytes and configure it to match the max allowed message size set on Kafka for the topic.
3. When using the GetKafka processor on a standalone instance of NiFi, the number of concurrent tasks should match the number of partitions on the Kafka topic. This is not the case (despite what the bulletin tells you when it is started) when the GetKafka processor is running on a NiFi cluster. Let's say you have a 3-node NiFi cluster. Each node in the cluster will pull from a different partition at the same time. So if the topic only has 3 partitions, you will want to leave concurrent tasks at 1 (meaning 1 thread per NiFi node). If the topic has 6 partitions, set concurrent tasks to 2. If the topic has 4 partitions, I would still use one concurrent task; NiFi will still pull from all partitions, and the additional partition will be included in a round-robin fashion. If you were to set the same number of concurrent tasks as partitions in a NiFi cluster, you would end up with only one node pulling from every partition while your other nodes sit idle.
4. Set your run schedule to 500 ms to reduce excessive CPU utilization. A rough sketch of these GetKafka settings follows below.
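As a rough sketch only (illustrative values; ZooKeeper Connection String and Topic Name are assumed placeholders), for a 3-node NiFi cluster reading a 3-partition topic that allows messages up to 2 MB:

GetKafka property sketch:
  ZooKeeper Connection String = zk1.example.com:2181 (placeholder)
  Topic Name                  = my_topic (placeholder)
  Batch Size                  = 10 (messages per outputted FlowFile)
  Message Demarcator          = \n (separates the batched messages)
  fetch.message.max.bytes     = 2097152 (added dynamic property; match the topic's max message size)
  security.protocol           = PLAINTEXTSASL (added dynamic property, from the Kerberos setup above)
  Concurrent Tasks            = 1 (one thread per node; the topic has 3 partitions)
  Run Schedule                = 500 ms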