Member since: 07-30-2019
Posts: 3131
Kudos Received: 1564
Solutions: 909
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 106 | 01-09-2025 11:14 AM
 | 674 | 01-03-2025 05:59 AM
 | 396 | 12-13-2024 10:58 AM
 | 429 | 12-05-2024 06:38 AM
 | 360 | 11-22-2024 05:50 AM
08-16-2016
08:25 AM
@mclark Thanks for the response, much appreciated. Do I need to configure something on the back end as well, i.e. in nifi.properties or any other file on the cluster or node? I am facing the attached error.
12-02-2016
12:06 PM
Yeah, thanks. append works; \\n (double backslash) doesn't. I was doing this while writing the count to a file. It works now: The count is: ${executesql.row.count:append('\n')}
06-25-2018
02:29 PM
If I create a new template, it creates a flow.xml.gz and works fine. However, if I replace the flow.xml.gz with an old flow.xml.gz (a backup taken earlier on the same cluster node, HDF 2.x), the NiFi UI does not come up after login and gives the error "com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out". I have tried all the parameter tuning described by "Matt Clarke" in another post, but with no results. If I move the old file aside and put back the new flow.xml.gz, NiFi works fine again. Please let me know if anyone has faced such an issue, and the probable reason and workaround. Thanks, Suman
07-19-2016
02:23 AM
@Michael Sobelman That DNS is not resolvable by the node you are trying to access it from. You can get fancy on AWS and configure it through routing tables by setting up a proper VPN between the EMR and NiFi nodes. Another option I have used is Route 53, which will give you a publicly available DNS name. Lastly, you can put an ELB in front of your EMR HBase master node. You may have to script it up (via boot scripts) to configure your ELB to point to the new internal IP.
07-08-2016
01:14 PM
@mclark Great suggestion, thanks! Will definitely take a look at incorporating InvokeHTTP.
07-26-2017
12:39 PM
@AnjiReddy Anumolu Just to add a little more detail to the above response from @zblanco. When NiFi ingests data, that data is turned into NiFi FlowFiles. A NiFi FlowFile consists of Attributes (metadata) about the actual data, plus the physical data itself. The FlowFile metadata is stored in the FlowFile repository as well as in JVM heap memory for faster performance. The FlowFile attributes include things like filename, ingest time, lineage age, file size, which connection the FlowFile currently resides in within the dataflow, any user-defined metadata, any processor-added metadata, and so on. The physical bytes that make up the actual data content are written to claims within the NiFi content repository. A claim can contain the bytes for one to many ingested data files. For more info on the content repository and how claims work, see the following link: https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html Thanks, Matt
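To make the attribute/content split more concrete, here is a rough sketch of a single FlowFile (hypothetical values; filename, path, and uuid are standard core attributes, the rest depend on the ingest processor):
Attributes (stored in the FlowFile repository and JVM heap):
  filename = orders_2017-07-26.csv
  path = ./incoming/
  uuid = <unique id assigned by NiFi>
Content (stored in the content repository): the actual bytes of the ingested file, written into a content claim that may also hold the bytes of other FlowFiles.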
05-31-2016
08:55 PM
@mclark Hi Matt, I really appreciate your replies. It is a great idea for you to create "How to setup my first non-secured NiFi cluster."
11-10-2016
01:21 PM
@vlundberg This has nothing to do with being installed via Ambari. If the core-site.xml file being used by the HDFS processor in NiFi references a class which NiFi does not include, you will get a NoClassDefFound error. Adding a new class to NiFi's HDFS NAR bundle may be a possibility, but as I am not a developer I can't speak to that. You can always file an Apache Jira against NiFi for this change. https://issues.apache.org/jira/secure/Dashboard.jspa Thanks, Matt
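For example, a core-site.xml entry like the hypothetical one below will fail with a missing-class error if the implementation class it names is not bundled with NiFi's HDFS NAR:
<!-- hypothetical example: the named implementation class must be on NiFi's classpath -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>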
04-18-2016
09:28 PM
4 Kudos
Setting up Hortonworks Dataflow (HDF) to work with kerberized Kafka in Hortonworks Data Platform (HDP)

HDF 1.2 does not contain the same Kafka client libraries as the Apache NiFi version. The HDF Kafka libraries are specifically built to work with the Kafka versions supplied with HDP. The following Kafka support matrix breaks down what is supported in each Kafka version:
*** (Apache) refers to the Kafka version downloadable from the Apache website.

For newer versions of HDF (1.1.2+), NiFi uses ZooKeeper to maintain cluster-wide state, so the following only applies if this is an HDF NiFi cluster:
1. If a NiFi cluster has been set up to use a kerberized external or internal ZooKeeper for state, every kerberized connection to any other ZooKeeper must use the same keytab and principal. For example, a kerberized embedded ZooKeeper in NiFi would need to be configured with the same client keytab and principal you want to use to authenticate with, say, a Kafka ZooKeeper.
2. If a NiFi cluster has been set up to use a non-kerberized ZooKeeper for state, it cannot then talk to any other ZooKeeper that does use Kerberos.
3. If a NiFi cluster has been set up to use a kerberized ZooKeeper for state, it cannot then communicate with any other non-kerberized ZooKeeper.

With that being said, the PutKafka and GetKafka processors do not have keytab and principal properties like the HDFS processors do. The keytab and principal are defined in the same jaas file used when you set up HDF cluster state management. So before even trying to connect to kerberized Kafka, we need to get NiFi state management configured to use either an embedded or an external kerberized ZooKeeper for state. Even if you are not clustered right now, you need to take the above into consideration if you plan on upgrading to a cluster later.

——————————————

NiFi Cluster Kerberized State Management: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management
Let's assume you followed the above linked procedure to set up your NiFi cluster with an embedded ZooKeeper. At the end of that procedure you will have made the following config changes on each of your NiFi nodes:

1. Created a zookeeper-jaas.conf file. On nodes with an embedded ZooKeeper, it will contain something like this:

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/zookeeper-server.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/myHost.example.com@EXAMPLE.COM";
};
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

On nodes without an embedded ZooKeeper, it will look something like this:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};

2. Added a config line to the NiFi bootstrap.conf file:

java.arg.15=-Djava.security.auth.login.config=/<path>/zookeeper-jaas.conf

*** The arg number (15 in this case) must be unused by any other java.arg line in the bootstrap.conf file.

3. Added 3 additional properties to the bottom of the zookeeper.properties file you configured per the linked procedure above:

authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000
requireClientAuthScheme=sasl
—————————————

Scenario 1: Kerberized Kafka setup for a NiFi cluster

For scenario one, we will assume you are running a NiFi cluster that has been set up per the above to use a kerberized ZooKeeper for NiFi state management. With that in place, you have the foundation for connecting to kerberized Kafka brokers and Kafka ZooKeepers. The PutKafka processor connects to the Kafka brokers and the GetKafka processor connects to the Kafka ZooKeepers. In order to connect via Kerberos, we will need to do the following:

1. Modify the zookeeper-jaas.conf file we created when you set up kerberized state management above. You will need to add a new KafkaClient section to the zookeeper-jaas.conf file. If your NiFi node is running an embedded ZooKeeper node, your zookeeper-jaas.conf file will contain:

Server {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/zookeeper-server.keytab"
  storeKey=true
  useTicketCache=false
  principal="zookeeper/myHost.example.com@EXAMPLE.COM";
};
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};
*** What is important to note here is that both the "KafkaClient" and "Client" sections (used for both the embedded ZooKeeper and the Kafka ZooKeeper) use the same principal and keytab. ***
*** The principal and keytab for the "Server" section (used by the embedded NiFi ZooKeeper) do not need to be the same as those used by the "KafkaClient" and "Client" sections. ***

If your NiFi cluster node is not running an embedded ZooKeeper node, your zookeeper-jaas.conf file will contain:

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};
*** What is important to note here is that both the KafkaClient and Client sections (used for both the embedded ZooKeeper and the Kafka ZooKeeper) use the same principal and keytab. ***

2. Add an additional property to the PutKafka and GetKafka processors. Now all the pieces are in place and we can start our NiFi(s) and add/modify the PutKafka and GetKafka processors. You will need to add one new property, security.protocol, on the Properties tab of each PutKafka and GetKafka processor, as shown in the sketch below. You will use this same security.protocol value (PLAINTEXTSASL) when interacting with HDP Kafka versions 0.8.2 and 0.9.0.
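As a minimal sketch (the property is entered as a user-defined/dynamic property; the value matches the PLAINTEXTSASL protocol mentioned above):

Added on the Properties tab of each PutKafka and GetKafka processor:
  security.protocol = PLAINTEXTSASL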
————————————

Scenario 2: Kerberized Kafka setup for a standalone NiFi instance

For scenario two, a standalone NiFi does not use ZooKeeper for state management. So rather than modifying an existing jaas.conf file, we will need to create one from scratch. The PutKafka processor connects to the Kafka brokers and the GetKafka processor connects to the Kafka ZooKeepers. In order to connect via Kerberos, we will need to do the following:

1. Create a jaas.conf file somewhere on the server running your NiFi instance. This file can be named whatever you want, but to avoid confusion later, should you turn your standalone NiFi deployment into a NiFi cluster deployment, I recommend continuing to name the file zookeeper-jaas.conf. You will need to add the following lines to this zookeeper-jaas.conf file, which will be used to communicate with the kerberized Kafka brokers and kerberized Kafka ZooKeeper(s):

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  storeKey=true
  useTicketCache=false
  principal="nifi@EXAMPLE.COM";
};
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka"
  useKeyTab=true
  keyTab="./conf/nifi.keytab"
  principal="nifi@EXAMPLE.COM";
};
*** What is important to note here is that both the KafkaClient and Client configs use the same principal and keytab. ***

2. Add a config line to the NiFi bootstrap.conf file:

java.arg.15=-Djava.security.auth.login.config=/<path>/zookeeper-jaas.conf

*** The arg number (15 in this case) must be unused by any other java.arg line in the bootstrap.conf file.

3. Add the additional property to the PutKafka and GetKafka processors. Now all the pieces are in place and we can start our NiFi(s) and add/modify the PutKafka and GetKafka processors. You will need to add the same new property, security.protocol, on the Properties tab of each PutKafka and GetKafka processor (see the sketch in Scenario 1). You will use this same security.protocol value (PLAINTEXTSASL) when interacting with HDP Kafka versions 0.8.2 and 0.9.0.

————————————————————

That should be all you need to get set up and going. Let me fill you in on a few configuration recommendations for your PutKafka and GetKafka processors to achieve better throughput:
PutKafka:
1. Ignore for now what the documentation says about the Batch Size property on the PutKafka processor. It is really a measure of bytes, so jack that baby up from the default 200 to some much larger value.
2. Kafka can be configured to accept larger files, but it is much more efficient working with smaller ones. The default max message size accepted by Kafka is 1 MB, so try to keep individual messages smaller than that. Set the Max Record Size property to the max size a message can be, as configured on your Kafka. Changing this value will not change what your Kafka can accept, but it will prevent NiFi from trying to send something too big.
3. The Max Buffer Size property should be set to a value large enough to accommodate the FlowFiles it is being fed. A single NiFi FlowFile can contain many individual messages, and the Message Delimiter property can be used to split that large FlowFile's content into its smaller messages. The delimiter could be a new line or even a specific string of characters that denotes where one message ends and another begins.
4. Leave the run schedule at 0 sec, and you may even want to give PutKafka an extra thread (Concurrent Tasks). A rough sketch of these PutKafka settings follows below.
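As a rough sketch only (illustrative values, not tuned for any particular workload; Known Brokers and Topic Name are assumed placeholders):

PutKafka property sketch:
  Known Brokers       = broker1.example.com:6667 (placeholder)
  Topic Name          = my_topic (placeholder)
  Batch Size          = 100000 (bytes; raised well above the default of 200)
  Max Record Size     = 1 MB (match your Kafka's max message size)
  Max Buffer Size     = 5 MB (large enough for the incoming FlowFiles)
  Message Delimiter   = \n (splits FlowFile content into individual messages)
  security.protocol   = PLAINTEXTSASL (added dynamic property, from the Kerberos setup above)
  Run Schedule        = 0 sec, Concurrent Tasks = 2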
GetKafka:
1. The Batch Size property on the GetKafka processor is correct in the documentation and does refer to the number of messages to batch together when pulled from a Kafka topic. The messages will end up in a single outputted FlowFile, and the configured Message Demarcator (default: new line) will be used to separate messages.
2. When pulling data from a Kafka topic that has been configured to allow messages larger than 1 MB, you must add an additional property to the GetKafka processor so it will pull those larger messages (the processor itself defaults to 1 MB). Add fetch.message.max.bytes and configure it to match the max allowed message size set on Kafka for the topic.
3. When using the GetKafka processor on a standalone instance of NiFi, the number of concurrent tasks should match the number of partitions on the Kafka topic. This is not the case (despite what the bulletin tells you when it is started) when the GetKafka processor is running on a NiFi cluster. Let's say you have a 3-node NiFi cluster. Each node in the cluster will pull from a different partition at the same time. So if the topic only has 3 partitions, you will want to leave concurrent tasks at 1 (meaning 1 thread per NiFi node). If the topic has 6 partitions, set concurrent tasks to 2. If the topic has 4 partitions, I would still use one concurrent task; NiFi will still pull from all partitions, and the additional partition will be included in a round-robin fashion. If you were to set the same number of concurrent tasks as partitions in a NiFi cluster, you would end up with only one node pulling from every partition while your other nodes sit idle.
4. Set your run schedule to 500 ms to reduce excessive CPU utilization. A rough sketch of these GetKafka settings follows below.
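As a rough sketch only (illustrative values; ZooKeeper Connection String and Topic Name are assumed placeholders), for a 3-node NiFi cluster reading a 3-partition topic that allows messages up to 2 MB:

GetKafka property sketch:
  ZooKeeper Connection String = zk1.example.com:2181 (placeholder)
  Topic Name                  = my_topic (placeholder)
  Batch Size                  = 10 (messages per outputted FlowFile)
  Message Demarcator          = \n (separates the batched messages)
  fetch.message.max.bytes     = 2097152 (added dynamic property; match the topic's max message size)
  security.protocol           = PLAINTEXTSASL (added dynamic property, from the Kerberos setup above)
  Concurrent Tasks            = 1 (one thread per node; the topic has 3 partitions)
  Run Schedule                = 500 ms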