About knarayanan

DianaTorres · ‎04-19-2024

@SamarApple Hello! As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks.

Avrotojson · ‎12-07-2022

While converting from JSON to Avro format, how do I get a logicalType in the Avro output? And what do we need to add to the JSON data to get the logicalType in Avro?

Garyy · ‎01-07-2021

I tried "java.arg.8=-Duser.timezone=America/New_York". It does not work for me. I posted one question earlier: https://stackoverflow.com/questions/65620632/why-do-executesqlrecord-and-csvrecordsetwriter-updated-the-time-zone-of-datetime

snm1523 · ‎10-07-2020

Hello Shishir, would you mind explaining how we migrate a standalone NiFi setup to cluster mode? Thanks, snm1523

knarayanan · ‎07-09-2020

@sajidiqubal Can you share more info? The solution is simple: the Spark Streaming job needs to find the kafka-jaas file and the corresponding keytab. Make sure both paths are accessible on all machines, so the kafka-jaas file and the keytab need to be in a local folder and not in HDFS. If you need them in HDFS, they have to be sent as part of the spark --files and --keytab arguments (iirc).

In newer versions of Kafka, you can add the JAAS info as a Kafka parameter using sasl.jaas.config, as in the example below. In that case you just need the keytab to be available on all machines.

sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
    useKeyTab=true \
    storeKey=true \
    keyTab="/etc/security/keytabs/kafka_client.keytab" \
    principal="kafkaclient1@EXAMPLE.COM";

You will also need the parameter sasl.kerberos.service.name=kafka

From your error it looks like the code is not able to find one of the files, either the jaas.conf or the keytab. Please check that each file is in the right path and present on all YARN nodes.
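As a rough sketch of the sasl.jaas.config approach described above, the snippet below assembles the JAAS entry and the related Kafka client parameters in Python. The helper names (build_jaas_config, build_kafka_sasl_params) are hypothetical, and the security.protocol value of SASL_PLAINTEXT is an assumption that is not in the original post; adjust both to your cluster.

```python
def build_jaas_config(keytab_path, principal):
    """Return a Krb5LoginModule JAAS entry as a single-line string."""
    return (
        "com.sun.security.auth.module.Krb5LoginModule required "
        "useKeyTab=true "
        "storeKey=true "
        f'keyTab="{keytab_path}" '
        f'principal="{principal}";'
    )


def build_kafka_sasl_params(keytab_path, principal, service_name="kafka"):
    """Collect the Kafka client parameters mentioned in the post."""
    return {
        # Assumption: the post does not say which security protocol is used.
        "security.protocol": "SASL_PLAINTEXT",
        "sasl.kerberos.service.name": service_name,
        "sasl.jaas.config": build_jaas_config(keytab_path, principal),
    }


if __name__ == "__main__":
    params = build_kafka_sasl_params(
        "/etc/security/keytabs/kafka_client.keytab",
        "kafkaclient1@EXAMPLE.COM",
    )
    for key, value in params.items():
        print(f"{key}={value}")
```

The keytab path still has to exist on every machine that runs the job; this only builds the parameter values.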

kumar993498 · ‎03-03-2020

Can you please post the template, I am trying to solve the same problem. It would be a great help for me

knarayanan · ‎08-16-2018

I was trying to dig up some TPS benchmark info for NiFi ListenHTTP, which allows for providing a REST proxy, with no luck, so I tried to create one myself. This is a very rough effort. I can improve it by capturing how many client instances I have running, so we can see the TPS drop and rise as the clients drop and rise. At some point I ran out of servers and resources to run more clients; you will see the chart drop off at the end before going back up. This is where I noticed some of my clients had crashed and I restarted them. Anyway, getting to the matter.

Purpose

The benchmark only measures how much load a ListenHTTP processor can handle when subjected to real-world traffic.

Setup

The NiFi cluster is set up on an m4.4xlarge (16 cores CPU, 32 GB RAM). The node is also hosting the Kafka broker and ZooKeeper. The HDF version is 3.1.1.

The NiFi flow is a simple ListenHTTP processor forwarding to UpdateAttribute, which simply discards the flowfile. The idea was to only measure ListenHTTP performance for receiving a message, creating a flowfile, responding to the client, and forwarding the message to the next processor. The benchmark tries to measure what kind of peak TPS could be achieved.

The NiFi instance is running an S2S provenance task, which forwards provenance events to another NiFi instance, which further forwards them to a Kafka topic. The data is then ingested into Druid using Kafka ingestion. The timestampmillis column of the provenance event is used by Druid for indexing.

For the client piece I have a simple Python script that constantly calls the REST service exposed by ListenHTTP, passing the JSON below. The timestamp in the JSON is just to ensure the messages are different.

{"key":"client1","timestamp":<current_unix_time>}

The Python script is a simple infinite loop in the below format.
    import json
    import random
    import threading
    import time

    import requests


    def call_rest():
        # Post one JSON message to the ListenHTTP endpoint.
        values = ["client1"]
        value = random.choice(values)
        # Millisecond timestamp, only there to make each message different.
        timestamp = round(time.time() * 1000)
        requests.post(
            "http://nifi1.field.hortonworks.com:19192/test",
            data=json.dumps({"key": value, "timestamp": timestamp}),
        )


    # Infinite loop: keep starting batches of 5 sender threads.
    # Threads are not joined, so new batches start without waiting for earlier ones.
    while True:
        threads = []
        for i in range(5):
            t = threading.Thread(target=call_rest)
            threads.append(t)
            t.start()

I ran 5 instances of the script across 8 servers to help me generate the kind of volume I needed for this test.

Dashboard

Once the data is in Druid, I can use Superset to chart and aggregate the provenance events at an interval of one second. Since the provenance events can take a few minutes to arrive, I used a one-minute window from 5 minutes ago, meaning from the t-5 to t-4 timestamps. I also filtered the query to only look at componentType=Listenhttp and eventType=RECEIVE.

From the chart we can see that the rate fluctuates between a maximum of about 3000 TPS and a minimum of around 600 TPS. To get a more even aggregation, I aggregated this over 5-minute intervals across an hour to see what we are doing on average. The chart was pretty promising: on average we are looking at 300k messages per 5 minutes, which is around 1000 TPS.

Conclusion

The 1000 TPS we see from NiFi in this load test is probably not the maximum load it can handle; I can try to run my tasks on more servers and see if we get higher numbers. But at 1000 TPS, NiFi should be able to handle most web-based traffic. Additionally, this is on a cluster with one NiFi node; we can scale linearly by adding more nodes to the cluster.
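The aggregation behind those numbers can be sketched as follows. This is only an illustration, assuming the provenance events are available as a list of millisecond timestamps; the function names are hypothetical, and the real aggregation in the post was done in Druid and Superset, not in Python.

```python
from collections import Counter


def events_per_second(timestamps_ms):
    """Count events in each one-second bucket, keyed by epoch second."""
    return Counter(ts // 1000 for ts in timestamps_ms)


def average_tps(event_count, window_seconds):
    """Average transactions per second over a window."""
    return event_count / window_seconds


if __name__ == "__main__":
    # 300k messages in a 5-minute window, as reported in the post.
    print(average_tps(300_000, 5 * 60))  # 1000.0

    # Three sample events: two in second 1, one in second 2.
    print(dict(events_per_second([1000, 1500, 2000])))
```

The per-second buckets correspond to the one-second chart, and average_tps to the 5-minute averages.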

knarayanan · ‎08-07-2018

NiFi builds on HDF 3.1.2 and HDF 3.1.0 fail with a dependency issue when trying to push data to ADLS. This is because the new version of ADLS depends on a Hadoop 2.8 feature, which is not available in the 2.7.3 version referenced by NiFi. To fix this you can build NiFi again, either against Hadoop 2.8 or against HDP 2.6.x, which should have the classes that ADLS depends on. To do that:

git clone the NiFi repository

cd <nifi-repo-home>/nifi-nar-bundles/nifi-hadoop-libraries-bundle/nifi-hadoop-libraries-nar/
vi pom.xml

Add a hadoop.version property to the pom.xml as shown below. If it is already set, no change is needed. Change the parent version to match the NiFi version you are running.

<parent>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-hadoop-libraries-bundle</artifactId>
    <version>1.5.0.3.1.2.0-7</version>
</parent>
<artifactId>nifi-hadoop-libraries-nar</artifactId>
<packaging>nar</packaging>
<properties>
    <maven.javadoc.skip>true</maven.javadoc.skip>
    <source.skip>true</source.skip>
    <curator.version>2.11.0</curator.version>
    <hadoop.version>2.7.3</hadoop.version>
</properties>

cd ..
vi pom.xml

Change the nifi-nar-bundles version to match your NiFi version as shown below.

<parent>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-nar-bundles</artifactId>
    <version>1.5.0.3.1.2.0-7</version>
</parent>

Run the Maven build using the command below, changing hadoop.version to match your version of Hadoop. The nar will be available under the nifi-hadoop-libraries-nar/target folder. Take that nar and replace the existing nar under nifi/lib.

mvn clean package -Dhadoop.version=2.7.3.2.6.5.0-292

Ensure you have the right jars for ADLS downloaded into a folder accessible to NiFi, and add the folder path to the Additional Classpath option in PutHDFS. For HDP 2.6.x you can find the needed jars under /usr/hdp/2.6.x..../hadoop/. You will need the following jars:
azure-data-lake-store-sdk-2.2.5.jar
hadoop-azure-2.7.3.2.6.5.0-292.jar
hadoop-azure-datalake-2.7.3.2.6.5.0-292.jar

Start NiFi and push files; hope it works.
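Before restarting NiFi, it may help to confirm that the jars listed above are actually in the folder you point PutHDFS at. The following is a minimal sketch of such a check; the missing_jars helper and the example folder path are hypothetical, and the jar names are simply the ones from this post, so adjust them to your own versions.

```python
from pathlib import Path

# Jar names taken from the post; change them to match your HDP version.
REQUIRED_JARS = [
    "azure-data-lake-store-sdk-2.2.5.jar",
    "hadoop-azure-2.7.3.2.6.5.0-292.jar",
    "hadoop-azure-datalake-2.7.3.2.6.5.0-292.jar",
]


def missing_jars(folder, required=REQUIRED_JARS):
    """Return the required jar names that are not present in folder."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # Hypothetical folder; use the path set in the classpath option.
    missing = missing_jars("/opt/nifi/adls-jars")
    if missing:
        print("Missing jars:", ", ".join(missing))
    else:
        print("All required jars found.")
```

Run it on every NiFi node, since the folder has to be readable locally by each one.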

mike-nacey · ‎07-30-2018

So far I have been able to get this working. Traffic flows fine through the final NLB, but we want to do some better load testing. I have put together a post that explains: https://everymansravings.wordpress.com/2018/07/27/apache-nifi-behind-an-aws-load-balancer-w-minifi/

knarayanan · ‎10-17-2018

Sorry, just saw this message. Could you provide more details?

Last Visited	‎08-20-2020 04:40 PM

Member Since	‎05-02-2016 08:13 PM
Posts	154
Kudos received	54

Cloudera Community

Re: MiniFi to NiFi connection through load balance...

Re: Nifi PutS3Object error with AMI Role (AwsCrede...

Re: Need help on a logic using NiFi.

Re: NIFI Installation: keep getting "Apache NiFi i...

Re: XSL to CSV using NiFi

Re: Error in PutS3Object

Re: Best Practice - JSON to Avro, data type preser...

Re: Change Timezone in Nifi

Re: Migrating HDF 2.0 node from standalone to clus...

Re: Accessing kerberised kafka from spark using ze...

Re: How to do row count using Nifi in source table...

NiFi Listenhttp Benchmark

Pushing to ADLS using Nifi

Re: MiniFi to NiFi connection through load balance...

Re: Nifi and Druid for real time dashboarding for ...