Member since: 05-30-2018

- 1322 Posts
- 715 Kudos Received
- 148 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4005 | 08-20-2018 08:26 PM |
|  | 1878 | 08-15-2018 01:59 PM |
|  | 2336 | 08-13-2018 02:20 PM |
|  | 4059 | 07-23-2018 04:37 PM |
|  | 4950 | 07-19-2018 12:52 PM |
			
    
	
		
		
01-20-2021 12:55 PM - 2 Kudos
Credits to @mbawa (Mandeep Singh Bawa), who co-built all the assets in this article. Thank you!

We (Mandeep and I) engaged on a customer use case where Cloudera Data Engineering (Spark) jobs were triggered once a file lands in S3 (details on how to trigger CDE from Lambda here). Triggering CDE jobs is quite simple; however, we needed much more. Here are a few of the requirements:

- Decoupling the ingestion layer from the processing layer
  - Decoupling apps (senders) from Spark
  - Apps can send and forget payloads without the burden of configuring Spark (number of executors, memory/CPU, etc.), the concern of Spark availability (upgrades, resource availability, etc.), or application impacts from CDE API changes
- Real-time changes to where CDE jobs are sent (multi-CDE)
- Monitoring job status, with alerts
- Monitoring job run times, with alerts for out-of-spec runtimes
- Failover to a secondary CDE cluster
- Throttling
- Authentication

It may look as though we are trying to turn NiFi into an orchestration engine for CDE. That's not the case. Here we are trying to meet some core objectives by leveraging capabilities within the platform. CDE comes with Apache Airflow, a much richer orchestration engine. Here we are integrating AWS triggers, multiple CDE clusters, monitoring, alerting, and a single API for multiple clusters.

Artifacts

- NiFi CDE Jobs Pipeline workflow
- Streams Messaging cluster (Kafka)
- CDF cluster (NiFi)
- Heavy usage of NiFi parameters

High-Level Workflow

At a high level, the NiFi workflow does the following (a client-side sketch of submitting a job spec appears at the end of this section):

- Exposes a single REST endpoint for CDE job submission
- Balances CDE workload between multiple CDE clusters
  - If only a single CDE cluster is available, it queues jobs until compute bandwidth is available
- Queues jobs if the CDE clusters are too busy
  - Queued jobs are re-run
  - If the number of retries for a job spec is greater than 3 (parameterized), an alert is triggered
- Monitors jobs from start to finish
  - Alerts if a job fails
  - Alerts if the run time exceeds a predetermined maximum run time (e.g., a job runs for 10 minutes and the max run time is set to 5 minutes)

Setup

The following NiFi parameters are required:

- api_token: CDE token (more on this later); set to ${cdeToken}
- job-runtime-threshold-ms: max time a job should run before an alert is triggered
- kbrokers: Kafka brokers
- ktopic-fail: Kafka topic cde-job-failures
- ktopic-inbound-jobs: Kafka topic cde-jobs
- ktopic-job-monitoring: Kafka topic cde-job-monitoring
- ktopic-job-runtime-over-limit: Kafka topic cde-job-runtime-alert
- ktopic-retry: Kafka topic cde-retry
- username: CDE machine user
- password: CDE machine user password
- primary-vc-token-api: CDE token API (more on this later)
- primary_vc_jobs_api: CDE primary cluster jobs API (more on this later)
- secondary-vc-available: Y/N; set to Y if a secondary CDE cluster is available, else N
- secondary_vc_jobs_api: CDE secondary cluster jobs API, if a secondary cluster is available
- run_count_limit: max number of concurrently running jobs per CDE cluster (e.g., 20)
- wait-count-max: max retry count; if a job cannot be submitted to CDE (e.g., because it is too busy), how many times NiFi retries before writing the job to the Kafka ktopic-fail topic (e.g., 5)
- start_count_limit: max number of concurrently starting jobs per CDE cluster (e.g., 20)

Note: When you run the workflow for the first time, the Kafka topics will generally be created for you automatically.
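Before diving into the detailed workflow, here is a minimal sender-side sketch (Scala, JDK 11 HttpClient) of the "send and forget" pattern described above. The endpoint URL, port, and payload field names are illustrative assumptions, not the exact contract of this workflow; adjust them to whatever your NiFi HTTP listener expects.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object SubmitJobSpec {
  def main(args: Array[String]): Unit = {
    // Hypothetical job spec payload; field names are assumptions for illustration only.
    val jobSpec = """{"job":"my-cde-job","arguments":["s3a://my-bucket/landing/file.csv"]}"""

    val request = HttpRequest.newBuilder()
      .uri(URI.create("https://nifi-node-1:9091/cde-jobs"))   // assumed NiFi endpoint URL
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(jobSpec))
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())

    // NiFi writes the spec to ktopic-inbound-jobs; the sender only needs the acknowledgement.
    println(s"NiFi accepted job spec: HTTP ${response.statusCode()}")
  }
}

The sender never touches Spark configuration or CDE endpoints; the NiFi flow owns routing, retries, and failover from here.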
Detailed Workflow

Once a CDE job spec is sent to NiFi, NiFi does the following:

- Writes the job spec to the Kafka ktopic-inbound-jobs (NiFi parameter) topic
- Pulls jobs from Kafka
  - New jobs: ktopic-inbound-jobs (NiFi parameter) topic
  - Retry jobs: ktopic-retry (NiFi parameter) topic
  - Monitoring jobs: ktopic-job-monitoring (NiFi parameter) topic
- Fetches CDE API tokens
- Checks whether the primary cluster's current run count is less than run_count_limit (NiFi parameter)
- Checks whether the primary cluster's current starting count is less than start_count_limit (NiFi parameter)
- If the run or start counts are not within limits, retries the same logic on the secondary cluster (if available, secondary-vc-available)
- If the run/start counts are within limits, submits the job spec to CDE
- If the run/start counts are not within limits for the primary and secondary CDE clusters and the number of retries is less than wait-count-max (NiFi parameter), writes the job spec to the Kafka ktopic-retry topic (NiFi parameter)
- Monitoring
  - NiFi calls CDE to determine the current status of the job ID (pulled from ktopic-job-monitoring)
  - If the job ended successfully, nothing more happens
  - If the job ended in failure, the job spec is written to the Kafka ktopic-fail topic
  - If the job is running and the run time is less than job-runtime-threshold-ms, the job spec is written back to ktopic-job-monitoring; otherwise an alert is sent (NiFi parameter)

CDE APIs

To get started, the CDE primary and secondary (if available) cluster API details are needed in NiFi as parameters (a sketch of the token fetch and job submission calls appears at the end of this post):

- To fetch the token API, click the pencil icon.
- Click the Grafana URL. The URL will look something like this:

  https://service.cde-zzzzzz.moad-aw.aaaaa-aaaa.cloudera.site/grafana/d/sK1XDusZz/kubernetes?orgId=1&refresh=5s

- Set the NiFi parameter primary-vc-token-api to the first part of the URL:

  service.cde-zzzzzz.moad-aw.aaaaa-aaaa.cloudera.site

- Now get the Jobs API for both the primary and secondary (if available) clusters. For a virtual cluster:
  - Click the pencil icon.
  - Click Jobs API URL to copy the URL. The jobs URL will look something like this:

    https://aaa.cde-aaa.moad-aw.aaa-aaa.cloudera.site/dex/api/v1

  - Take the first part of the URL and set the NiFi parameter primary_vc_jobs_api. Do the same for secondary_vc_jobs_api:

    aaa.cde-aaa.moad-aw.aaa-aaa.cloudera.site

Run a CDE Job

Inside the NiFi workflow, there is a test flow to verify that the NiFi CDE jobs pipeline works. To run the flow, set the URL in InvokeHTTP to one of the NiFi nodes. Run it, and if the integration is working, you will see a job running in CDE.

Enjoy! Oh, by the way, I plan on publishing a video walking through the NiFi flow.
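As an appendix for readers who want to script the two CDE calls the NiFi flow performs (token fetch, then job run), here is a minimal Scala sketch. The request paths, JSON handling, and job-run payload are assumptions modeled on typical CDE virtual-cluster APIs; verify them against your cluster's API documentation before relying on them.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.util.Base64

object CdeDirectSubmit {
  // These map to the NiFi parameters described above.
  val tokenApi = "service.cde-zzzzzz.moad-aw.aaaaa-aaaa.cloudera.site" // primary-vc-token-api
  val jobsApi  = "aaa.cde-aaa.moad-aw.aaa-aaa.cloudera.site"           // primary_vc_jobs_api
  val user     = sys.env("CDE_USER")                                   // username parameter
  val password = sys.env("CDE_PASSWORD")                               // password parameter

  private val client = HttpClient.newHttpClient()

  // Exchange the workload user/password for an access token (assumed token endpoint path).
  def fetchToken(): String = {
    val basic = Base64.getEncoder.encodeToString(s"$user:$password".getBytes("UTF-8"))
    val req = HttpRequest.newBuilder()
      .uri(URI.create(s"https://$tokenApi/gateway/authtkn/knoxtoken/api/v1/token"))
      .header("Authorization", s"Basic $basic")
      .GET()
      .build()
    val body = client.send(req, HttpResponse.BodyHandlers.ofString()).body()
    // Crude token extraction for illustration; use a JSON library in real code.
    "\"access_token\"\\s*:\\s*\"([^\"]+)\"".r
      .findFirstMatchIn(body).map(_.group(1))
      .getOrElse(sys.error("no access_token in response"))
  }

  // Trigger a run of an existing CDE job by name (assumed /dex/api/v1 jobs path).
  def runJob(jobName: String, token: String): Int = {
    val req = HttpRequest.newBuilder()
      .uri(URI.create(s"https://$jobsApi/dex/api/v1/jobs/$jobName/run"))
      .header("Authorization", s"Bearer $token")
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString("{}"))
      .build()
    client.send(req, HttpResponse.BodyHandlers.ofString()).statusCode()
  }

  def main(args: Array[String]): Unit =
    println(s"CDE responded: HTTP ${runJob("testjob", fetchToken())}")
}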
						
					
11-09-2020 01:34 PM - 1 Kudo
Recently I ran into a scenario requiring me to connect my Spark IntelliJ IDE to a Kafka DataHub cluster. I'm not going to claim the status of a pro at secure IDE setup, so fellow novices in the security realm may find this article useful.

This article goes through the steps to set up a Spark Scala IDE (IntelliJ), with a supplied working code example, to connect securely to a Kafka DataHub cluster over the SASL_SSL protocol using the PLAIN SASL mechanism.

Artifacts

- https://github.com/sunileman/spark-kafka-streaming
- Scala object: https://github.com/sunileman/spark-kafka-streaming/blob/master/src/main/scala/KafkaSecureStreamSimpleLocalExample.scala
  - The Scala object accepts 2 inputs: the target Kafka topic and the Kafka broker(s)

Prerequisites

- Kafka DataHub instance
- Permissions set up in Ranger to be able to read/write from Kafka
- IntelliJ (or similar) with the Scala plugin installed
- Workload username and password

TrustStore

Andre Sousa Dantas De Araujo did a great job explaining (very simply) how to get the certificate from CDP and create a truststore. Just a few simple steps here: https://github.com/asdaraujo/cdp-examples#tls-truststore

I stored it here on my local machine, which is referenced in the Spark Scala code:

./src/main/resources/truststore.jks

JaaS Setup

Create a jaas.conf file:

KafkaClient {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="YOUR-WORKLOAD-USER"
password="YOUR-WORKLOAD-PASSWORD";
};

I stored mine here, which is referenced in the Spark Scala code:

./src/main/resources/jaas.conf

Spark Session (Scala Code)

- Master is set to local
- Set spark.driver.extraJavaOptions and spark.executor.extraJavaOptions to the location of your jaas.conf
- Set spark.kafka.ssl.truststore.location to the location of your truststore

val spark = SparkSession.builder
      .appName("Spark Kafka Secure Structured Streaming Example")
      .master("local")
      .config("spark.kafka.bootstrap.servers", kbrokers)
      .config("spark.kafka.sasl.kerberos.service.name", "kafka")
      .config("spark.kafka.security.protocol", "SASL_SSL")
      .config("kafka.sasl.mechanism", "PLAIN")
      .config("spark.driver.extraJavaOptions", "-Djava.security.auth.login.config=./src/main/resources/jaas.conf")
      .config("spark.executor.extraJavaOptions", "-Djava.security.auth.login.config=./src/main/resources/jaas.conf")
      .config("spark.kafka.ssl.truststore.location", "./src/main/resources/truststore.jks")
      .getOrCreate()

Write to Kafka

The data in the dataframe is hydrated via a CSV file. Here I will simply read the dataframe and write it back out to a Kafka topic:

val ds = streamingDataFrame.selectExpr("CAST(id AS STRING)", "CAST(text AS STRING) as value")
      .writeStream.format("kafka")
      .outputMode("update")
      .option("kafka.bootstrap.servers", kbrokers)
      .option("topic", ktargettopic)
      .option("kafka.sasl.kerberos.service.name", "kafka")
      .option("kafka.ssl.truststore.location", "./src/main/resources/truststore.jks")
      .option("kafka.security.protocol", "SASL_SSL")
      .option("kafka.sasl.mechanism", "PLAIN")
      .option("checkpointLocation", "/tmp/spark-checkpoint2/")
      .start()
      .awaitTermination()

Run

Supply the JVM option with the location of the jaas.conf:

-Djava.security.auth.login.config=/PATH-TO-YOUR-jaas.conf

Supply the program arguments. My code takes 2: the Kafka topic and the Kafka broker(s):

sunman my-kafka-broker:9093

That's it! Run it and enjoy secure Spark Streaming + Kafka glory.
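One optional addition, offered as a sketch rather than part of the linked example: if you also want to read back from the same secure topic, the identical SASL_SSL/PLAIN settings can be passed as "kafka."-prefixed source options on readStream (credentials still come from the jaas.conf supplied via extraJavaOptions). kbrokers is the broker list from the example above, and ktopic is an assumed variable holding the topic to subscribe to.

val kafkaStream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", kbrokers)
  .option("subscribe", ktopic)
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.ssl.truststore.location", "./src/main/resources/truststore.jks")
  .option("startingOffsets", "latest")
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING) AS value")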
						
					
09-11-2020 12:47 PM - 1 Kudo
Recently I was engaged in a use case where CDE processing needed to be triggered once data landed on S3. The S3 trigger in AWS would be via a Lambda function: as files/data land in S3, an AWS Lambda function is triggered, which then calls CDE to process the data/files. At trigger time, the Lambda function receives the names and locations of the files the trigger was executed upon; these file locations/names are passed on to the CDE engine to pick up and process accordingly.
Prerequisites to run this demo

- AWS account
- S3 bucket
- Some knowledge of Lambda
- CDP and CDE

Artifacts

- AWS Lambda function code: https://github.com/sunileman/spark-kafka-streaming/blob/master/src/main/awslambda/triggerCDE.py
- CDE Spark job, main class com.cloudera.examples.SimpleCDERun
  - Code for the class: https://github.com/sunileman/spark-kafka-streaming
  - Prebuilt jar: https://sunileman.s3.amazonaws.com/CDE/spark-kafka-streaming_2.11-1.0.jar

Processing Steps

1. Create a CDE job (jar provided above)
2. Create a Lambda function on an S3 bucket (code provided above), triggered on put/post
3. Load a file or files onto S3 (any file)
4. AWS Lambda is triggered by this event and calls CDE. The call to CDE includes the locations and names of all files the trigger was executed upon
5. CDE launches, processes the files, and ends gracefully

It's quite simple.
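Step 4 above passes the S3 file locations/names to the Spark job as arguments. The actual com.cloudera.examples.SimpleCDERun class lives in the linked repo; the following is only an illustrative Scala sketch of how such a main class might consume the argument CDE hands it.

import org.apache.spark.sql.SparkSession

object SimpleCDERunSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("SimpleCDERunSketch").getOrCreate()

    // e.g. args(0) = "s3a://my-bucket/test.csv" -- the object the Lambda trigger fired on
    val inputPath = args(0)
    println(s"Processing file passed from Lambda via CDE: $inputPath")

    // Stand-in for real processing: read the file and show a few rows.
    val df = spark.read.option("header", "true").csv(inputPath)
    df.show(10)

    spark.stop()
  }
}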
Create a CDE Job

- Name: any name; I called it testjob
- Spark Application: jar file provided above
- Main Class: com.cloudera.examples.SimpleCDERun

Lambda

Create an AWS Lambda function that triggers on put/post for S3. The Lambda function code is simple: it calls CDE for each file posted to S3. The Lambda function is provided in the Artifacts section above.

The following are the S3 properties:

Trigger CDE

Upload a file to S3 and Lambda will trigger the CDE job. For example, I uploaded a file test.csv to S3. Once the file was uploaded, Lambda called CDE to execute a job on that file.

Lambda Log

The first arrow shows the file name (test.csv). The second arrow shows the CDE Job ID, which in this case returned the number 14.

In CDE, Job Run ID: 14

The CDE stdout logs show that the job received the location and name of the file that Lambda was triggered upon.

As I said in my last post, CDE is making things super simple. Enjoy.
 
						
					
08-28-2020 09:44 AM - 2 Kudos
The all-new Cloudera Data Engineering experience

I recently had the opportunity to work with Cloudera Data Engineering to stream data from Kafka. It's quite interesting how I was able to deploy code without much worry about how to configure the back-end components.

Demonstration

This demo pulls from the Twitter API using NiFi and writes the payload to a Kafka topic named "twitter". Spark Structured Streaming on Cloudera Data Engineering (CDE) pulls from the twitter topic, extracts the text field from the payload (which is the tweet itself), and writes it back to another Kafka topic named "tweet".

The following is an example of a Twitter payload. The objective is to extract only the text field.

What is Cloudera Data Engineering?

Cloudera Data Engineering (CDE) is a serverless service for Cloudera Data Platform that allows you to submit Spark jobs to an auto-scaling cluster. CDE enables you to spend more time on your applications and less time on infrastructure.

How do I begin with Cloudera Data Engineering (CDE)?

Complete setup instructions here.

Prerequisites

- Access to CDE
- Some understanding of Apache Spark
- Access to a Kafka cluster
  - In this demo, I use Cloudera DataHub, Streams Messaging for rapid deployment of a Kafka cluster on AWS
- An IDE
  - I use IntelliJ
  - I do provide the jar later in this article
- Twitter API developer access: https://developer.twitter.com/en/portal/dashboard
- Setting up a Twitter stream
  - I use Apache NiFi deployed via Cloudera DataHub on AWS

Source Code

I posted all my source code here. If you're not interested in building the jar, that's fine; I've made the job jar available here. Oct 26, 2020 update: I added source code for how to connect CDE to Kafka DataHub, available here. Users should be able to run the code as is, without needing a jaas.conf or keytab.

Kafka Setup

This article is focused on Spark Structured Streaming with CDE, so I'll be super brief here. Create two Kafka topics:

- twitter: used to ingest the firehose data from the Twitter API
- tweet: used post tweet extraction, performed via Spark Structured Streaming

NiFi Setup

This article is focused on Spark Structured Streaming with CDE, so I'll be super brief here. Use the GetTwitter processor (which requires a Twitter API developer account, free) and write to the Kafka twitter topic.

Spark Code (Scala)

- Load the Spark code onto your machine from here: https://github.com/sunileman/spark-kafka-streaming
- Fire off sbt clean and package
- A new jar will be available under target: spark-kafka-streaming_2.11-1.0.jar
- The jar is also available here

What does the code do? It pulls from the source Kafka topic (twitter), extracts the text value from the payload (which is the tweet itself), and writes to the target topic (tweet). An illustrative sketch of this transformation appears at the end of this post.

CDE

Assuming CDE access is available, navigate to Virtual Clusters > View Jobs and click Create Job.

Job Details

- Name: job name
- Spark Application File: the jar created from sbt package, spark-kafka-streaming_2.11-1.0.jar
  - Another option is to simply provide the URL where the jar is available: https://sunileman.s3.amazonaws.com/CDE/spark-kafka-streaming_2.11-1.0.jar
- Main Class: com.cloudera.examples.KafkaStreamExample
- Arguments
  - arg1, source Kafka topic: twitter
  - arg2, target Kafka topic: tweet
  - arg3, Kafka brokers: kafka1:9092,kafka2:9092,kafka3:9092

From here, jobs can be created and run, or simply created.
Click Create and Run to view the job run and the metrics about the streaming job. At this point, only the text (the tweet) from the Twitter payload is being written to the tweet Kafka topic.

That's it! You now have a Spark Structured Streaming job running on CDE, fully autoscaled. Enjoy!
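As an appendix, here is an illustrative Scala sketch of the transformation described above (the real com.cloudera.examples.KafkaStreamExample is in the linked repo): read the raw payload from the twitter topic, extract the text field, and write it to the tweet topic. The argument handling mirrors the job arguments listed earlier.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, get_json_object}

object TweetExtractSketch {
  def main(args: Array[String]): Unit = {
    val Array(sourceTopic, targetTopic, brokers) = args   // twitter, tweet, broker list

    val spark = SparkSession.builder.appName("TweetExtractSketch").getOrCreate()

    // Read the raw Twitter firehose payloads from the source topic.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("subscribe", sourceTopic)
      .load()

    // The tweet body lives in the "text" field of the Twitter JSON payload.
    val tweets = raw
      .selectExpr("CAST(value AS STRING) AS json")
      .select(get_json_object(col("json"), "$.text").alias("value"))

    // Write only the extracted text to the target topic.
    tweets.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", brokers)
      .option("topic", targetTopic)
      .option("checkpointLocation", "/tmp/tweet-extract-checkpoint/")
      .start()
      .awaitTermination()
  }
}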
						
					
08-25-2020 07:16 AM
I am trying to run NiFi on AWS with an ELB fronting it. I use ListenHTTP (port 7777) and need to do a health check via the ELB. It seems ListenHTTP only supports HEAD and POST, while ELB only supports health checks via GET. Any ideas how I can determine the health status of port 7777 using ELB health checks?
						
					
Labels: Apache NiFi

08-11-2020 01:40 PM - 2 Kudos
							   
   Image Courtesy: k9s 
I recently ran into a scenario where I needed to gather Hive logs on the new Data Warehouse Experience on AWS. The "old" way of fetching logs was to SSH into the nodes; Data Warehouse Experience is now deployed on K8s, so SSHing is off the table, and a tool like K9s is key. This is a raw article to quickly demonstrate how to use K9s to fetch Data Warehouse Experience logs on AWS K8s.
Prerequisites

- Data Warehouse Experience (DW-X)
- K9s installed on your machine
- Your AWS ARN (instructions provided below)
- AWS CLI configured against your AWS environment: simply run aws configure via the CLI and point it at the correct AWS subscription

AWS ARN

Your AWS ARN is required to successfully connect K9s to CDW (DW-X):

- On AWS, go to IAM > Users and search for your user name
- Click on your username to fetch the ARN

Kubeconfig

Connecting to DW-X using K9s requires a kubeconfig. DW-X makes this available under DW-X > Environments > Your Environment > Show Kubeconfig. Click the copy option and save the contents to a file on your machine's file system. For example, I stored the kubeconfig contents here: /Users/sunile.manjee/.k9s/kubeconfig.yml

ARN

To access K8s from K9s, your ARN will need to be added under Grant Access.

K9s

Now everything is set up to connect to the DW-X K8s cluster using K9s. Reference the kubeconfig.yml file when launching K9s:

k9s --kubeconfig /Users/sunile.manjee/.k9s/kubeconfig.yml

That's it. From here the logs are available, along with a ton of other metrics. For more information on how to use K9s, see k9scli.io
						
					
05-04-2020 10:38 AM - 1 Kudo
							 
The EFM (Edge Flow Manager) makes it super simple to write flows for MiNiFi to execute wherever it may be located (laptops, refineries, phones, OpenShift, etc.). All agents (MiNiFi) are assigned an agentClass. Once an agent is turned on, it phones home to EFM for run-time instructions. The run-time instructions are set at the class level, meaning all agents within a class run the same instruction (flow) set. There can be zero to many classes. In this example, I will capture Windows Security events via MiNiFi and ship them to NiFi over Site2Site.

- Download the MiNiFi MSI and set the classname. In this example, I set the classname to test6. This property is set at install time (MSI) or directly in minifi.properties. Also notice the setting nifi.c2.enable=true; this informs MiNiFi that run-time flow instructions will be received from EFM. Start MiNiFi.
- MiNiFi can be configured to send data to multiple endpoints (i.e., Kafka, NiFi, Event Hub, etc.). In this example, data will be sent to NiFi over S2S. On NiFi, create an input port.
- Capture the port ID. This will be used in EFM later on.
- On EFM, open class test6. This is where we design the flow for all agents whose class is set to test6.
- To capture Windows events via MiNiFi, add the ConsumeWindowsEventLog processor to the canvas.
- Configure the processor to pull events. In this example, MiNiFi will listen for Windows Security events.
- To send data from MiNiFi to NiFi, add a Remote Process Group to the canvas and provide a NiFi endpoint.
- Connect the ConsumeWindowsEventLog processor to the Remote Process Group, providing the NiFi input port ID captured earlier.
- The flow is ready to publish.
- Click Publish. MiNiFi will phone home at a set interval (nifi.c2.agent.heartbeat.period). Once that occurs, MiNiFi will receive the new run-time flow instructions, and data will start flowing into NiFi.

The EFM makes it super simple to capture Windows events and universally ship them anywhere without the ball and chain of most agent/platform designs.
 
						
					
04-13-2020 07:54 AM
@haris_khan Good observation. Clustering on K8s in general requires a fair bit of understanding of how the underlying software behaves during these various scenarios. Apache NiFi engineers have built a K8s operator which handles scaling up and down. I believe you may want to seriously look at NiFi Stateless or MiNiFi (both on K8s) if rapid scaling up/down is of interest.
						
					
04-07-2020 08:30 PM - 1 Kudo
							 
    
   
   
Kubernetes has significantly accelerated application deployment. However, true universal log capture with multi-endpoint (downstream) support is lacking. Apache NiFi Stateless offers a way to bridge the gap between rapid application deployment and InfoSec's desire to continue to capture and monitor behaviors.

What is NiFi Stateless?

NiFi-Fn is a library for running NiFi flows as stateless functions. It provides delivery guarantees similar to NiFi, without the need for an on-disk repository, by waiting to confirm receipt of incoming data until it has been written to the destination (source: NIFI-5922).

Try it out

Prerequisites

- K8s (local or cluster). In this demonstration, Azure Kubernetes Service is used.
- Some familiarity with K8s and NiFi

Assets Used

- NiFi on K8s: https://github.com/sunileman/AKS-YAMLS/blob/master/apache-nifi.yaml
  - Any instance of NiFi will do here. It does not need to run on K8s.
- NiFi Registry on K8s: https://github.com/sunileman/AKS-YAMLS/blob/master/nifi-registry.yml
  - Any instance of NiFi Registry will do here. It does not need to run on K8s.

Laying the groundwork

NiFi Stateless will pull an existing flow from NiFi Registry. The following is a simple NiFi flow designed in NiFi: a TailFile processor tails the application log file /var/log/app.txt, and the application deployed will write log entries to this file.

The flow is checked into NiFi Registry. The NiFi Registry URL, bucket identifier, and flow identifier will be used by NiFi Stateless at run time. More about this soon.

Time to deploy

The flow has been registered in NiFi Registry, so the application pod can be deployed. A NiFi Stateless container will be deployed in the same application pod (sidecar) to capture the log data generated by the application. The application being deployed is simple: a dummy application that generates a timestamp log entry every 5 seconds into a log file (/var/log/app.txt). NiFi Stateless will tail this file and ship the events. The events can be shipped virtually anywhere thanks to NiFi's inherent universal log-forwarding compatibility (Kafka/Splunk/Elasticsearch/Mongo/Kinesis/EventHub/S3/ADLS/etc.). All NiFi processors are listed at https://nifi.apache.org/docs.html. For this demonstration, the log events will be shipped to a NiFi cluster over Site2Site.

Here is the K8s YAML to deploy the pod (application with NiFi Stateless sidecar): https://github.com/sunileman/AKS-YAMLS/blob/master/nifi-stateless-sidecar.yml

In that YAML file, the NiFi Registry URL, bucketId, and flowId will need to be updated. These values come from NiFi Registry; NiFi Stateless binds itself at runtime to a specific flow to execute.

args: ["RunFromRegistry", "Continuous", "--json", "{\"registryUrl\":\"http://nifiregistry-service\",\"bucketId\":\"71efc3ea-fe1d-4307-97ce-589f78be05fb\",\"flowId\":\"c9092508-4deb-45d2-b6a4-b6a4da71db47\"}"]

To deploy the pod, run the following:

kubectl apply -f nifi-stateless-sidecar.yml

Once the pod is up and running, application log events are immediately captured by the NiFi Stateless container and shipped downstream.

Wrapping Up

FluentD and similar offerings are great for getting started with capturing application log data. However, enterprises require much richer connectivity (universal log-forwarding compatibility) to enable InfoSec to perform their vital role. NiFi Stateless bridges that gap.
						
					