About sunile_manjee

sunile_manjee · ‎08-11-2020

Image courtesy: k9s

I recently ran into a scenario where I needed to gather Hive logs on the new Data Warehouse Experience (DW-X) on AWS. The "old" way of fetching logs was to SSH into the nodes. Data Warehouse Experience is now deployed on K8s, so SSHing is off the table, which makes a tool like K9s key. This is a raw article to quickly demonstrate how to use K9s to fetch logs from a Data Warehouse Experience deployed on AWS K8s.

Prerequisites

- Data Warehouse Experience
- K9s installed on your machine
- AWS ARN (instructions provided below)
- AWS CLI configured to point to your AWS environment. Simply run aws configure from the CLI and point it to the correct AWS account.

AWS ARN

Your AWS ARN is required to successfully connect K9s to CDW (DW-X). On AWS, go to IAM > Users and search for your user name. Click on your user name to fetch the ARN.

Kubeconfig

Connecting to DW-X using K9s requires a kubeconfig. DW-X makes this available under DW-X > Environments > Your Environment > Show Kubeconfig. Click on the copy option and save the contents to a file on your machine. For example, I stored the kubeconfig contents here:

/Users/sunile.manjee/.k9s/kubeconfig.yml

ARN

To access K8s from K9s, your ARN needs to be added under Grant Access.

K9s

Everything is now set up to connect to DW-X K8s using K9s. Reference the kubeconfig.yml file when launching K9s:

k9s --kubeconfig /Users/sunile.manjee/.k9s/kubeconfig.yml

That's it. From here the logs are available, along with a ton of other metrics. For more information on how to use K9s, see k9scli.io.
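Before pasting an ARN into Grant Access, a quick shape check can catch copy errors. The following is a minimal sketch, assuming the standard IAM user ARN layout in the default aws partition (arn:aws:iam::<12-digit account id>:user/<name>); the function name and the sample ARN are hypothetical and not part of DW-X or K9s.

```python
import re

# Assumed layout: arn:aws:iam::<12-digit account id>:user/<optional path/><user name>
IAM_USER_ARN = re.compile(r"^arn:aws:iam::(\d{12}):user/([\w+=,.@/-]+)$")


def parse_iam_user_arn(arn):
    """Return (account_id, user_path) for a well-formed IAM user ARN, else None."""
    match = IAM_USER_ARN.match(arn.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # Hypothetical ARN used purely for illustration.
    print(parse_iam_user_arn("arn:aws:iam::123456789012:user/example.user"))
```

A None result only means the string does not look like an IAM user ARN; it says nothing about whether the user exists or has access.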

kettle · ‎05-15-2020

Hello! If I insert a string containing ' or " or , then PutSQL to Phoenix returns a syntax error. How can this be solved?

sunile_manjee · ‎05-04-2020

The EFM (Edge Flow Manager) makes it super simple to write flows for MiNiFi to execute wherever it may be located (laptops, refineries, phones, OpenShift, etc.). All agents (MiNiFi) are assigned an agent class. Once an agent is turned on, it phones home to EFM for run-time instructions. The run-time instructions are set at the class level, meaning all agents within a class run the same instruction (flow) set. There can be zero to many classes. In this example, I will capture Windows Security Events via MiNiFi and ship them to NiFi over Site2Site.

Download the MiNiFi MSI and set the class name. In this example, I set the class name to test6. This property is set at install time (MSI) or by editing minifi.properties directly. Also, notice the setting nifi.c2.enable=true. This informs MiNiFi that run-time flow instructions will be received from EFM.

Start MiNiFi.

MiNiFi can be configured to send data to multiple endpoints (e.g. Kafka, NiFi, EventHub, etc.). In this example, data will be sent to NiFi over S2S. On NiFi, create an input port and capture the port ID. This will be used in EFM later on.

On EFM, open class test6. This is where we design the flow for all agents whose class is set to test6.

To capture Windows events via MiNiFi, add a ConsumeWindowsEventLog processor to the canvas and configure it to pull events. In this example, MiNiFi will listen for Windows Security Events.

To send data from MiNiFi to NiFi, add a Remote Process Group to the canvas and provide a NiFi endpoint. Connect the ConsumeWindowsEventLog processor to the Remote Process Group and provide the NiFi input port ID captured earlier.

The flow is now ready to publish. Click on Publish. MiNiFi phones home at a set interval (nifi.c2.agent.heartbeat.period). Once that occurs, MiNiFi receives the new run-time flow instructions, and data starts flowing into NiFi.

The EFM makes it super simple to capture Windows events and ship them anywhere without the ball and chain of most agent/platform designs.
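If an agent never shows up under its class in EFM, the C2 settings in minifi.properties are the first thing to check. Here is a minimal sketch of such a check. The keys nifi.c2.enable and nifi.c2.agent.heartbeat.period come from the article; the key nifi.c2.agent.class, the sample file contents, and the heartbeat value are assumptions for illustration, so verify them against your own minifi.properties.

```python
def parse_properties(text):
    """Parse Java-style key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def c2_problems(props, expected_class):
    """Return a list of human-readable problems with the C2 settings."""
    problems = []
    if props.get("nifi.c2.enable", "").lower() != "true":
        problems.append("nifi.c2.enable is not true")
    # nifi.c2.agent.class is an assumed key name; confirm it in your file.
    if props.get("nifi.c2.agent.class") != expected_class:
        problems.append("agent class is not " + expected_class)
    if "nifi.c2.agent.heartbeat.period" not in props:
        problems.append("nifi.c2.agent.heartbeat.period is not set")
    return problems


# Hypothetical excerpt of minifi.properties, for illustration only.
SAMPLE = """
# C2 settings
nifi.c2.enable=true
nifi.c2.agent.class=test6
nifi.c2.agent.heartbeat.period=5000
"""

if __name__ == "__main__":
    print(c2_problems(parse_properties(SAMPLE), "test6"))
```

An empty list means the three settings look consistent with what the article describes; it does not prove the agent can actually reach EFM.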

sunile_manjee · ‎04-07-2020

Kubernetes has significantly accelerated application deployment. However, true universal log capture with multi-endpoint (downstream) support is lacking. Apache NiFi Stateless offers a way to bridge the gap between rapid application deployment and InfoSec's desire to continue to capture and monitor behaviors.

What is NiFi Stateless?

NiFi-Fn is a library for running NiFi flows as stateless functions. It provides delivery guarantees similar to NiFi, without the need for an on-disk repository, by waiting to confirm receipt of incoming data until it has been written to the destination (source: NIFI-5922).

Try it out

Prerequisites

- K8s (local or cluster). In this demonstration, Azure Kubernetes Service is used.
- Some familiarity with K8s and NiFi

Assets Used

- NiFi on K8s: https://github.com/sunileman/AKS-YAMLS/blob/master/apache-nifi.yaml
  Any instance of NiFi will do here. It does not need to run on K8s.
- NiFi Registry on K8s: https://github.com/sunileman/AKS-YAMLS/blob/master/nifi-registry.yml
  Any instance of NiFi Registry will do here. It does not need to run on K8s.

Laying the groundwork

NiFi Stateless will pull an existing flow from NiFi Registry. The flow used here is a simple one designed in NiFi: a TailFile processor tails the application log file /var/log/app.txt, and the deployed application writes log entries to this file. The flow is checked into NiFi Registry. The NiFi Registry URL, bucket identifier, and flow identifier will be used by NiFi Stateless at run time. More about this soon.

Time to deploy

The flow has been registered in NiFi Registry, so the application pod can be deployed. A NiFi Stateless container is deployed in the same pod as the application (sidecar) to capture the log data the application generates. The application being deployed is simple: a dummy application that writes a timestamped log entry every 5 seconds into a log file (/var/log/app.txt). NiFi Stateless will tail this file and ship the events.

The events can be shipped virtually anywhere thanks to NiFi's inherent universal log forward compatibility (Kafka, Splunk, ElasticSearch, Mongo, Kinesis, EventHub, S3, ADLS, etc.). All NiFi processors are listed at https://nifi.apache.org/docs.html. For this demonstration, the log events are shipped to a NiFi cluster over Site2Site.

Here is the K8s YAML to deploy the pod (application with NiFi Stateless sidecar): https://github.com/sunileman/AKS-YAMLS/blob/master/nifi-stateless-sidecar.yml

In that YAML file, the NiFi Registry URL, bucketId, and flowId need to be updated. These values come from NiFi Registry. NiFi Stateless binds itself at run time to a specific flow to execute:

args: ["RunFromRegistry", "Continuous", "--json", "{\"registryUrl\":\"http://nifiregistry-service\",\"bucketId\":\"71efc3ea-fe1d-4307-97ce-589f78be05fb\",\"flowId\":\"c9092508-4deb-45d2-b6a4-b6a4da71db47\"}"]

To deploy the pod, run the following:

kubectl apply -f nifi-stateless-sidecar.yml

Once the pod is up and running, application log events are immediately captured by the NiFi Stateless container and shipped downstream.

Wrapping Up

FluentD and similar offerings are great for getting started with capturing application log data. However, enterprises require much richer connectivity (universal log forward compatibility) to enable InfoSec to perform its vital role. NiFi Stateless bridges that gap.
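The escaped JSON inside the args line is easy to get wrong by hand. The sketch below shows one way to generate it, assuming only the three fields the article uses (registryUrl, bucketId, flowId); the helper name stateless_args is hypothetical.

```python
import json


def stateless_args(registry_url, bucket_id, flow_id):
    """Build the container args list for a NiFi Stateless sidecar."""
    payload = json.dumps(
        {"registryUrl": registry_url, "bucketId": bucket_id, "flowId": flow_id},
        separators=(",", ":"),  # compact form, matching the args line in the article
    )
    return ["RunFromRegistry", "Continuous", "--json", payload]


if __name__ == "__main__":
    args = stateless_args(
        "http://nifiregistry-service",
        "71efc3ea-fe1d-4307-97ce-589f78be05fb",
        "c9092508-4deb-45d2-b6a4-b6a4da71db47",
    )
    # Serializing the list as JSON escapes the inner quotes; the result is
    # also a valid YAML flow sequence, so it can be pasted after "args:".
    print("args: " + json.dumps(args))
```

Letting json.dumps do the escaping avoids the missing-backslash mistakes that otherwise only surface when the container fails to start.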

sunile_manjee · ‎01-15-2020

Over the last few weeks, as customers have started to ramp up their usage of CDP cloud assets (like DataHub, Experiences, etc.), I have observed many of the ways they are leveraging on-prem engineering assets (code) in the cloud. The concept of write-once-deploy-anywhere is fundamental to a well-designed data strategy. It's NOT a sales pitch. It's a reality for enterprises that have invested in a Modern Data Architecture. However, unlike on-prem, where storage and services are tightly coupled, CDP flips that concept on its head. We can now launch services independently and choose only the capabilities we need for the task at hand. For example, streaming use cases typically require NiFi, Kafka, and Spark Streaming. Each of those services would be a separate DataHub cluster and scale independently. This article focuses on using PySpark to read from a secured Kafka instance. To be clear, this is one way (not the only way) of using PySpark to read from Kafka. Both clusters (DE and SM) are launched via the CDP control plane.

CDP DataHub assets used in this article

- Data Engineering cluster (DE)
- Streams Messaging cluster (SM)

Launching DataHub DE and SM clusters is well documented in the Cloudera documentation.

PreWork

Secure Kafka and generate certs/truststore: https://docs.cloudera.com/runtime/7.0.2/kafka-securing/topics/kafka-secure-tls.html
In this article I refer to the truststore as kafka.client.truststore.

Get a list of all Kafka brokers and ports. For this article I will call the brokers k1.cloudera.com:9093, k2.cloudera.com:9093, k3.cloudera.com:9093.

Create a kafka.properties file:

security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.truststore.location=/home/sunilemanjee/kafka.client.truststore.jks
ssl.truststore.password=password
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="machine-user" password="password";

Create a Kafka JAAS file called jaas.conf (this can be named whatever you like):

KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="machine-user"
  password="password";
};

Create a Kafka topic

The easiest way to create a Kafka topic is via SMM (Streams Messaging Manager), which is shipped with the Streams Messaging cluster. Click on the SMM URL within DataHub and then click on "Topics" located on the right menu bar. Click on "Add New" to create a new Kafka topic. Enter the topic name "demo", set partitions to 1, and set the clean up policy to "delete". The demo topic should now be available.

Generate Data

For PySpark to consume from a secured instance of Kafka, we need the ability to write to Kafka. Here we will use the Kafka console producer.

- SSH into one of the broker nodes
- Update k*.cloudera.com:9093 with your broker list and ports
- Upload kafka.properties (created earlier) onto this node
- Update the location of your kafka.properties file

After the command below is executed, you can start to write data (messages) to Kafka. We will come back to this in a moment.

kafka-console-producer --broker-list k1.cloudera.com:9093,k2.cloudera.com:9093,k3.cloudera.com:9093 --producer.config /home/sunilemanjee/kafka.properties --topic demo

Read from Kafka using PySpark

- SSH into any node within the DE cluster
- Upload jaas.conf and kafka.client.truststore
- Update the location of jaas.conf and kafka.client.truststore in the command below
- Launch the PySpark shell using the following command

pyspark --files "/home/sunilemanjee/jaas.conf#jaas.conf,/home/sunilemanjee/kafka.client.truststore.jks#kafka.client.truststore.jks" --driver-java-options "-Djava.security.auth.login.config=/home/sunilemanjee/jaas.conf" --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/home/sunilemanjee/jaas.conf"

Once the PySpark shell is up, it may be easier to store the Kafka brokers in a variable like this:

KAFKA_BROKERS = "k1.cloudera.com:9093,k2.cloudera.com:9093,k3.cloudera.com:9093"

Create a structured stream to read from Kafka. Update the following values: kafka.ssl.truststore.location, kafka.ssl.truststore.password, username, and password.

df_kafka = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", KAFKA_BROKERS)
    .option("subscribe", "demo")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.ssl.truststore.location", "./kafka.client.truststore.jks")
    .option("kafka.ssl.truststore.password", "password")
    .option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"machine-user\" password=\"password\" serviceName=\"kafka\";")
    .load()
    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
)

To start viewing Kafka messages from the "demo" topic on the console (PySpark shell):

stream = df_kafka.writeStream.format("console").start()
stream.awaitTermination()

# Once you are finished, stop the stream with:
stream.stop()

Go back to your Kafka console producer and start writing messages (anything you like). You will see those messages show up in your PySpark shell console.

That's it. Again, this is one way (not the only way) to use PySpark to consume from a secured Kafka instance. I see this as an emerging pattern in CDP for streaming use cases. Enjoy.
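Most mistakes in this setup come from hand-typing the JAAS string and the kafka.* options. Below is a minimal sketch that builds them in plain Python, assuming SASL/PLAIN over TLS as used in this article. The helper names are hypothetical, and rejecting values that contain a double quote is a simplification chosen here rather than a Kafka rule.

```python
def plain_jaas_config(username, password):
    """Build a sasl.jaas.config value for the SASL/PLAIN login module."""
    for value in (username, password):
        if '"' in value:
            # Simplification: refuse rather than guess at escaping rules.
            raise ValueError("double quotes are not supported in this sketch")
    return (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        'username="' + username + '" password="' + password + '";'
    )


def kafka_reader_options(brokers, topic, truststore, truststore_password,
                         username, password):
    """Collect the reader options used for a SASL_SSL Kafka source."""
    return {
        "kafka.bootstrap.servers": ",".join(brokers),  # no spaces between brokers
        "subscribe": topic,
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.ssl.truststore.location": truststore,
        "kafka.ssl.truststore.password": truststore_password,
        "kafka.sasl.jaas.config": plain_jaas_config(username, password),
    }


if __name__ == "__main__":
    options = kafka_reader_options(
        ["k1.cloudera.com:9093", "k2.cloudera.com:9093", "k3.cloudera.com:9093"],
        "demo", "./kafka.client.truststore.jks", "password",
        "machine-user", "password",
    )
    for key, value in options.items():
        print(key, "=", value)
```

Inside the PySpark shell, the resulting dictionary can be passed in one call with spark.readStream.format("kafka").options(**options).load(), which keeps the credentials in a single place.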

sevenmen · ‎11-25-2019

@sunile_manjee Your article is very good and informative. I was searching for "Benchmarking Hadoop with TeraGen, TeraSort, and TeraValidate with ease" and found exactly the article I needed. Thank you for sharing this educational article. It is well written and covers all the points I was searching for, and I am impressed. Keep writing and sharing educational articles like this, which help us grow our knowledge. Regards: Sevenmentor

sunile_manjee · ‎11-18-2019

@Kalyan77 Good question, and I haven't tried it yet. In the next few weeks I have an engagement that will require me to find out. I will keep you posted.

sunile_manjee · ‎08-27-2019

Part 2 of Autoscaling MiNiFi on K8S is focused on deploying the artifacts on EKS (Amazon Elastic Kubernetes Service). My knee-jerk reaction was that every Kubernetes-as-a-Service offering would play well with the others, but that is definitely not the case. Hence GCP's Anthos product direction for this space is key. The net-net of my observation is that a k8s app deployment built on any single cloud vendor runs into deployment complexities on any other k8s deployment, cloud or on-prem. The vendor lock-in theory is ALIVE and WELL.

In Azure I leveraged Azure Container Instances for EFM and NiFi Registry; however, the natural evolution was to deploy EFM and NiFi Registry (NR) on K8S. EFM, NR, and MiNiFi are integrated components (refer to part 1 for the architecture). I will leverage several key out-of-the-box k8s components to make this all work together. The good news is, the deployment is super simple!

Prerequisites

- Some knowledge of Kubernetes and EKS
- EKS cluster
- kubectl CLI
- eksctl CLI
- VPC
- 2 public subnets within a VPC
- NR, EFM, and MiNiFi images uploaded to and available in Amazon ECR (refer to part 1 for image locations)

Create an EKS Cluster

eksctl makes this simple. I tried using aws eks and it was painful.

eksctl create cluster \
--name sunman-k8s \
--version 1.13 \
--nodegroup-name standard-workers \
--node-type t3.medium \
--nodes 3 \
--nodes-min 1 \
--nodes-max 4 \
--vpc-public-subnets=subnet-067d0ffbc09152382,subnet-037d8c6750c5de236 \
--node-ami auto

Deployment

All the contents of the ymls below can be placed into a single file for deployment. For this demonstration, chunking it into smaller components makes it easier to explain.

NiFi Registry (NR)

Edge Flow Manager has a dependency on NR. Flow versions are stored in NR. Here is nifiregistry.yml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nifiregistry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nifiregistry
  template:
    metadata:
      labels:
        app: nifiregistry
    spec:
      containers:
      - name: nifiregistry-container
        image: your-image-location/nifiregistry
        ports:
        - containerPort: 18080
          name: http
        - containerPort: 22
          name: ssh
        resources:
          requests:
            cpu: ".5"
            memory: "2Gi"
          limits:
            cpu: "1"
        env:
        - name: VERSION
          value: "11"
---
kind: Service
apiVersion: v1
metadata:
  name: nifiregistry-service
spec:
  selector:
    app: nifiregistry
  ports:
  - protocol: TCP
    targetPort: 18080
    port: 80
    name: http
  - protocol: TCP
    targetPort: 22
    port: 22
    name: ssh
  type: LoadBalancer
  loadBalancerSourceRanges:
  - 0.0.0.0/0

Update the following line in nifiregistry.yml with the location of your NR image:

image: your-image-location/nifiregistry

Also take note that the load balancer for NR is open to the world. You may want to lock this down.

Deploy NR on k8s:

kubectl apply -f nifiregistry.yml

Edge Flow Manager (EFM)

Next, deploy EFM on k8s.
Here is efm.yml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: efm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: efm
  template:
    metadata:
      labels:
        app: efm
    spec:
      containers:
      - name: efm-container
        image: your-image-location/efm
        ports:
        - containerPort: 10080
          name: http
        - containerPort: 22
          name: ssh
        resources:
          requests:
            cpu: ".5"
            memory: "2Gi"
          limits:
            cpu: "1"
        env:
        - name: VERSION
          value: "11"
        - name: NIFI_REGISTRY_ENABLED
          value: "true"
        - name: NIFI_REGISTRY_BUCKETNAME
          value: "testbucket"
        - name: NIFI_REGISTRY
          value: "http://nifiregistry-service.default.svc.cluster.local"
---
kind: Service
apiVersion: v1
metadata:
  name: efm-service
spec:
  selector:
    app: efm
  ports:
  - protocol: TCP
    targetPort: 10080
    port: 80
    name: http
  - protocol: TCP
    targetPort: 22
    port: 22
    name: ssh
  type: LoadBalancer
  loadBalancerSourceRanges:
  - 0.0.0.0/0

Update the following line in efm.yml with the location of your EFM image:

image: your-image-location/efm

Also take note that the load balancer for EFM is open to the world. You may want to lock this down.

Deploy EFM on k8s:

kubectl apply -f efm.yml

MiNiFi

Lastly, deploy MiNiFi on k8s.
Here is minifi.yml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: minifi
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minifi
  template:
    metadata:
      labels:
        app: minifi
    spec:
      containers:
      - name: minifi-container
        image: your-image-location/minifi-azure-aws
        ports:
        - containerPort: 10080
          name: http
        - containerPort: 6065
          name: listenhttp
        - containerPort: 22
          name: ssh
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
        env:
        - name: NIFI_C2_ENABLE
          value: "true"
        - name: MINIFI_AGENT_CLASS
          value: "listenSysLog"
        - name: NIFI_C2_REST_URL
          value: "http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat"
        - name: NIFI_C2_REST_URL_ACK
          value: "http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge"
---
kind: Service
apiVersion: v1
metadata:
  name: minifi-service
spec:
  selector:
    app: minifi
  ports:
  - protocol: TCP
    targetPort: 10080
    port: 10080
    name: http
  - protocol: TCP
    targetPort: 9877
    port: 9877
    name: tcpsyslog
  - protocol: TCP
    targetPort: 9878
    port: 9878
    name: udpsyslog
  - protocol: TCP
    targetPort: 22
    port: 22
    name: ssh
  - protocol: TCP
    targetPort: 6065
    port: 6065
    name: listenhttp
  type: LoadBalancer
  loadBalancerSourceRanges:
  - 0.0.0.0/0

Update the following line in minifi.yml with the location of your MiNiFi image:

image: your-image-location/minifi-azure-aws

Also take note that the load balancer for MiNiFi is open to the world. You may want to lock this down.

Deploy MiNiFi on k8s:

kubectl apply -f minifi.yml

That's it! Part 1 of this series demonstrated how to autoscale MiNiFi, and those same k8s commands can be used here to scale this out properly. The next part of this series will add the concept of k8s stateful sets and their impact on EFM/NR/MiNiFi for a resilient backend persistence layer.
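Since all three services above ship with loadBalancerSourceRanges set to 0.0.0.0/0, a small check can flag wide-open entries before a manifest is applied. This is a minimal sketch that uses only Python's standard ipaddress module; the function name is hypothetical, and the CIDR lists are passed in directly rather than parsed from YAML.

```python
import ipaddress


def open_to_world(source_ranges):
    """Return the CIDR entries that admit every address (prefix length 0)."""
    wide_open = []
    for cidr in source_ranges:
        # strict=False tolerates host bits being set, e.g. 10.1.2.3/8.
        network = ipaddress.ip_network(cidr, strict=False)
        if network.prefixlen == 0:
            wide_open.append(cidr)
    return wide_open


if __name__ == "__main__":
    # 203.0.113.0/24 is a documentation-only range used as a stand-in.
    print(open_to_world(["0.0.0.0/0"]))
    print(open_to_world(["203.0.113.0/24"]))
```

Replacing 0.0.0.0/0 with the CIDR of your office or VPN is usually enough to make the check return an empty list.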

sunile_manjee · ‎08-14-2019

MiNiFi (Java version) is essentially NiFi with a few differences, which is why it runs so darn well on containers/Kubernetes. The use case is to have a single management console (Edge Flow Manager) to manage zero to many MiNiFi agents that require autoscaling on Kubernetes based on some arbitrary metric, for example a CPU/RAM threshold. EFM and NiFi Registry are required but don't need autoscaling; therefore, these services will be deployed on Azure Container Instances (ACI). MiNiFi, on the other hand, often benefits from autoscaling, so it will be deployed on Azure Kubernetes Service.

Required for this demonstration

- Azure subscription
- Container registry: the demo leverages Azure Container Registry
- Kubernetes service: the demo leverages Azure Kubernetes Service
- Azure CLI

The following images need to be stored in Azure Container Registry:

- Edge Flow Manager: https://github.com/sunileman/efm1.0.0.0-docker
- NiFi Registry: https://github.com/sunileman/NiFi-Registry-Service
- MiNiFi (Java): https://github.com/sunileman/CEM1.0-Java-MiNiFi (this image comes precooked with Azure/AWS NARs)

Architecture

This is a 10k foot view of the architecture. EFM communicates with MiNiFi agents about the work they need to do. EFM also communicates with NiFi Registry to store/version control the flows that get passed to the MiNiFi agents.

Deploy NiFi Registry and EFM on Azure Container Instances

Since EFM and Registry don't really benefit from autoscaling, they are both a great fit for Azure Container Instances (mostly static installs). ACI will keep EFM and NiFi Registry up with 1 container instance each. EFM, MiNiFi, and Registry have all been imported into my container registry on Azure.

Create NiFi Registry on ACI

NiFi Registry variables to note:

- --name: name of the NiFi Registry container
- --dns-name-label: prefix for the DNS name of the registry service. This will be used as an input to an EFM container environment variable.

az container create --resource-group sunmanCentralRG --name mynifiregistry --image sunmanregistry.azurecr.io/nifiregistry:latest --dns-name-label mynifiregistry --ports 18080 --registry-username ****** --registry-password ******

Create EFM on ACI

EFM variables to note:

- NIFI_REGISTRY: should match the NiFi Registry container DNS name (fully qualified server name)
- --dns-name-label: DNS prefix

az container create --resource-group sunmanCentralRG --name efm --image sunmanregistry.azurecr.io/efm:latest --dns-name-label myefm --ports 10080 --registry-username ***** --registry-password **** --environment-variables 'NIFI_REGISTRY_ENABLED'='true' 'NIFI_REGISTRY_BUCKETNAME'='testbucket' 'NIFI_REGISTRY'='http://mynifiregistry.centralus.azurecontainer.io:18080'

Create a 'testbucket' on NiFi Registry

MiNiFi flows will be designed using EFM and stored in the NiFi Registry bucket 'testbucket'. This bucket name was passed as a variable when the EFM container was created:

'NIFI_REGISTRY_BUCKETNAME'='testbucket'

NiFi Registry will be available under YourNiFiRegistryDNS:18080/nifi-registry/. For example: http://mynifiregistry.centralus.azurecontainer.io:18080/nifi-registry/

Click on "NEW BUCKET", enter the bucket name testbucket, and click Create.

Validate EFM is up

The EFM UI will be available under http://YourEfmDnsPrefix.centralus.azurecontainer.io:10080/efm/ui, for example http://myefm.centralus.azurecontainer.io:10080/efm/ui

Run MiNiFi Kubernetes Deployment

The easiest way to run a deployment in k8s is to build a manifest file. To learn more about k8s manifest files, see the Kubernetes documentation. Look for < > in the manifest below, as these are the variables to change prior to your deployment (only a few, super simple).

Variable to note:

- MINIFI_AGENT_CLASS: this will be the agent class published to EFM.
To learn more about EFM, see the Cloudera Edge Management documentation.

Kubernetes manifest file:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: minifi
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minifi
  template:
    metadata:
      labels:
        app: minifi
    spec:
      containers:
      - name: minifi-container
        image: <Your Container Registry>/minifi-azure-aws:latest
        ports:
        - containerPort: 10080
          name: http
        - containerPort: 6065
          name: listenhttp
        - containerPort: 22
          name: ssh
        resources:
          requests:
            cpu: ".05"
            memory: "1Gi"
          limits:
            cpu: "1"
        env:
        - name: NIFI_C2_ENABLE
          value: "true"
        - name: MINIFI_AGENT_CLASS
          value: "test"
        - name: NIFI_C2_REST_URL
          value: "http://<Your EFM servername>.centralus.azurecontainer.io:10080/efm/api/c2-protocol/heartbeat"
        - name: NIFI_C2_REST_URL_ACK
          value: "http://<Your EFM servername>.centralus.azurecontainer.io:10080/efm/api/c2-protocol/acknowledge"
---
kind: Service
apiVersion: v1
metadata:
  name: minifi-service
spec:
  selector:
    app: minifi
  ports:
  - protocol: TCP
    targetPort: 10080
    port: 10080
    name: http
  - protocol: TCP
    targetPort: 22
    port: 22
    name: ssh
  - protocol: TCP
    targetPort: 6065
    port: 6065
    name: listenhttp
  type: LoadBalancer

Once the manifest file has been updated, store it as minifi.yml (this can be any name). Deploy on k8s using:

kubectl apply -f minifi.yml

Output:

sunile.manjee@hwx:~/Documents/GitHub/AKS-YAMLS(master⚡) » kubectl apply -f minifi.yml
deployment.extensions/minifi created
service/minifi-service created
sunile.manjee@hwx:~/Documents/GitHub/AKS-YAMLS(master⚡) »

MiNiFi has been successfully deployed. To verify the deployment, visit EFM. EFM should show the agent class name 'test', matching the class name used in the minifi k8s manifest file.

Open the class and design any flow. Here I simply used GenerateFlowFile with 3 concurrent threads and terminated the success relationship. Click on Publish, and soon thereafter MiNiFi will be executing the flow.
AutoScale MiNiFi

At this point a single MiNiFi container/agent is executing flows. I purposely set the MiNiFi CPU allocation (in the manifest file) to a small number to force autoscaling. First, let's check the number of minifi pods running on k8s: there is a single MiNiFi pod. Next, let's check whether autoscaling is enabled for this deployment. To enable autoscaling on k8s:

kubectl autoscale deployment minifi --cpu-percent=25 --min=1 --max=3

Here minifi is the deployment name. If average CPU utilization exceeds 25%, the autoscaler increases the number of pods up to a maximum of 3 instances. The deployment never drops below the minimum of 1 instance.

Verify that autoscaling is enabled on the minifi deployment. After autoscaling was enabled, the number of minifi pods rose to 3: Kubernetes added 2 additional MiNiFi pods. Let's kill one of the pods and see what happens: Kubernetes immediately launched a new MiNiFi container after the MiNiFi pod was killed.

Enjoy AutoScale on MiNiFi!
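To see why a small CPU request pushes the deployment to 3 pods, it helps to work through the scaling arithmetic. The sketch below follows the replica formula given in the Kubernetes Horizontal Pod Autoscaler documentation, desired = ceil(current replicas * current metric / target metric), clamped to the min and max. It is a simplification: it ignores the autoscaler's tolerance band and stabilization behavior, and the utilization figures in the example are made up.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent,
                     target_cpu_percent, min_replicas, max_replicas):
    """Simplified HPA calculation: scale proportionally, then clamp."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # One pod at a hypothetical 90% of its CPU request, target 25%, range 1 to 3.
    print(desired_replicas(1, 90, 25, 1, 3))
    # Three pods idling at a hypothetical 5% scale back toward the minimum.
    print(desired_replicas(3, 5, 25, 1, 3))
```

Utilization is measured against the pod's CPU request, which is why a request as small as ".05" makes even light flow activity exceed the 25% target.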

sunile_manjee · ‎06-21-2019

Building an Apache NiFi processor is super easy. I have seen/read several articles on how to get started by executing Maven commands via the CLI. This article is geared towards individuals who like to use an IDE (specifically IntelliJ) to do the imports instead of running via the CLI.

In IntelliJ, click on Create Project, check "Create from archetype", click on "Add Archetype", and enter the following:

GroupId: org.apache.nifi
ArtifactId: nifi-processor-bundle-archetype
Version: <YourVersionOfNifi>

Then click "OK". Your new NiFi archetype has now been created. Select it, then enter a GroupId, ArtifactId, and Version of your choice.

A final attribute we need to add is artifactBaseName. This is mandatory. Click on "+" and enter:

Name: artifactBaseName
Value: whatEverYouLike

Now the project is ready for building a custom processor. Enjoy!

Member Since	‎05-30-2018 10:40 PM
Last Visited	‎05-25-2022 10:07 AM
Posts	1,322
Kudos received	713

Cloudera Community

Re: How to check the total storage capacity of the...

How to use K9s to fetch metrics and logs for Cloud...

Re: Reading OpenData JSON and Storing into Phoenix...

How to consume Windows Event Logs via MiNiFi?

Kubernetes Application Log Capture through NiFi St...

CDP DataHub - PySpark Structured Streaming reading...

Re: Benchmarking Hadoop with TeraGen, TeraSort, an...

Re: Deploy a Demo Druid/LLAP Cluster within Minute...

Deploy MiNiFi On AKS - Amazon Kubernetes Service

AutoScaling MiNiFi On Kubernetes

Building a Custom Processor Using IntelliJ