Member since 05-30-2018

1322 Posts
715 Kudos Received
148 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4005 | 08-20-2018 08:26 PM |
| | 1878 | 08-15-2018 01:59 PM |
| | 2336 | 08-13-2018 02:20 PM |
| | 4056 | 07-23-2018 04:37 PM |
| | 4950 | 07-19-2018 12:52 PM |
			
    
	
		
		
01-15-2020 11:57 AM
Over the last few weeks, as customers have started to ramp up their usage of CDP cloud assets (like DataHub, Experiences, etc.), I have observed many of the ways they are leveraging on-prem engineering assets (code) in the cloud. The concept of write-once-deploy-anywhere is fundamental to a well-designed data strategy. It's NOT a sales pitch. It's a reality for enterprises who have invested in a Modern Data Architecture. However, unlike on-prem, where storage and services are tightly coupled, CDP flips that concept on its head. We can now launch services independently and choose only the capabilities we need for the task at hand. For example, streaming use cases typically require NiFi, Kafka, and Spark Streaming. Each of those services would be a separate DataHub cluster and scale independently. This article focuses on using PySpark to read from a secured Kafka instance. To be clear, this is one way (not the only way) of using PySpark to read from Kafka. Both (DE & SM) clusters are launched via the CDP control plane.

CDP DataHub assets used in this article
- Data Engineering Cluster (DE)
- Streams Messaging (SM)

Launching DataHub DE and SM clusters is well documented here.

PreWork
- Secure Kafka and generate certs/truststore: https://docs.cloudera.com/runtime/7.0.2/kafka-securing/topics/kafka-secure-tls.html. In this article I refer to the truststore as kafka.client.truststore.
- Get a list of all Kafka brokers and ports. For this article I will call the brokers k1.cloudera.com:9093, k2.cloudera.com:9093, k3.cloudera.com:9093.
- Create a kafka.properties file:

security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.truststore.location=/home/sunilemanjee/kafka.client.truststore.jks
ssl.truststore.password=password
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="machine-user" password="password";

Create a Kafka jaas file called jaas.conf (this can be named whatever you like):

KafkaClient {
org.apache.kafka.common.security.plain.PlainLoginModule required
username="machine-user"
password="password";
};

Create a Kafka topic
The easiest way to create a Kafka topic is via SMM (Streams Messaging Manager), which ships with the Streams Messaging cluster. Click on the SMM URL within DataHub and then click on "Topics" located on the right menu bar. Click on "Add New" to create a new Kafka topic. Enter the topic name "demo", set partitions to 1, and set the cleanup policy to "delete". The demo topic should now be available.

Generate Data
For PySpark to consume from a secured instance of Kafka, we need the ability to write to Kafka. Here we will use the Kafka console producer.
- SSH into one of the broker nodes
- Update k*.cloudera.com:9093 with your broker list and ports
- Upload kafka.properties (created earlier) onto this node
- Update the location of your kafka.properties file

After the command below is executed, you can start to write data (messages) to Kafka. We will come back to this in a moment.

kafka-console-producer --broker-list k1.cloudera.com:9093,k2.cloudera.com:9093,k3.cloudera.com:9093 --producer.config /home/csunilemanjee/kafka.properties --topic demo

Read from Kafka using PySpark
- SSH into any node within the DE cluster
- Upload jaas.conf and kafka.client.truststore
- Update the location of jaas.conf and kafka.client.truststore
- Launch the PySpark shell using the following command:

pyspark --files "/home/csunilemanjee/jaas.conf#jaas.conf,/home/sunilemanjee/kafka.client.truststore.jks#kafka.client.truststore.jks" --driver-java-options "-Djava.security.auth.login.config=/home/sunilemanjee/jaas.conf" --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/home/sunilemanjee/jaas.conf"

Once the PySpark shell is up, it may be easier to store the Kafka brokers in a variable like this:

KAFKA_BROKERS = "k1.cloudera.com:9093,k2.cloudera.com:9093,k3.cloudera.com:9093"

Create a structured stream to read from Kafka. Update the following:
- kafka.ssl.truststore.location
- kafka.ssl.truststore.password
- username
- password

df_kafka = spark.readStream.format("kafka").option("kafka.bootstrap.servers", KAFKA_BROKERS).option("subscribe", "demo").option("kafka.security.protocol", "SASL_SSL").option("kafka.sasl.mechanism", "PLAIN").option("kafka.ssl.truststore.location", "./kafka.client.truststore.jks").option("kafka.ssl.truststore.password", "password").option("kafka.sasl.jaas.config", "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"machine-user\" password=\"password\" serviceName=\"kafka\";").load().selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

To start viewing Kafka messages on the console (PySpark shell) from the "demo" topic:

stream = df_kafka.writeStream.format("console").trigger(continuous="1 second").start()
stream.awaitTermination()
##once you are finished, to kill the stream run this
stream.stop()

Go back to your Kafka console producer and start writing messages (anything you like). You will see those messages show up in your PySpark shell console.

That's it. Again, this is one way (not the only way) to use PySpark to consume from a secured Kafka instance. I see this as an emerging pattern in CDP for streaming use cases. Enjoy.
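If you want to sanity-check the topic outside of Spark first, a console consumer can confirm that messages written by the producer actually land in the "demo" topic. This is a minimal sketch, assuming the same kafka.properties file and broker list from above; adjust the paths for your environment.

# Read everything currently in the "demo" topic using the same SASL_SSL client settings.
# Assumes kafka.properties and the truststore exist on this node at the paths used earlier.
kafka-console-consumer \
  --bootstrap-server k1.cloudera.com:9093,k2.cloudera.com:9093,k3.cloudera.com:9093 \
  --consumer.config /home/csunilemanjee/kafka.properties \
  --topic demo \
  --from-beginning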
						
					
11-18-2019 08:29 PM
@Kalyan77 Good question, and I haven't tried it yet. In the next few weeks I have an engagement which will require me to find out. I will keep you posted.
						
					
09-04-2019 11:47 AM
2 Kudos
My Kubernetes series (Part 1, Part 2) was strictly focused on MiNiFi on K8s. NiFi and MiNiFi may communicate over Site-to-Site; however, often the pattern is to leverage Kafka for a clean message handoff. NiFi within these patterns is generally the central router and transformer of messages. Think of it like "FedEx" for data. Until now, most have deployed NiFi on bare metal or VMs. Natural evolution kicks in: deploy NiFi on K8s, and yes, it's super simple. In this article I will demonstrate how to deploy both a NiFi and a ZooKeeper cluster (neither being a single pod!) on Azure Kubernetes Service (AKS); however, all artifacts may be leveraged to deploy on virtually any Kubernetes offering.

Prerequisites
- Some knowledge of Kubernetes and NiFi
- AKS / K8s cluster
- kubectl CLI
- NiFi image available in a registry (i.e. DockerHub)

Deployment
All the contents in the YAMLs below can be placed into a single file for deployment. For this demonstration, chunking it into smaller components makes it easier to explain. I have loaded a NiFi image into Azure Container Registry. You can use the NiFi image available here in DockerHub.

ZooKeeper
NiFi uses ZooKeeper for several state management functions. More on that here. ZooKeeper for NiFi can be deployed in embedded or standalone mode. Here I will deploy 3 pods of ZK on K8s. Deploying ZK on K8s is super simple:

kubectl apply -f https://k8s.io/examples/application/zookeeper/zookeeper.yaml

After a few minutes, 3 pods of ZK will be available for NiFi to use. Once the ZK pods become available, proceed to deploy NiFi on K8s:

kubectl get pods -w -l app=zk

NiFi
Below is the K8s deployment YAML for NiFi (named nifi.yml). A few fields to highlight:
- replicas: the number of NiFi pods to deploy (cluster size)
- image: replace with your image location
kind: Service             #+
apiVersion: v1            #+
metadata:                 #+
  name: nifi-service     #+
spec:                     #+
  selector:               #+
    app: nifi            #+
  ports:                  #+
  - protocol: TCP         #+
    targetPort: 8080     #+
    port: 8080              #+
    name: ui            #+
  - protocol: TCP         #+
    targetPort: 9088     #+
    port: 9088              #+
    name: node-protocol-port            #+
  - protocol: TCP         #+
    targetPort: 8888     #+
    port: 8888              #+
    name: s2s            #+
  type: LoadBalancer      #+
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: nifi
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nifi
  template:
    metadata:
      labels:
        app: nifi
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: nifi-container
        image: sunmanregistry.azurecr.io/nifi:latest
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 22
          name: ssh
        resources:
          requests:
            cpu: ".5"
            memory: "6Gi"
          limits:
            cpu: "1"
        env:
        - name: VERSION
          value: "1.9"
        - name: NIFI_CLUSTER_IS_NODE
          value: "true"
        - name: NIFI_CLUSTER_NODE_PROTOCOL_PORT
          value: "9088"
        - name: NIFI_ELECTION_MAX_CANDIDATES
          value: "1"
        - name: NIFI_ZK_CONNECT_STRING
          value: "zk-0.zk-hs.default.svc.cluster.local:2181,zk-1.zk-hs.default.svc.cluster.local:2181,zk-2.zk-hs.default.svc.cluster.local:2181"                    To deploy this NiFi k8s manifest:           kubectl apply -f nifi.yml                 Few minutes later you will have a fully functioning NiFi cluster on Kubernetes               I use super simple a lot.  I know.  Deploying MiNiFi & NiFi on k8s is just that, super simple.  Enjoy.       
						
					
08-27-2019 09:06 PM
2 Kudos
Part 2 of Autoscaling MiNiFi on K8s is focused on deploying the artifacts on EKS - Amazon Elastic Kubernetes Service. My knee-jerk reaction was that all Kubernetes-as-a-Service offerings would play well together, but that is definitely not the case, which is why GCP's Anthos product direction for this space is key. The net-net of my observation is that a K8s app deployment built for any single cloud vendor introduces deployment complexities on any other K8s deployment, cloud or on-prem. Vendor lock-in theory is ALIVE and WELL. In Azure I leveraged ACS for EFM and NiFi Registry; however, the natural evolution was to deploy EFM and NiFi Registry (NR) on K8s. EFM, NR, and MiNiFi are integrated components (refer to Part 1 on architecture). I will leverage several key out-of-the-box K8s components to make this all work together. The good news is, the deployment is super simple!

Prerequisites
- Some knowledge of Kubernetes and EKS
- EKS cluster
- kubectl CLI
- eksctl CLI
- VPC
- 2 public subnets within the VPC
- NR, EFM, and MiNiFi images uploaded to and available in ECR
- Refer to Part 1 on image locations

Create an EKS Cluster
eksctl makes this simple. I tried using aws eks and it was painful.

eksctl create cluster \
--name sunman-k8s \
--version 1.13 \
--nodegroup-name standard-workers \
--node-type t3.medium \
--nodes 3 \
--nodes-min 1 \
--nodes-max 4 \
--vpc-public-subnets=subnet-067d0ffbc09152382,subnet-037d8c6750c5de236 \
--node-ami auto

Deployment
All the contents in the YAMLs below can be placed into a single file for deployment. For this demonstration, chunking it into smaller components makes it easier to explain.

NiFi Registry (NR)
Edge Flow Manager has a dependency on NR. Flow versions are stored in NR. Here is nifiregistry.yml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nifiregistry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nifiregistry
  template:
    metadata:
      labels:
        app: nifiregistry
    spec:
      containers:
      - name: nifiregistry-container
        image: your-image-location/nifiregistry
        ports:
        - containerPort: 18080
          name: http
        - containerPort: 22
          name: ssh
        resources:
          requests:
            cpu: ".5"
            memory: "2Gi"
          limits:
            cpu: "1"
        env:
        - name: VERSION
          value: "11"
---
kind: Service             #+
apiVersion: v1            #+
metadata:                 #+
  name: nifiregistry-service     #+
spec:                     #+
  selector:               #+
    app: nifiregistry            #+
  ports:                  #+
  - protocol: TCP         #+
    targetPort: 18080     #+
    port: 80           #+
    name: http            #+
  - protocol: TCP         #+
    targetPort: 22        #+
    port: 22              #+
    name: ssh             #+
  type: LoadBalancer      #+
  loadBalancerSourceRanges:
  - 0.0.0.0/0

Update the following line in nifiregistry.yml with the location of your NR image:

        image: your-image-location/nifiregistry

Also note that the load balancer for NR is open to the world. You may want to lock this down.

Deploy NR on K8s:

kubectl apply -f nifiregistry.yml

Edge Flow Manager (EFM)
Next, deploy EFM on K8s. Here is efm.yml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: efm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: efm
  template:
    metadata:
      labels:
        app: efm
    spec:
      containers:
      - name: efm-container
        image: your-image-location/efm
        ports:
        - containerPort: 10080
          name: http
        - containerPort: 22
          name: ssh
        resources:
          requests:
            cpu: ".5"
            memory: "2Gi"
          limits:
            cpu: "1"
        env:
        - name: VERSION
          value: "11"
        - name: NIFI_REGISTRY_ENABLED
          value: "true"
        - name: NIFI_REGISTRY_BUCKETNAME
          value: "testbucket"
        - name: NIFI_REGISTRY
          value: "<a href="<a href="<a href="<a href="<a href="http://nifiregistry-service.default.svc.cluster.local" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a>" target="_blank"><a href="http://nifiregistry-service.default.svc.cluster.local</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a>>" target="_blank"><a href="<a href="http://nifiregistry-service.default.svc.cluster.local</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a>" target="_blank"><a href="http://nifiregistry-service.default.svc.cluster.local</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a>>>" target="_blank"><a href="<a href="<a href="http://nifiregistry-service.default.svc.cluster.local</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a>" target="_blank"><a href="http://nifiregistry-service.default.svc.cluster.local</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a>>" target="_blank"><a href="<a href="http://nifiregistry-service.default.svc.cluster.local</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a>" target="_blank"><a href="http://nifiregistry-service.default.svc.cluster.local</a</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a</a>>>>" target="_blank"><a href="<a href="<a href="<a href="http://nifiregistry-service.default.svc.cluster.local</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a>" target="_blank"><a href="http://nifiregistry-service.default.svc.cluster.local</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a>>" target="_blank"><a href="<a href="http://nifiregistry-service.default.svc.cluster.local</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a>" target="_blank"><a href="http://nifiregistry-service.default.svc.cluster.local</a</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a</a>>>" target="_blank"><a href="<a href="<a href="http://nifiregistry-service.default.svc.cluster.local</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a>" target="_blank"><a href="http://nifiregistry-service.default.svc.cluster.local</a</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a</a>>" target="_blank"><a href="<a href="http://nifiregistry-service.default.svc.cluster.local</a</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a</a>" target="_blank"><a href="http://nifiregistry-service.default.svc.cluster.local</a</a</a</a" target="_blank">http://nifiregistry-service.default.svc.cluster.local</a</a</a</a</a>>>>>"
---
kind: Service             #+
apiVersion: v1            #+
metadata:                 #+
  name: efm-service     #+
spec:                     #+
  selector:               #+
    app: efm            #+
  ports:                  #+
  - protocol: TCP         #+
    targetPort: 10080     #+
    port: 80           #+
    name: http            #+
  - protocol: TCP         #+
    targetPort: 22        #+
    port: 22              #+
    name: ssh             #+
  type: LoadBalancer      #+
  loadBalancerSourceRanges:
  - 0.0.0.0/0

Update the following line in efm.yml with the location of your EFM image:

        image: your-image-location/efm

Also note that the load balancer for EFM is open to the world. You may want to lock this down.

Deploy EFM on K8s:

kubectl apply -f efm.yml

MiNiFi
Lastly, deploy MiNiFi on K8s. Here is minifi.yml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: minifi
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minifi
  template:
    metadata:
      labels:
        app: minifi
    spec:
      containers:
      - name: minifi-container
        image: your-image-location/minifi-azure-aws
        ports:
        - containerPort: 10080
          name: http
        - containerPort: 6065
          name: listenhttp  
        - containerPort: 22
          name: ssh
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
        env:
        - name: NIFI_C2_ENABLE
          value: "true"
        - name: MINIFI_AGENT_CLASS
          value: "listenSysLog"
        - name: NIFI_C2_REST_URL
          value: "<a href="<a href="<a href="<a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a>" target="_blank"><a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a>>" target="_blank"><a href="<a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a>" target="_blank"><a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a</a>>>" target="_blank"><a href="<a href="<a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a>" target="_blank"><a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a</a>>" target="_blank"><a href="<a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a</a>" target="_blank"><a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/heartbeat</a</a</a</a>>>>"
        - name: NIFI_C2_REST_URL_ACK
          value: "<a href="<a href="<a href="<a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a>" target="_blank"><a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a>>" target="_blank"><a href="<a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a>" target="_blank"><a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a</a>>>" target="_blank"><a href="<a href="<a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a>" target="_blank"><a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a</a>>" target="_blank"><a href="<a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a</a>" target="_blank"><a href="http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a</a" target="_blank">http://efm-service.default.svc.cluster.local/efm/api/c2-protocol/acknowledge</a</a</a</a>>>>"
---
kind: Service             #+
apiVersion: v1            #+
metadata:                 #+
  name: minifi-service     #+
spec:                     #+
  selector:               #+
    app: minifi            #+
  ports:                  #+
  - protocol: TCP         #+
    targetPort: 10080     #+
    port: 10080              #+
    name: http            #+
  - protocol: TCP         #+
    targetPort: 9877     #+
    port: 9877              #+
    name: tcpsyslog    
  - protocol: TCP         #+
    targetPort: 9878     #+
    port: 9878             #+
    name: udpsyslog   
  - protocol: TCP         #+
    targetPort: 22        #+
    port: 22              #+
    name: ssh             #+
  - protocol: TCP         #+
    targetPort: 6065        #+
    port: 6065              #+
    name: listenhttp             #+
  type: LoadBalancer      #+
  loadBalancerSourceRanges:
  - 0.0.0.0/0

Update the following line in minifi.yml with the location of your MiNiFi image:

        image: your-image-location/minifi-azure-aws

Also note that the load balancer for MiNiFi is open to the world. You may want to lock this down.

Deploy MiNiFi on K8s:

kubectl apply -f minifi.yml

That's it! Part 1 of this series demonstrated how to autoscale MiNiFi, and those same K8s commands can be used here to scale this out properly. The next part of this series will add the concept of K8s StatefulSets and their impact on EFM/NR/MiNiFi for a resilient backend persistence layer.
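Once the three manifests are applied, a quick way to confirm everything is wired up is to look at what Kubernetes exposed. This is a minimal sketch, assuming the deployment and service names from the manifests above (nifiregistry, efm, minifi and their *-service counterparts); on EKS the EXTERNAL-IP column shows an ELB hostname once provisioning finishes.

# Check that all three deployments rolled out
kubectl get deployments nifiregistry efm minifi

# List the LoadBalancer services and their external endpoints
kubectl get services nifiregistry-service efm-service minifi-service

# The EFM UI should then be reachable via the efm-service endpoint (port 80 -> 10080 per the Service above)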
						
					
08-14-2019 06:45 PM
3 Kudos
MiNiFi (Java version) is essentially NiFi with a few differences, which is why it runs so darn well on containers/Kubernetes. The use case is to have a single management console (Edge Flow Manager) to manage zero to many MiNiFi agents which require autoscaling on Kubernetes based on some arbitrary metric, for example a CPU/RAM threshold. EFM and NiFi Registry are required but don't need autoscaling; therefore, these services will be deployed on Azure Container Service. MiNiFi, on the other hand, often benefits from autoscaling, and hence it will be deployed on Azure Kubernetes Service.

Required for this demonstration
- Azure subscription
- Container Registry (demo will leverage Azure Container Registry)
- Kubernetes Service (demo will leverage Azure Kubernetes Service)
- Azure CLI
- The following images need to be stored in Azure Container Registry:
  - Edge Flow Manager: https://github.com/sunileman/efm1.0.0.0-docker
  - NiFi Registry: https://github.com/sunileman/NiFi-Registry-Service
  - MiNiFi (Java): https://github.com/sunileman/CEM1.0-Java-MiNiFi (this image comes precooked with Azure/AWS NARs)

Architecture
This is a 10k-foot view of the architecture. EFM communicates with MiNiFi agents about the work they need to do. EFM also communicates with NiFi Registry to store/version-control flows that will get passed to the MiNiFi agents.

Deploy NiFi Registry and EFM on Azure Container Service
Since EFM and Registry don't really benefit from autoscaling, they are both a great fit for Azure Container Service (mostly static installs). ACS will guarantee EFM and NiFi Registry are always up with 1 container instance each. EFM, MiNiFi, and Registry have all been imported into my container registry on Azure.

Create NiFi Registry on ACS
NiFi Registry variables to note:
- --name: name of the NiFi Registry container
- --dns-name-label: prefix for the DNS of the registry service. This will be used as an input into an EFM container environment variable.

az container create --resource-group sunmanCentralRG --name mynifiregistry --image sunmanregistry.azurecr.io/nifiregistry:latest --dns-name-label nifiregistry --ports 18080 --registry-username ****** --registry-password ******

Create EFM on ACS
EFM variables to note:
- NIFI_REGISTRY should match the NiFi Registry container DNS (fully qualified server name)
- --dns-name-label: DNS prefix

az container create --resource-group sunmanCentralRG --name efm --image sunmanregistry.azurecr.io/efm:latest --dns-name-label myefm --ports 10080 --registry-username ***** --registry-password **** --environment-variables 'NIFI_REGISTRY_ENABLED'='true' 'NIFI_REGISTRY_BUCKETNAME'='testbucket' 'NIFI_REGISTRY'='http://mynifiregistry.centralus.azurecontainer.io:18080'

Create a 'testbucket' on NiFi Registry
MiNiFi flows will be designed using EFM and stored in the NiFi Registry bucket 'testbucket'. This bucket name was provided as a variable during EFM container creation: 'NIFI_REGISTRY_BUCKETNAME'='testbucket'. NiFi Registry will be available under YourNiFiRegistryDNS:18080/nifi-registry/, for example http://mynifiregistry.centralus.azurecontainer.io:18080/nifi-registry/
- Click on "NEW BUCKET"
- Enter bucket name - testbucket
- Click create

Validate EFM is up
The EFM UI will be available under http://YourEfmDnsPrefix.centralus.azurecontainer.io:10080/efm/ui, for example http://myefm.centralus.azurecontainer.io:10080/efm/ui

Run MiNiFi Kubernetes Deployment
The easiest way to run a deployment in K8s is to build a manifest file. To learn more about K8s manifest files, go here. Look for < > in the manifest below, as these are the variables to change prior to your deployment (only a few, super simple).

Variable to note
- MINIFI_AGENT_CLASS: this will be the agent class published to EFM. To learn more about EFM, go here.

Kubernetes Manifest File:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: minifi
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minifi
  template:
    metadata:
      labels:
        app: minifi
    spec:
      containers:
      - name: minifi-container
        image: <Your Containe Registry>/minifi-azure-aws:latest
        ports:
        - containerPort: 10080
          name: http
        - containerPort: 6065
          name: listenhttp  
        - containerPort: 22
          name: ssh
        resources:
          requests:
            cpu: ".05"
            memory: "1Gi"
          limits:
            cpu: "1"
        env:
        - name: NIFI_C2_ENABLE
          value: "true"
        - name: MINIFI_AGENT_CLASS
          value: "test"
        - name: NIFI_C2_REST_URL
          value: "http://<Your EFM servername>.centralus.azurecontainer.io:10080/efm/api/c2-protocol/heartbeat"
        - name: NIFI_C2_REST_URL_ACK
          value: "http://<Your EFM servername>.centralus.azurecontainer.io:10080/efm/api/c2-protocol/acknowledge"
---
kind: Service             #+
apiVersion: v1            #+
metadata:                 #+
  name: minifi-service     #+
spec:                     #+
  selector:               #+
    app: minifi            #+
  ports:                  #+
  - protocol: TCP         #+
    targetPort: 10080     #+
    port: 10080              #+
    name: http            #+
  - protocol: TCP         #+
    targetPort: 22        #+
    port: 22              #+
    name: ssh             #+
  - protocol: TCP         #+
    targetPort: 6065        #+
    port: 6065              #+
    name: listenhttp             #+
  type: LoadBalancer      #+

Once the manifest file has been updated, store it as minifi.yml (this can be any name). Deploy on K8s using:

kubectl apply -f minifi.yml

Output:

sunile.manjee@hwx:~/Documents/GitHub/AKS-YAMLS(master⚡) » kubectl apply -f minifi.yml
deployment.extensions/minifi created
service/minifi-service created
sunile.manjee@hwx:~/Documents/GitHub/AKS-YAMLS(master⚡) »

MiNiFi has been successfully deployed. To verify successful deployment, visit EFM. EFM should show the agent class name 'test', matching the class name used in the MiNiFi K8s manifest file. Open the class and design any flow. Here I simply used GenerateFlowFile and terminated the success relationship with 3 concurrent threads. Click on publish, and soon thereafter MiNiFi will be executing the flow.

AutoScale MiNiFi
At this time a single MiNiFi container/agent is executing flows. I purposefully set the MiNiFi CPU allocation (manifest file) to a small number to force autoscaling. First, let's check the number of MiNiFi pods running on K8s: a single MiNiFi pod. Let's check if autoscaling is enabled for this deployment. To enable autoscaling on K8s:

kubectl autoscale deployment minifi --cpu-percent=25 --min=1 --max=3

minifi is the deployment name. If CPU utilization exceeds 25%, the autoscaler increases the pods up to a maximum of 3 instances. A minimum of 1 instance is also defined for the deployment. Verify autoscaling is enabled on the minifi deployment, then check the number of MiNiFi pods after autoscaling was enabled (3). Kubernetes added 2 additional MiNiFi pods. Let's kill one of the pods and see what happens: Kubernetes immediately launched a new MiNiFi container after the MiNiFi pod was killed. Enjoy autoscale on MiNiFi!
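To watch the autoscaler do its work from the command line, a few kubectl checks are enough. This is a minimal sketch, assuming the deployment and label names from the manifest above (minifi, app: minifi); the HPA created by kubectl autoscale is named after the deployment.

# Confirm the horizontal pod autoscaler exists and see current vs. target CPU
kubectl get hpa minifi

# Watch pods scale out/in as load crosses the 25% CPU threshold
kubectl get pods -l app=minifi -w

# Describe the HPA to see scaling events and the metrics it observed
kubectl describe hpa minifi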
						
					
06-21-2019 08:02 PM
1 Kudo
Building an Apache NiFi processor is super easy. I have seen/read several articles on how to get started by executing Maven commands via the CLI. This article is geared towards individuals who like to use an IDE (specifically IntelliJ) to do the imports instead of running via the CLI.

In IntelliJ, click on Create Project, check "Create from archetype", click on "Add Archetype", and enter the following:
- GroupId: org.apache.nifi
- ArtifactId: nifi-processor-bundle-archetype
- Version: <YourVersionOfNifi>
Then click "OK".

Now your new NiFi archetype has been created. Select it. Enter the GroupId, ArtifactId, and Version of your choice. A final attribute we need to add is artifactBaseName. This is mandatory. Click on "+" and enter:
- Name: artifactBaseName
- Value: whatEverYouLike

Now you have a project ready to build a custom processor. Enjoy!
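For comparison, the same archetype can be generated from the command line. This is a minimal sketch, assuming Maven is installed locally; the groupId, artifactId, version, and artifactBaseName values below are placeholders, and Maven will prompt for anything it still needs.

# Generate a NiFi processor bundle project from the processor archetype (values below are placeholders)
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.nifi \
  -DarchetypeArtifactId=nifi-processor-bundle-archetype \
  -DarchetypeVersion=<YourVersionOfNifi> \
  -DgroupId=com.example.nifi \
  -DartifactId=my-custom-processor \
  -Dversion=1.0-SNAPSHOT \
  -DartifactBaseName=myprocessor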
						
					
12-11-2018 09:47 PM
6 Kudos
I came across an article on how to set up NiFi to write into ADLS which required cobbling together various integration pieces and launching HDI. Since then there have been many updates in NiFi enabling a much easier integration. Combined with CloudBreak's rapid deployment of HDF clusters, this provides an incredibly easy user experience. ADLS is Azure's native cloud storage (look and feel of HDFS), and the capability to read/write to it via NiFi is key. This article will demonstrate how to use a CloudBreak Recipe to rapidly deploy an HDF NiFi "ADLS Enabled" cluster.

Assumptions
- A CloudBreak instance is available
- Azure credentials available
- Moderate familiarity with Azure
- Using HDF 3.2+

From Azure you will need:
- ADLS URL
- Application ID
- Application Password
- Directory ID

NiFi requires the ADLS JARs, core-site.xml, and hdfs-site.xml. The recipe I built will fetch these resources for you. Simply download the recipe/script from https://s3-us-west-2.amazonaws.com/sunileman1/scripts/setAdlsEnv.sh, open it, and scroll all the way to the bottom. Update the following:

Your_ADLS_URL: with your adls url
Your_APP_ID: with your application ID
Your_APP_Password: with your application password
Your_Directory_ID: with your directory id

Once the updates are completed, simply add the script under CloudBreak Recipes. Make sure to select "post-cluster-install". Begin provisioning an HDF cluster via CloudBreak. Once the Recipes page is shown, add the recipe to run on the NiFi nodes. Once the cluster is up, use the PutHDFS processor to write to ADLS.

Configure PutHDFS Properties
Hadoop Configuration Resources: /home/nifi/sites/core-site.xml,/home/nifi/sites/hdfs-site.xml
Additional Classpath Resources: /home/nifi/adlsjars
Directory: /

The above resources are all available on each node due to the recipe. All you have to do is point to the location of the resources in the PutHDFS processor. That's it! Enjoy.
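If you'd rather not edit the script by hand, the placeholders at the bottom can be substituted from the shell before uploading it as a recipe. This is a minimal sketch under the assumption that the script exposes the four values as the Your_* placeholders described above (the substituted values here are obviously fake); verify the exact placeholder names in the downloaded file.

# Download the recipe and substitute the four Azure values (example values are placeholders)
curl -O https://s3-us-west-2.amazonaws.com/sunileman1/scripts/setAdlsEnv.sh
sed -i 's|Your_ADLS_URL|adl://myadls.azuredatalakestore.net|'       setAdlsEnv.sh
sed -i 's|Your_APP_ID|11111111-2222-3333-4444-555555555555|'        setAdlsEnv.sh
sed -i 's|Your_APP_Password|my-app-secret|'                         setAdlsEnv.sh
sed -i 's|Your_Directory_ID|66666666-7777-8888-9999-000000000000|'  setAdlsEnv.sh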
						
					
12-11-2018 09:43 PM
I came across an article on how to set up NiFi to write into ADLS which required users to cobble together various integration pieces and launch HDI. Since then there have been many updates in NiFi enabling a much easier integration. Combined with CloudBreak's rapid deployment of an HDF cluster, this provides incredible ease of use. ADLS is native cloud storage provided by Azure (look and feel of HDFS), and the capability to read/write to it via NiFi is key. This article will demonstrate how to use CloudBreak to rapidly deploy an HDF NiFi "ADLS Enabled" cluster.
						
					
11-15-2018 08:05 PM
9 Kudos
The objective of this article is to demonstrate how to rapidly deploy a demo Druid & LLAP cluster preloaded with 20 years (nearly 113 million records) of airline data, ready for analytics, using CloudBreak on any IaaS. The entire deployment is UI driven without the need for a large administration overhead. All artifacts mentioned in this article are publicly available for reuse, to try on your own.

Prolegomenon
Time series is an incredible capability highly leveraged within the IoT space. Current solution sets are either non-scalable and expensive, or distributed processing engines lacking low-latency OLAP speeds. Druid is an OLAP time series engine backed by a Lambda architecture. Druid's out-of-the-box SQL capabilities are severely limited and lack join support. Layering HiveQL over Druid brings the best of both worlds. Hive 3 also offers HiveQL over Kafka, essentially making Hive a true SQL federation engine. With Druid's native integration with Kafka, streaming data from Kafka directly into Druid while executing real-time SQL queries via HiveQL offers a comprehensive time series solution for the IoT space.

On with the demo...
To begin the demonstration, launch a CloudBreak deployer instance on any IaaS or on-prem VM. Quick start makes this super simple. Launching the CloudBreak deployer is well documented here. Once the CloudBreak deployer is up, add your Azure, AWS, GCP, or OpenStack credentials within the CloudBreak UI. This will allow deployment of the same cluster on any IaaS.

Druid Blueprint
To launch a Druid/LLAP cluster, an Ambari blueprint will be required.
- Click on Blueprints
- Click on CREATE BLUEPRINT
- Name the blueprint and enter the following URL to import it into CloudBreak: https://s3-us-west-2.amazonaws.com/sunileman1/ambari-blueprints/hdp3/druid+llap+Ambari+Blueprint

Recipes
Druid requires a MetaStore. Import the recipe that creates the MetaStore into CloudBreak:
- Under Cluster Extensions, click on Recipes
- Enter a name for the recipe and select "pre-ambari-start" to run this recipe prior to Ambari starting
- Under URL, enter the following to import the recipe into CloudBreak: https://s3-us-west-2.amazonaws.com/sunileman1/scripts/druid+metastore+install.sh

This cluster will come preloaded with 20 years of airline data:
- Enter a name for the recipe and select "post-cluster-install" to run this recipe once the HDP services are up
- Under URL, enter the following to import the recipe into CloudBreak: https://s3-us-west-2.amazonaws.com/sunileman1/scripts/airline-data.sh
      Create a Cluster  
Now that all recipes are in place, the next step is to create a cluster:
- Select an IaaS to deploy on (credential)
- Enter a cluster name
- Select HDP 3.0
- Select cluster type: Druid LLAP HDP 3 (this is the new blueprint which was imported in previous steps)
 Select an image.  Base image will do for most deployments       
 Select instance types
 
Note: I used 64 GB of RAM per node. Additionally, I added 3 compute nodes.
 Select network to deploy the cluster on.  If one is not pre-created, CloudBreak will create one       
 CloudBreak can be configured to use S3, ADLS, WASB, GCS  
 
- Configuring CloudBreak for S3: here
- Configuring CloudBreak for ADLS: here
- Configuring CloudBreak for WASB: here
- Configuring CloudBreak for GCS

On the Worker node, attach the get-airline-data recipe. On the Druid Broker node, attach druid-metastore-install. External MetaStores (databases) can be bootstrapped to the cluster; this demo does not require it.
 Knox will not be used for this demo       
 Attach a security group to each host group.  If a SG is not pre-created, CloudBreak will create one       
 Lastly, provide an Ambari password and ssh key       
Cluster deployment and provisioning will begin. Within the next few minutes a cluster will be ready and an Ambari URL will be available.
A Zeppelin notebook can be imported using the URL below: https://s3-us-west-2.amazonaws.com/sunileman1/Zeppelin-NoteBooks/airline_druid.json

Here I demonstrated how to rapidly launch a Druid/LLAP cluster preloaded with airline data using CloudBreak. Enjoy Druid, it's crazy fast. HiveQL makes Druid easy to work with. CloudBreak makes the deployment super quick.
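Once the cluster is up, a quick smoke test is to run a HiveQL query through Beeline against the preloaded airline data. This is a minimal sketch with assumed names: the JDBC URL points at a hypothetical HiveServer2 host, and airline_ontime is a placeholder for whatever table the airline-data recipe actually creates; check the Zeppelin notebook above for the real table name.

# Connect to HiveServer2 (replace host and credentials with your cluster's values)
beeline -u "jdbc:hive2://your-hiveserver2-host:10000/default" -n admin -p yourpassword \
  -e "SELECT COUNT(*) FROM airline_ontime;"   # airline_ontime is a placeholder table name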
						
					
10-09-2018 06:10 PM
2 Kudos
This article will demonstrate how to rapidly launch a Spark cluster on AWS via CloudBreak. The prerequisites are documented here. Once you have an AWS account and credentials, launching a Spark cluster is simple. CloudBreak is your command-and-control center UI for rapidly launching clusters on AWS, Azure, GCP, and on-prem. Once the UI is up, add your AWS credentials.
 Select AWS as your cloud provider       
 Select the method for authentication.
 
 Key or Role. I prefer role but both work well. Click on the help button and follow the directions on how to setup auth for either method.         
 Now that credentials have been setup, cluster creation may begin. Click on "Clusters" on top left and then click on "Create Cluster" on top right       
- Select Advanced on top left
- Select Credential: your AWS credentials
- Cluster Name: name your cluster
- Region: AWS region
- Platform Version: HDP 3.0
- Cluster Type: to run data science and ETL workloads, select the HDP 3.0 Data Science blueprint
- Click Next
 Choose Image Type: Select Base Image  Choose Image: Select Redhat from drop down list       
 Here options are presented to select AWS instance types. If doing this for the first time, the defaults are fine. Click Next       
 Select the VPC this cluster will be deployed to. If a VPC has not been pre-created, CloudBreak will create one. Click Next       
Clusters launched on AWS can access data stored in S3. Instructions on enabling S3 access are here.
 Recipes are actions performed on nodes before and/or after cluster install. If custom actions are not required, click next       
The next option is to configure authentication and the metadata database. For those just beginning, click next.
Knox is highly recommended; however, if running for the first time, disable it.
 Select AWS security group (SG). If SG has not been pre-created CloudBreak will create one.       
Lastly, enter a password for the admin user and an SSH key. The SSH key will be required if there is interest in SSH'ing into the nodes. The cluster may take 5-15 minutes to deploy. Once the cluster is up, the Ambari URL will be available. Enjoy!
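As a quick sanity check after Ambari comes up, you can SSH into the master node with the key from the last step and run one of the bundled Spark examples on YARN. This is a minimal sketch with assumptions: the cloudbreak SSH user and the HDP examples JAR path are typical defaults but may differ on your image, so adjust them to your cluster.

# SSH to the master node using the key provided during cluster creation (user name is an assumption)
ssh -i ~/.ssh/my-cluster-key cloudbreak@<master-node-public-ip>

# Run the bundled SparkPi example on YARN (JAR path is the usual HDP location; verify on your node)
spark-submit --master yarn --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  /usr/hdp/current/spark2-client/examples/jars/spark-examples*.jar 10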
						
					