About sunile_manjee

sunile_manjee · 08-14-2019

MiNiFi (Java version) is essentially NiFi with a few differences, which is why it runs so darn well on containers/Kubernetes. The use case is a single management console, Edge Flow Manager (EFM), managing zero or more MiNiFi agents that autoscale on Kubernetes based on some metric, for example a CPU/RAM threshold. EFM and NiFi Registry are required but don't need autoscaling, so these services will be deployed on Azure Container Instances (ACI). MiNiFi, on the other hand, often benefits from autoscaling, so it will be deployed on Azure Kubernetes Service (AKS).

Required for this demonstration
Azure subscription
Container registry: the demo uses Azure Container Registry
Kubernetes: the demo uses Azure Kubernetes Service
Azure CLI

The following images need to be stored in Azure Container Registry
Edge Flow Manager: https://github.com/sunileman/efm1.0.0.0-docker
NiFi Registry: https://github.com/sunileman/NiFi-Registry-Service
MiNiFi (Java): https://github.com/sunileman/CEM1.0-Java-MiNiFi (this image comes precooked with Azure/AWS NARs)

Architecture
This is a 10,000-foot view of the architecture. EFM tells the MiNiFi agents what work they need to do. EFM also talks to NiFi Registry, which stores and version-controls the flows that get passed to the MiNiFi agents.

Deploy NiFi Registry and EFM on ACI
Since EFM and NiFi Registry don't really benefit from autoscaling (they are mostly static installs), both are a great fit for Azure Container Instances. ACI will keep EFM and NiFi Registry up with one container instance each. The EFM, MiNiFi, and NiFi Registry images have all been imported into my Azure container registry.

Create NiFi Registry on ACI
NiFi Registry variables to note:
--name: name of the NiFi Registry container
--dns-name-label: DNS prefix for the registry service. ACI turns it into the FQDN <dns-name-label>.<region>.azurecontainer.io, which is used as input to the EFM container's NIFI_REGISTRY environment variable.

az container create --resource-group sunmanCentralRG --name mynifiregistry \
  --image sunmanregistry.azurecr.io/nifiregistry:latest \
  --dns-name-label mynifiregistry --ports 18080 \
  --registry-username ****** --registry-password ******

Create EFM on ACI
EFM variables to note:
NIFI_REGISTRY: must match the NiFi Registry container's DNS name (fully qualified server name) and port
--dns-name-label: DNS prefix for EFM

az container create --resource-group sunmanCentralRG --name efm \
  --image sunmanregistry.azurecr.io/efm:latest \
  --dns-name-label myefm --ports 10080 \
  --registry-username ***** --registry-password **** \
  --environment-variables 'NIFI_REGISTRY_ENABLED'='true' 'NIFI_REGISTRY_BUCKETNAME'='testbucket' 'NIFI_REGISTRY'='http://mynifiregistry.centralus.azurecontainer.io:18080'

Create a 'testbucket' on NiFi Registry
MiNiFi flows will be designed using EFM and stored in the NiFi Registry bucket 'testbucket'. This bucket name was set as a variable when the EFM container was created: 'NIFI_REGISTRY_BUCKETNAME'='testbucket'.
NiFi Registry will be available at http://YourNiFiRegistryDNS:18080/nifi-registry/, for example http://mynifiregistry.centralus.azurecontainer.io:18080/nifi-registry/
Click "NEW BUCKET", enter the bucket name testbucket, and click Create.

Validate EFM is up
The EFM UI will be available at http://YourEfmDnsPrefix.centralus.azurecontainer.io:10080/efm/ui, for example http://myefm.centralus.azurecontainer.io:10080/efm/ui

Run MiNiFi Kubernetes Deployment
The easiest way to run a deployment on Kubernetes (k8s) is to build a manifest file. To learn more about k8s manifest files, see here. Look for < > in the manifest below; these are the values to change prior to your deployment (only a few, super simple).
Variable to note:
MINIFI_AGENT_CLASS: the agent class that will be published to EFM.
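The values that wire the three services together all derive from the ACI DNS labels. Below is a minimal Python sketch, assuming the ACI naming pattern <dns-name-label>.<region>.azurecontainer.io plus the ports and EFM paths used in this post. It builds the NIFI_REGISTRY value for EFM, the two C2 URLs for the MiNiFi manifest, and a NiFi Registry REST request (POST /nifi-registry-api/buckets) that creates testbucket as an alternative to clicking "NEW BUCKET". The helper names are my own; the request is only built, not sent, unless you uncomment the last lines.

```python
# Sketch: derive the URLs that connect NiFi Registry, EFM, and MiNiFi.
# Assumes the ACI FQDN pattern <dns-name-label>.<region>.azurecontainer.io
# and the ports/paths used in this post. Helper names are hypothetical.
import json
import urllib.request


def aci_fqdn(dns_name_label: str, region: str) -> str:
    """FQDN that ACI assigns to a container created with --dns-name-label."""
    return f"{dns_name_label}.{region}.azurecontainer.io"


def registry_url(dns_name_label: str, region: str, port: int = 18080) -> str:
    """Value for the EFM container's NIFI_REGISTRY environment variable."""
    return f"http://{aci_fqdn(dns_name_label, region)}:{port}"


def c2_urls(efm_dns_name_label: str, region: str, port: int = 10080) -> dict:
    """Values for NIFI_C2_REST_URL and NIFI_C2_REST_URL_ACK in the manifest."""
    base = f"http://{aci_fqdn(efm_dns_name_label, region)}:{port}/efm/api/c2-protocol"
    return {
        "NIFI_C2_REST_URL": f"{base}/heartbeat",
        "NIFI_C2_REST_URL_ACK": f"{base}/acknowledge",
    }


def bucket_request(registry: str, bucket_name: str) -> urllib.request.Request:
    """Build (without sending) a NiFi Registry REST call that creates a bucket."""
    body = json.dumps({"name": bucket_name}).encode("utf-8")
    return urllib.request.Request(
        f"{registry}/nifi-registry-api/buckets",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    region = "centralus"
    registry = registry_url("mynifiregistry", region)
    print(f"NIFI_REGISTRY={registry}")
    for name, url in c2_urls("myefm", region).items():
        print(f"{name}={url}")
    request = bucket_request(registry, "testbucket")
    print(request.get_method(), request.full_url)
    # Uncomment to actually create the bucket:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status, response.read().decode("utf-8"))
```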
To learn more about EFM, go here.

Kubernetes manifest file:

# apps/v1: Deployments under extensions/v1beta1 were removed in Kubernetes 1.16
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minifi
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minifi
  template:
    metadata:
      labels:
        app: minifi
    spec:
      containers:
      - name: minifi-container
        image: <Your Container Registry>/minifi-azure-aws:latest
        ports:
        - containerPort: 10080
          name: http
        - containerPort: 6065
          name: listenhttp
        - containerPort: 22
          name: ssh
        resources:
          requests:
            cpu: ".05"      # deliberately small so autoscaling kicks in
            memory: "1Gi"
          limits:
            cpu: "1"
        env:
        - name: NIFI_C2_ENABLE
          value: "true"
        - name: MINIFI_AGENT_CLASS
          value: "test"
        - name: NIFI_C2_REST_URL
          value: "http://<Your EFM servername>.centralus.azurecontainer.io:10080/efm/api/c2-protocol/heartbeat"
        - name: NIFI_C2_REST_URL_ACK
          value: "http://<Your EFM servername>.centralus.azurecontainer.io:10080/efm/api/c2-protocol/acknowledge"
---
kind: Service
apiVersion: v1
metadata:
  name: minifi-service
spec:
  selector:
    app: minifi
  ports:
  - protocol: TCP
    targetPort: 10080
    port: 10080
    name: http
  - protocol: TCP
    targetPort: 22
    port: 22
    name: ssh
  - protocol: TCP
    targetPort: 6065
    port: 6065
    name: listenhttp
  type: LoadBalancer

Once the manifest file has been updated, save it as minifi.yml (any name works) and deploy it on k8s:

kubectl apply -f minifi.yml

Output:

sunile.manjee@hwx:~/Documents/GitHub/AKS-YAMLS(master⚡) » kubectl apply -f minifi.yml
deployment.extensions/minifi created
service/minifi-service created

MiNiFi has been successfully deployed. To verify the deployment, visit EFM. EFM should show the agent class name 'test', matching the class name used in the MiNiFi k8s manifest file. Open the class and design any flow. Here I simply used GenerateFlowFile with the success relationship auto-terminated and 3 concurrent threads. Click Publish, and soon thereafter MiNiFi will be executing the flow.
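To check how many MiNiFi pods exist, here and again in the autoscaling steps, kubectl can return the pods as JSON. A small Python sketch, assuming kubectl is already pointed at the AKS cluster and the pods carry the app=minifi label from the manifest; fetch_pods_json and phase_counts are hypothetical helper names, not part of any tool.

```python
# Sketch: count MiNiFi pods by phase from `kubectl get pods -o json`.
# Assumes kubectl targets the AKS cluster and pods are labeled app=minifi.
import json
import subprocess
from collections import Counter


def fetch_pods_json(label: str = "app=minifi") -> str:
    """Run kubectl and return its raw JSON output (needs cluster access)."""
    return subprocess.run(
        ["kubectl", "get", "pods", "-l", label, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout


def phase_counts(pods_json: str) -> Counter:
    """Tally pods by status.phase (Pending, Running, Succeeded, Failed, Unknown)."""
    items = json.loads(pods_json).get("items", [])
    return Counter(pod.get("status", {}).get("phase", "Unknown") for pod in items)
```

Usage against a live cluster: print(dict(phase_counts(fetch_pods_json()))). Parsing is kept separate from the kubectl call so it can be tried on saved output.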
AutoScale MiNiFi
At this point a single MiNiFi container/agent is executing flows. I purposely set the MiNiFi CPU request in the manifest file to a small number to force autoscaling.
First, let's check the number of MiNiFi pods running on k8s (for example with kubectl get pods): a single MiNiFi pod.
Next, let's check whether autoscaling is enabled for this deployment.
To enable autoscaling on k8s:

kubectl autoscale deployment minifi --cpu-percent=25 --min=1 --max=3

minifi is the deployment name. If average CPU utilization exceeds 25% of the requested CPU, the autoscaler increases the number of pods, up to a maximum of 3 instances. A minimum of 1 instance is defined for the deployment.
Verify autoscaling is enabled on the minifi deployment.
Number of MiNiFi pods after autoscaling was enabled: 3. Kubernetes added 2 additional MiNiFi pods.
Let's kill one of the pods and see what happens. Kubernetes immediately launched a new MiNiFi container after a MiNiFi pod was killed.
Enjoy AutoScale on MiNiFi!
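Why did one pod become three? The Horizontal Pod Autoscaler measures CPU utilization as a percentage of each container's CPU request, so a container requesting only .05 CPU crosses 25% almost immediately. Below is a simplified Python sketch of the HPA's core replica rule: desired = ceil(current replicas * current utilization / target utilization), left unchanged when the ratio is within the default 10% tolerance, then clamped to --min/--max. It ignores stabilization windows, missing metrics, and pods that are not yet ready, so treat it as an approximation of the controller.

```python
# Simplified sketch of the Horizontal Pod Autoscaler replica calculation.
# Ignores stabilization windows, missing metrics, and unready pods.
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Return the replica count the HPA would aim for, clamped to [min, max]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # One pod at ~100% of its .05 CPU request vs. a 25% target asks for
    # ceil(1 * 4) = 4 pods; --max=3 caps it at 3.
    print(desired_replicas(1, 100, 25, 1, 3))  # prints 3
```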

sunile_manjee · 06-21-2019

Building an Apache NiFi processor is super easy. I have seen/read several articles on how to get started by executing Maven commands via the CLI. This article is geared towards individuals who prefer to use an IDE (especially IntelliJ) to do the imports instead of running commands via the CLI.
In IntelliJ, click Create Project, check "Create from archetype", click "Add Archetype", and enter the following:
GroupId: org.apache.nifi
ArtifactId: nifi-processor-bundle-archetype
Version: <YourVersionOfNifi>
Then click "OK". Your new NiFi archetype has now been created. Select it.
Enter a GroupId, ArtifactId, and Version of your choice.
A final property we need to add is artifactBaseName. This is mandatory. Click "+" and enter:
Name: artifactBaseName
Value: whatEverYouLike
Now the project is ready for building a custom processor. Enjoy!
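For reference, the IntelliJ dialog fills in the same archetype coordinates that the CLI route passes to mvn archetype:generate. A hedged Python sketch that assembles that command; the NiFi version, group ID, artifact ID, and base name in the example are placeholders, and depending on the archetype version Maven may require additional properties beyond the ones shown.

```python
# Sketch: the Maven CLI equivalent of the IntelliJ "Add Archetype" dialog.
# Example coordinates are placeholders; artifactBaseName is the property
# the article calls mandatory.
import shlex


def archetype_command(nifi_version: str, group_id: str, artifact_id: str,
                      version: str, artifact_base_name: str) -> list:
    """Return the mvn archetype:generate argument list for a NiFi processor bundle."""
    return [
        "mvn", "archetype:generate",
        "-DarchetypeGroupId=org.apache.nifi",
        "-DarchetypeArtifactId=nifi-processor-bundle-archetype",
        f"-DarchetypeVersion={nifi_version}",
        f"-DgroupId={group_id}",
        f"-DartifactId={artifact_id}",
        f"-Dversion={version}",
        f"-DartifactBaseName={artifact_base_name}",
        "-DinteractiveMode=false",  # use the values above instead of prompting
    ]


if __name__ == "__main__":
    cmd = archetype_command("1.9.2", "com.example", "my-processors",
                            "1.0-SNAPSHOT", "MyProcessor")
    print(shlex.join(cmd))
    # To generate the project: subprocess.run(cmd, check=True)
```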

sunile_manjee · 12-11-2018

I came across an article on how to set up NiFi to write into ADLS, which required cobbling together various integration pieces and launching HDInsight (HDI). Since then, many updates to NiFi have made the integration much easier. Combined with CloudBreak's rapid deployment of HDF clusters, this provides an incredibly easy user experience. ADLS is Azure's native cloud storage (with the look and feel of HDFS), and the ability to read/write it via NiFi is key. This article will demonstrate how to use a CloudBreak recipe to rapidly deploy an HDF NiFi "ADLS Enabled" cluster.

Assumptions
A CloudBreak instance is available
Azure credentials are available
Moderate familiarity with Azure
Using HDF 3.2+

From Azure you will need:
ADLS URL
Application ID
Application password
Directory ID

NiFi requires the ADLS jars, core-site.xml, and hdfs-site.xml. The recipe I built will fetch these resources for you. Simply download the recipe/script from:
https://s3-us-west-2.amazonaws.com/sunileman1/scripts/setAdlsEnv.sh
Open it and scroll all the way to the bottom. Update the following:
Your_ADLS_URL: your ADLS URL
Your_APP_ID: your application ID
Your_APP_Password: your application password
Your_Directory_ID: your directory ID

Once the updates are complete, add the script under CloudBreak Recipes. Make sure to select "post-cluster-install".
Begin provisioning an HDF cluster via CloudBreak. When the Recipes page is shown, add the recipe to run on the NiFi nodes.
Once the cluster is up, use the PutHDFS processor to write to ADLS.

Configure PutHDFS properties:
Hadoop Configuration Resources: /home/nifi/sites/core-site.xml,/home/nifi/sites/hdfs-site.xml
Additional Classpath Resources: /home/nifi/adlsjars
Directory: /

The above resources are available on each node thanks to the recipe. All you have to do is point the PutHDFS processor at their locations. That's it! Enjoy
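The four values from Azure typically end up in core-site.xml as the hadoop-azure-datalake (ADLS Gen1) OAuth2 client-credential properties. A minimal Python sketch that writes such a file, assuming those standard Hadoop property names; I have not confirmed that setAdlsEnv.sh produces exactly this file, and the example values are placeholders. The application password ends up in plain text, so protect the file accordingly.

```python
# Sketch: core-site.xml with the hadoop-azure-datalake (ADLS Gen1)
# OAuth2 client-credential properties. Assumption: standard Hadoop names;
# not verified against what setAdlsEnv.sh actually writes.
import xml.etree.ElementTree as ET


def adls_core_site(adls_url: str, app_id: str, app_password: str,
                   directory_id: str) -> str:
    """Return a core-site.xml string pointing the default FS at ADLS."""
    props = {
        "fs.defaultFS": adls_url,  # e.g. adl://<account>.azuredatalakestore.net
        "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
        "fs.adl.oauth2.refresh.url":
            f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
        "fs.adl.oauth2.client.id": app_id,
        "fs.adl.oauth2.credential": app_password,
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(adls_core_site("adl://myaccount.azuredatalakestore.net",
                         "my-app-id", "my-app-password", "my-directory-id"))
```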

sunile_manjee · 12-11-2018

I came across an article on how to set up NiFi to write into ADLS, which required users to cobble together various integration pieces and launch HDInsight (HDI). Since then, many updates to NiFi have made the integration much easier. Combined with CloudBreak's rapid deployment of HDF clusters, this provides incredible ease of use. ADLS is native cloud storage provided by Azure (with the look and feel of HDFS), and the ability to read/write it via NiFi is key. This article will demonstrate how to use CloudBreak to rapidly deploy an HDF NiFi "ADLS Enabled" cluster.

sunile_manjee · 10-09-2018

This article will demonstrate how to rapidly launch a Spark cluster on AWS via CloudBreak. The prerequisites are documented here. Once you have an AWS account and credentials, launching a Spark cluster is simple. CloudBreak is your command-and-control center UI for rapidly launching clusters on AWS, Azure, GCP, and on-prem.

Once the UI is up, add your AWS credentials:
Select AWS as your cloud provider.
Select the authentication method: key or role. I prefer role, but both work well. Click the help button and follow the directions on how to set up auth for either method.

Now that credentials have been set up, cluster creation may begin. Click "Clusters" on the top left and then click "Create Cluster" on the top right.
Select Advanced on the top left.
Select Credential: your AWS credentials
Cluster Name: name your cluster
Region: AWS region
Platform Version: HDP 3.0
Cluster Type: to run data science and ETL workloads, select the HDP 3.0 Data Science blueprint
Click Next.
Choose Image Type: select Base Image
Choose Image: select Redhat from the drop-down list
Next, options are presented to select AWS instance types. If doing this for the first time, the defaults are fine. Click Next.
Select the VPC this cluster will be deployed to. If a VPC has not been pre-created, CloudBreak will create one. Click Next.
Clusters launched on AWS can access data stored in S3. Instructions on enabling S3 access are here.
Recipes are actions performed on nodes before and/or after cluster install. If custom actions are not required, click Next.
The next option is to configure authentication and the metadata database. For those just beginning, click Next.
Knox is highly recommended; however, if you are running this for the first time, disable it.
Select an AWS security group (SG). If an SG has not been pre-created, CloudBreak will create one.
Lastly, enter a password for the admin user and an SSH key. The SSH key is required if you want to SSH into the nodes.
The cluster may take 5-15 minutes to deploy.
Once the cluster is up, the Ambari URL will be available. Enjoy!
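Once Ambari is reachable, its REST API gives a quick scripted check that the cluster exists. A Python sketch, assuming the Ambari URL shown by CloudBreak exposes the standard /api/v1/clusters endpoint with HTTP basic auth for the admin user; if Ambari sits behind a gateway such as Knox, the base path differs. The host name and password below are hypothetical placeholders.

```python
# Sketch: list clusters through Ambari's REST API (GET /api/v1/clusters).
# Assumes basic auth with the admin user set in CloudBreak; the host is a
# hypothetical placeholder for the Ambari URL CloudBreak displays.
import base64
import json
import urllib.request


def clusters_request(ambari_url: str, user: str, password: str) -> urllib.request.Request:
    """Build (without sending) an authenticated GET for /api/v1/clusters."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{ambari_url.rstrip('/')}/api/v1/clusters",
        headers={"Authorization": f"Basic {token}"},
    )


def cluster_names(response_body: str) -> list:
    """Pull the cluster names out of an /api/v1/clusters JSON response."""
    items = json.loads(response_body).get("items", [])
    return [item["Clusters"]["cluster_name"] for item in items]


if __name__ == "__main__":
    request = clusters_request("http://ambari.example.com:8080", "admin", "changeme")
    print(request.full_url)
    # Uncomment to call Ambari:
    # with urllib.request.urlopen(request) as response:
    #     print(cluster_names(response.read().decode("utf-8")))
```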

sandyy006 · 10-06-2018

@sunile.manjee What Zeppelin version are you using? You may be hitting: https://issues.apache.org/jira/browse/ZEPPELIN-1930

sunile_manjee · 09-06-2018

During launch of HDP or HDF on Azure via CloudBreak, the following provisioning error may be thrown (check the CloudBreak logs):

log:55 INFO c.m.a.m.r.Deployments checkExistence - [owner:xxxxx] [type:STACK] [id:2] [name:sparky] [flow:xxx] [tracking:] <-- 404 Not Found https://management.azure.com/subscriptions/xxxxxx/resourcegroups/spark. (104 ms, 92-byte body)
cbreak_cloudbreak_1 | 2018-09-05 14:25:22,882 [reactorDispatcher-24] launch:136 ERROR c.s.c.c.a.AzureResourceConnector - [owner:xxxxxx] [type:STACK] [id:2] [name:sparky] [flow:xxxxxx] [tracking:] Provisioning error:

This means the selected instance type is not available in the region. Either change to a region where the instance type is available, or change to an instance type that is available in the region.
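To check a VM size before provisioning, the Azure CLI can list the sizes a region offers. A Python sketch built on az vm list-sizes --location <region> --output json, assuming the Azure CLI is installed and logged in; list_sizes_json and size_available are hypothetical helper names. A size listed for a region can still be restricted for a particular subscription, which az vm list-skus reports.

```python
# Sketch: check whether a VM size is offered in an Azure region before
# provisioning through CloudBreak. Assumes the Azure CLI is logged in.
import json
import subprocess


def list_sizes_json(region: str) -> str:
    """Return the raw JSON from `az vm list-sizes` for one region."""
    return subprocess.run(
        ["az", "vm", "list-sizes", "--location", region, "--output", "json"],
        check=True, capture_output=True, text=True,
    ).stdout


def size_available(size_name: str, sizes_json: str) -> bool:
    """Case-insensitive lookup of a size name in list-sizes output."""
    names = {size["name"].lower() for size in json.loads(sizes_json)}
    return size_name.lower() in names
```

Usage: size_available("Standard_D4s_v3", list_sizes_json("centralus")), with the size and region being whatever you plan to select in CloudBreak.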

chourasiasakshi · 08-26-2018

I am facing a similar issue. Is it possible for you to post the complete code? For example, to which function did you pass IsResult?

Dominika · 08-17-2018

@sunile.manjee I updated the tutorial to include @pdarvasi's suggestion as a note.

MattWho · 08-15-2018

@sunile.manjee You must "Stop" NiFi CA before the "Delete" option is available. Once it has been deleted, I would confirm the contents of your keystore and truststore are still correct, in case Ambari executed the tls-toolkit and overwrote them.

Last Visited	05-25-2022 10:07 AM

Member Since	05-30-2018 10:40 PM
Posts	1,322
Kudos received	713

Cloudera Community

Re: Iterate over ADLS files using spark?

Re: Install NiFi CA service post nifi cluster inst...

Re: Which storage format is optimum for training m...

Re: Ambari custom alert failing

Re: df.cache() is not working on jdbc table

AutoScaling MiNiFi On Kubernetes

Building a Custom Processor Using IntelliJ

NiFi ADLS Enabled Cluster Via CloudBreak

NiFi ADLS Enabled Cluster Setup Via CloudBreak

Rapid launch of a HDP 3 Spark cluster on AWS via C...

Re: Zeppelin displayhook error

CloudBreak Azure provisioning error


Re: Cloudbreak HDP instances require owner access ...

Re: Is NiFi CA service required for signed Certs?