Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
12-11-2018
09:47 PM
6 Kudos
I came across an article on how to set up NiFi to write into ADLS that required cobbling together various integration pieces and launching HDI. Since then there have been many updates in NiFi enabling a much easier integration. Combined with CloudBreak's rapid deployment of HDF clusters, this provides an incredibly easy user experience. ADLS is Azure's native cloud storage (with the look and feel of HDFS), and the capability to read/write to it via NiFi is key. This article will demonstrate how to use a CloudBreak recipe to rapidly deploy an "ADLS enabled" HDF NiFi cluster.

Assumptions
A CloudBreak instance is available
Azure credentials are available
Moderate familiarity with Azure
Using HDF 3.2+

From Azure you will need:
ADLS URL
Application ID
Application Password
Directory ID

NiFi requires the ADLS jars, core-site.xml, and hdfs-site.xml. The recipe I built will fetch these resources for you. Simply download the recipe/script from: https://s3-us-west-2.amazonaws.com/sunileman1/scripts/setAdlsEnv.sh

Open it, scroll all the way to the bottom, and update the following:
Your_ADLS_URL: with your ADLS URL
Your_APP_ID: with your application ID
Your_APP_Password: with your application password
Your_Directory_ID: with your directory ID

Once the updates are completed, simply add the script under CloudBreak Recipes. Make sure to select "post-cluster-install". Begin provisioning an HDF cluster via CloudBreak. Once the Recipes page is shown, add the recipe to run on the NiFi nodes.

Once the cluster is up, use the PutHDFS processor to write to ADLS. Configure the PutHDFS properties:
Hadoop Configuration Resources: /home/nifi/sites/core-site.xml,/home/nifi/sites/hdfs-site.xml
Additional Classpath Resources: /home/nifi/adlsjars
Directory: /

The above resources are all available on each node thanks to the recipe. All you have to do is point the PutHDFS processor at their locations. That's it! Enjoy
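For reference, the four Azure values above map onto the standard hadoop-azure-datalake OAuth2 client-credential properties in core-site.xml. Below is a minimal sketch of what the substituted configuration amounts to; this is not the actual contents of the script, and the account name and output path are illustrative assumptions:

    # Sketch only: standard ADLS Gen1 properties the recipe's core-site.xml needs to carry.
    # YOUR_ACCOUNT / YOUR_APP_ID / YOUR_APP_PASSWORD / YOUR_DIRECTORY_ID are placeholders.
    cat > /home/nifi/sites/core-site.xml <<'EOF'
    <configuration>
      <property><name>fs.defaultFS</name><value>adl://YOUR_ACCOUNT.azuredatalakestore.net</value></property>
      <property><name>fs.adl.oauth2.access.token.provider.type</name><value>ClientCredential</value></property>
      <property><name>fs.adl.oauth2.client.id</name><value>YOUR_APP_ID</value></property>
      <property><name>fs.adl.oauth2.credential</name><value>YOUR_APP_PASSWORD</value></property>
      <property><name>fs.adl.oauth2.refresh.url</name><value>https://login.microsoftonline.com/YOUR_DIRECTORY_ID/oauth2/token</value></property>
    </configuration>
    EOF

The directory ID (Azure AD tenant) only appears inside the token refresh URL, which is why the recipe asks for it separately from the application ID and password.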
10-09-2018
06:10 PM
2 Kudos
This article will demonstrate how to rapidly launch a Spark cluster on AWS via CloudBreak. The prerequisites are documented here. Once you have an AWS account and credentials, launching a Spark cluster is simple. CloudBreak is your command-and-control center UI for rapidly launching clusters on AWS, Azure, GCP, and on prem. Once the UI is up, add your AWS credentials
Select AWS as your cloud provider
Select the method for authentication: Key or Role. I prefer Role, but both work well. Click on the help button and follow the directions on how to set up auth for either method.
Now that credentials have been set up, cluster creation may begin. Click on "Clusters" on the top left and then click on "Create Cluster" on the top right
Select Advanced on the top left, then fill in:
Credential: your AWS credentials
Cluster Name: name your cluster
Region: AWS region
Platform Version: HDP 3.0
Cluster Type: to run data science and ETL workloads, select the HDP 3.0 Data Science blueprint
Click Next
Choose Image Type: select Base Image
Choose Image: select Red Hat from the drop-down list
Here options are presented to select AWS instance types. If doing this for the first time, the defaults are fine. Click Next
Select the VPC this cluster will be deployed to. If a VPC has not been pre-created, CloudBreak will create one. Click Next
Clusters launched on AWS can access data stored in S3. Instructions on enabling S3 access are here.
Recipes are actions performed on nodes before and/or after cluster install. If custom actions are not required, click Next.
The next option is to configure authentication and the metadata database. For those just beginning, click Next.
Knox is highly recommended; however, if running for the first time, disable it.
Select an AWS security group (SG). If an SG has not been pre-created, CloudBreak will create one.
Lastly, enter a password for the admin user and an SSH key. The SSH key is required if you want to SSH into the nodes. The cluster may take 5-15 minutes to deploy. Once the cluster is up, the Ambari URL will be available. Enjoy!
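Once the Ambari URL appears, a quick command-line sanity check is to hit the standard Ambari REST API with the admin password you set above. A minimal sketch; the host name is hypothetical, so substitute whatever URL CloudBreak displays for your cluster:

    # Hypothetical Ambari URL; use the one CloudBreak shows for your cluster
    AMBARI_URL='https://your-ambari-host/ambari'
    # List the clusters this Ambari instance manages (admin user + the password set during creation)
    curl -k -u admin:YOUR_ADMIN_PASSWORD "$AMBARI_URL/api/v1/clusters"

A JSON response listing your cluster name confirms Ambari is up and reachable.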
09-06-2018
01:56 PM
During launch of HDP or HDF on Azure via CloudBreak, the following provisioning error may be thrown (check the CloudBreak logs):

log:55 INFO c.m.a.m.r.Deployments checkExistence - [owner:xxxxx] [type:STACK] [id:2] [name:sparky] [flow:xxx] [tracking:] <-- 404 Not Found https://management.azure.com/subscriptions/xxxxxx/resourcegroups/spark. (104 ms, 92-byte body)
/cbreak_cloudbreak_1 | 2018-09-05 14:25:22,882 [reactorDispatcher-24] launch:136 ERROR c.s.c.c.a.AzureResourceConnector - [owner:xxxxxx] [type:STACK] [id:2] [name:sparky] [flow:xxxxxx] [tracking:] Provisioning error:

This means the instance type selected is not available within the region. Either change to a region where the instance type is available, or select an instance type that is available within the region.
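Before retrying, you can confirm which VM sizes a region actually offers with the Azure CLI. A small sketch; the region and the instance type being grepped for are just examples:

    # List the VM sizes available in a given Azure region (example region: westus2)
    az vm list-sizes --location westus2 --output table
    # Check for a specific instance type (example: anything in the D13 family)
    az vm list-sizes --location westus2 --query "[].name" -o tsv | grep -i D13

If the instance type CloudBreak tried to provision is missing from the list, pick a different size or region in the CloudBreak hardware step.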
10-08-2018
10:07 PM
The error reported in the bootstrap log file was:

2018-10-08 18:06:37,982 ERROR [NiFi logging handler] org.apache.nifi.StdErr Caused by: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.0.5-libsnappyjava.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /tmp/snappy-1.0.5-libsnappyjava.so)
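An UnsatisfiedLinkError like this generally means the system libstdc++ is too old for the snappy native library that NiFi extracted to /tmp. A quick diagnostic (not from the original post) is to list which GLIBCXX symbol versions the installed libstdc++ actually exports:

    # Show the GLIBCXX versions provided by the system libstdc++;
    # GLIBCXX_3.4.9 must appear for this snappy native library to load
    strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX

If GLIBCXX_3.4.9 is absent, the host needs a newer libstdc++ (or a snappy build matching the older runtime).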
03-27-2017
04:01 PM
3 Kudos
There are many ways to validate a JSON file against an Avro schema to verify all is kosher. Sharing a practice I have been using for a few years.

Objective - Validate that an Avro schema binds correctly to a JSON file.

First you must have an Avro schema and a JSON file. From there, download the latest avro-tools jar. At the moment, 1.8.1 is the latest avro-tools version available. Store the Avro schema and JSON file in the same directory. Issue a wget to fetch the avro-tools jar:

wget http://www.us.apache.org/dist/avro/avro-1.8.1/java/avro-tools-1.8.1.jar

Objective details - Validate that the schema student.avsc binds to student.json.

How - Issue the following:

java -jar ./avro-tools-1.8.1.jar fromjson --schema-file YourSchemaFile.avsc YourJsonFile.json > AnyNameForYourBinaryAvro.avro

Using the student files example:

java -jar ./avro-tools-1.8.1.jar fromjson --schema-file student.avsc student.json > student.avro

Validation passed: an Avro binary was created. Now, as a last step, let's break something. Another Avro schema (student2.avsc) is created which does not conform to student.json. Let's verify that avro-tools fails to build an Avro binary. As expected, the Avro binary fails to be created due to validation errors.
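The student files themselves are not shown above, so here is a minimal hypothetical pair that works with the commands as written; the record name and fields are invented purely for illustration:

    # Hypothetical schema: a "student" record with two fields
    cat > student.avsc <<'EOF'
    {"type": "record", "name": "student",
     "fields": [{"name": "name", "type": "string"},
                {"name": "age",  "type": "int"}]}
    EOF
    # fromjson expects one JSON-encoded record per line
    cat > student.json <<'EOF'
    {"name": "Jane", "age": 20}
    EOF
    java -jar ./avro-tools-1.8.1.jar fromjson --schema-file student.avsc student.json > student.avro

To reproduce the failure case, change the "age" field's type to "string" in a copy of the schema (the student2.avsc role above) and rerun the same command; fromjson will throw instead of writing the binary.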
11-22-2018
02:39 PM
Does master-to-master (cyclic) replication keep replicating the data back and forth? If an upsert is executed on C1, it is propagated to C2. Now, since C1 is added as a peer of C2, will the replication happen back to C1 and then again to C2 (going C1 to C2 to C1 to C2 to C1 ...)?
03-22-2017
01:15 PM
Hi @Sunile Manjee, your article is great. I tried following the same steps and was successful in creating a schema and a table within it. But if I try to drop the table, I see the error "Table not found". And if I try to create the table, I see the message "Table exists". Any thoughts on this?
02-27-2018
09:34 PM
Yes, by default. You can change the ports in Ambari as well: https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.0/bk_installing-nifi/content/ch02s04.html Or it could be 8443.
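To see which web port a given NiFi node is actually using, check the standard nifi.web.http.port / nifi.web.https.port properties in nifi.properties; the conf path below assumes an HDF/Ambari-managed install and may differ on yours:

    # Show NiFi's configured HTTP/HTTPS web ports
    # (path assumes an HDF install; adjust for your layout)
    grep 'nifi.web.http' /usr/hdf/current/nifi/conf/nifi.properties

Whichever of the two port properties is non-empty tells you both the port and whether the UI is served over HTTP or HTTPS.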