Member since: 04-10-2019
Posts: 16
Kudos Received: 30
Solutions: 0
10-23-2018
06:49 PM
2 Kudos
If the MySQL JDBC driver does not exist by default, or is not installed along with MySQL, the provisioned cluster fails with errors like the following. In this case, create a recipe in Cloudbreak that installs the MySQL JDBC driver at pre-ambari-start time. During Create Cluster, use the Advanced option, pick the recipe pk-install-mysql-driver-pre from the drop-down, and click Attach. This ensures the JDBC driver is installed before Ambari is started, which should result in a clean install.
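A pre-ambari-start recipe is just a shell script that Cloudbreak runs on each node before Ambari starts. A minimal sketch of what pk-install-mysql-driver-pre could contain (assuming a RHEL/CentOS base image where the driver is available as the mysql-connector-java package, and that the jar is expected under /usr/share/java):

    #!/bin/bash
    # Install the MySQL JDBC driver before Ambari starts (pre-ambari-start recipe).
    # Assumption: RHEL/CentOS image with the mysql-connector-java package in the enabled repos.
    set -euo pipefail
    yum install -y mysql-connector-java
    # Sanity check: the jar Ambari will look for
    ls -l /usr/share/java/mysql-connector-java.jar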
10-23-2018
06:29 PM
2 Kudos
If you see errors related to the YARN Registry DNS Bind Port, where YARN Registry DNS is stopped, it is most probably due to a port conflict. Go into the Advanced configurations, look for the parameter RegistryDNS Bind Port, and change it to a port that does not have a conflict; I changed it from 53 to 553. Save the configuration changes and restart all impacted components. This should take care of the issue, and YARN Registry DNS will start successfully.
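Before changing the port, it can help to confirm what is already bound to port 53 on the host running the Registry DNS component. This is a generic Linux check, not an HDP-specific command:

    # Show the process, if any, currently listening on port 53
    sudo ss -tulpn | grep ':53 '
    # On older systems without ss:
    sudo netstat -tulpn | grep ':53 '

If something else owns the port (a local DNS service, for example), changing RegistryDNS Bind Port as described above is the simplest fix.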
10-16-2018
01:31 AM
2 Kudos
While it is possible to span a cluster across multiple Availability Zones, we don't generally recommend it, for the following reasons:
1. Latency: In AWS, there is natural latency involved in moving data across multiple AZs, which leads to performance problems or other issues in the cluster. Especially as the cluster size and workload increase, the performance issues become more pronounced.
2. Double billing: As a natural part of cluster operation, there will be data transfer. According to the AWS FAQs, each instance is charged for its data in and data out at the corresponding Data Transfer rates. Therefore, if data is transferred between two such instances, it is charged at "Data Transfer Out from EC2 to Another AWS Region" for the first instance and at "Data Transfer In from Another AWS Region" for the second instance. Please refer to this page for detailed data transfer pricing: https://aws.amazon.com/ec2/faqs/
However, mission requirements and/or regulations may require spanning multiple Availability Zones, which is possible and supported. I would also like to add that we support multiple AZs, but not multiple Regions, within a single cluster.
09-19-2018
06:33 PM
I would recommend using Cloudbreak to provision a cluster on AWS. It is simple, intuitive, and fast. Cloudbreak uses Ambari blueprints; you can customize them or use the ones that come out of the box. Detailed steps to create an HDP/HDF cluster: https://github.com/purn1mak/HadoopSummitCloudbreak/blob/master/README.md Create a basic HDF cluster: https://youtu.be/enLrboB0aKo Let us know if you run into any issues.
08-16-2018
09:18 PM
1 Kudo
Launch Cloudbreak on AWS

Meet the Prerequisites
Before launching Cloudbreak on AWS, you must meet the following prerequisites: AWS Account, AWS Region, SSH Key Pair, Key Based Authentication.

AWS Account
In order to launch Cloudbreak on AWS, you must log in to your AWS account. If you don't have an account, you can create one at https://aws.amazon.com/.

AWS Region
Decide in which AWS region you would like to launch Cloudbreak; only the AWS regions supported by Cloudbreak can be used.

SSH Key Pair
Import an existing key pair or generate a new key pair in the AWS region you are planning to use for launching Cloudbreak and clusters. You need this SSH key pair to SSH to the Cloudbreak instance and start Cloudbreak. To do this, use the following steps:
1. Navigate to the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
2. Check the region listed in the top right corner to make sure that you are in the correct region.
3. In the left pane, find NETWORK & SECURITY and click Key Pairs.
4. Do one of the following:
To generate a new key pair: click Create Key Pair. Your private key file will be automatically downloaded onto your computer. Make sure to save it in a secure location; you will need it to SSH to the cluster nodes. You may want to change the access settings for the file using chmod 400 my-key-pair.pem.
To import an existing public key: click Import Key Pair, select the public key, and click Import. Make sure that you have access to its corresponding private key.

Key Based Authentication
If you are using key-based authentication for Cloudbreak on AWS, you must be able to provide your AWS Access Key and Secret Key pair. Cloudbreak will use these keys to launch the resources. You provide the Access and Secret Keys later in the Cloudbreak web UI when creating a credential. If you choose this option, all you need to do at this point is check your AWS account and make sure that you can access this key pair. You can generate new access and secret keys from the IAM console. To do this, go to the IAM service:
1. In the left pane, click Users and select a user.
2. Click the Security credentials tab.
3. Create an access key or use an existing one. There is a limit of two access keys per user.
To create a new user, go to the IAM service:
1. In the left pane, click Users, then click Add user.
2. Enter a user name, select Programmatic access, and click Next: Permissions.
3. For Set permissions, keep the default "Add user to group".
4. In Add user to group, select all the groups, then click Next: Review.
5. Review your choices and click Create user.
6. If your user has been created successfully, you will see a confirmation similar to the image below.
Once you are done with these steps, you are ready to launch Cloudbreak.

Cloudbreak from quickstart template
Follow the documentation provided here to install Cloudbreak. Next go to Cloudbreak on AWS and go straight to "Log into the Cloudbreak application".
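If you prefer the command line over the console, the key pair and access key steps above can also be done with the AWS CLI. This is a rough sketch, assuming AWS CLI v2 is installed and configured, and using a hypothetical key name cb-key, region us-west-2, and IAM user cb-user:

    # Import an existing public key as an EC2 key pair in the region you will use (AWS CLI v2)
    aws ec2 import-key-pair --key-name cb-key \
        --public-key-material fileb://$HOME/.ssh/id_rsa.pub --region us-west-2

    # Create an access key / secret key pair for an existing IAM user
    # (remember the limit of two access keys per user)
    aws iam create-access-key --user-name cb-user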
07-23-2018
03:15 PM
3 Kudos
NiFi flow for writing to S3, WASB, and Google Storage. Run the flow and watch as the Twitter messages are captured and then aggregated before being put into storage.
Azure Storage: Go to your Azure portal and look in the container; you should see the aggregated messages organized by year/month/day.
Google Storage: Open Google Cloud Platform and go to your Storage service. Google Storage will now contain the aggregated messages organized by year/month/day.
AWS Storage: The S3 bucket in your AWS account will now have the aggregated Twitter messages organized by year/month/day.
Now let's see what's happening here. I will only focus on the three most important processors, as the others make up the simple flow. The entire flow template is available as an XML file that you can download: nificloudstorage.xml
PutAzureStorage processor: In Azure, create a Storage Account and get the Storage Account name and key, as shown in this screenshot. These are needed in the processor's properties.
PutS3Object processor: From the AWS dashboard, go to Users, pick your user, and click on Security Credentials. If you have not saved the Secret Access Key, use the Create Access Key button to generate it again. There is a limit of only two keys.
PutGCSObject processor: Setting up GCS credentials is slightly different; a Controller Service is used. Click on the arrow in GCPCredentialsControllerService, which takes you to the next screenshot. Under Controller Services, click on the gear icon to get to the properties and use the JSON file created from your GCS credentials. You can follow the article "Creating GCS credentials" to find out how to get this JSON. Click on the lightning icon to enable the Controller Service.
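To verify that the flow is landing data where you expect, you can list the objects from the command line. This is just a quick check, assuming the respective CLIs are installed and authenticated, and using a hypothetical bucket/container name twitter-agg and storage account mystorageacct:

    # AWS S3: list the aggregated Twitter messages
    aws s3 ls s3://twitter-agg/ --recursive

    # Google Cloud Storage: list objects recursively
    gsutil ls -r gs://twitter-agg/

    # Azure Blob Storage: list blobs in the container (requires az login or an account key)
    az storage blob list --container-name twitter-agg --account-name mystorageacct -o table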
07-21-2018
11:29 PM
This could also be the solution for this "hive script does not exist" question: https://community.hortonworks.com/questions/25588/hive-script-error-script-does-not-exists-error.html
07-21-2018
11:27 PM
1 Kudo
Environment: HDP on AWS EC2 instances. Problem: When trying to execute a Hive DDL script, we get a "file does not exist" error, even though the file exists and has read/write privileges. Root cause: The directory where the script resides does not grant the user the necessary permissions. Solution: Open up the directory permissions to give the user the required access. The following screenshot shows the problem and the solution used to fix it.
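As a generic illustration (the directory and script names here are hypothetical), the check and fix from the shell look like this:

    # Check the current permissions on the directory holding the DDL script
    ls -ld /home/ec2-user/scripts

    # Give other users read and execute access so they can reach the script
    chmod o+rx /home/ec2-user/scripts

    # Re-run the script
    hive -f /home/ec2-user/scripts/create_tables.hql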
07-20-2018
07:20 PM
5 Kudos
Use S3 as storage for Zeppelin notebooks.

Step 1. Use external storage to point to an S3 bucket in the Cloudbreak advanced options. This uses an S3 access instance profile and AWS credentials; Cloudbreak takes care of that setup.
Step 2. In addition, change these three settings in zeppelin-env.sh:
export ZEPPELIN_NOTEBOOK_S3_BUCKET=yourBucketName
export ZEPPELIN_NOTEBOOK_S3_ENDPOINT="http://s3.amazonaws.com/yourBucketName"
export ZEPPELIN_NOTEBOOK_S3_USER=admin
Step 3. Change one property in zeppelin-site.xml: point zeppelin.notebook.storage to org.apache.zeppelin.notebook.repo.S3NotebookRepo.

Detailed steps below:
1. Complete the AWS prerequisites.
2. Create AWS credentials in Cloudbreak.
3. Launch an HDP cluster on AWS using Cloudbreak.
4. Enable Zeppelin storage on S3.

3. Launch an HDP cluster on AWS using Cloudbreak.
Not all the screenshots are included; only the screenshots focusing on the key advanced features that enable the required Zeppelin storage are captured.
a. Use the Advanced tab in Cloudbreak.
b. Cloudbreak uses AWS credentials that provide the necessary AWS Access Key and Secret Access Key for the S3 storage setup.
a. Provide an instance profile created in AWS that has access to your S3 bucket.
b. Provide your bucket name for base storage.

4. Enable Zeppelin storage on S3.
Once Ambari starts and all services are started, we need to make some configuration changes to enable Zeppelin. In zeppelin-config, change the following properties:
zeppelin_notebook.s3.bucket
zeppelin_notebook.s3.user
zeppelin_notebook.storage
Or you could change them in zeppelin-env:
export ZEPPELIN_NOTEBOOK_S3_BUCKET=bucketName
export ZEPPELIN_NOTEBOOK_S3_ENDPOINT="http://s3.amazonaws.com/bucketName"
export ZEPPELIN_NOTEBOOK_S3_USER=admin
Here is an example path: bucket/user/notebook/2A94M5J1Z/note.json
Now when you create and save notebooks in Zeppelin, they will be saved to S3. You will be able to see the notebooks in the AWS portal, in your S3 bucket. Zeppelin uses a 9-character hash as the name of each notebook folder, with a note.json file inside that folder.
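As an optional sanity check (assuming the AWS CLI is available on a node or your workstation, and using the same yourBucketName and admin user as above), you can list the stored notebooks directly:

    # List Zeppelin notebooks saved under the configured S3 user prefix
    aws s3 ls s3://yourBucketName/admin/notebook/ --recursive
    # Expect keys like admin/notebook/2A94M5J1Z/note.json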
07-06-2018
10:34 PM
2 Kudos
Launching Cloudbreak on GCP

Meet the Prerequisites
Before launching Cloudbreak on GCP, you must meet the following prerequisites: GCP Account, Service Account, SSH Key Pair, Region and zone.

GCP account
In order to launch Cloudbreak on GCP, you must log in to your GCP account. If you don't have an account, you can create one at https://console.cloud.google.com. Once you log in, you must either create a project or use an existing project. To create a new project, provide a name and choose an organization, or leave it as No Organization. Then select the newly created project. On the main dashboard page you will find the Project ID; you will need this to define your credential in Cloudbreak in a later step.

GCP - APIs & Services dashboard
Go to the APIs & Services dashboard by (1) clicking the menu in the top left, (2) hovering over APIs & Services, and (3) clicking Dashboard. Verify that the Google Compute Engine API is listed and enabled. If it is not, click the Enable APIs button to search for and enable it.

Service account
Go to the Service Accounts screen by (1) clicking the menu in the top left, (2) hovering over IAM & Admin, and (3) clicking Service Accounts. Then:
1. Click "Create Service Account".
2. Give the service account a name.
3. Check the "Furnish a new key" box. This will download a key to your computer when you finish creating the account. If you are using Cloudbreak 2.7 or later, select the JSON format key.
4. Click the "Select a Role" dropdown and select the required Compute Engine roles: Compute Image User, Compute Instance Admin (v1), Compute Network Admin, Compute Security Admin, Compute Storage Admin.
5. Select the Storage Admin role under Storage.
6. Click outside of the role selection dropdown to reveal the "Create" button. All six of the roles shown are required for the service account.

Access to Google Storage
If you also want to be able to use GCP storage, you need to add one more role to the service account: "Service Account User", which you can find under Service Accounts. You should now have the following roles.

SSH key pair
Generate a new SSH key pair or use an existing one; you will be required to provide it when launching the VM. On Linux or macOS workstations, you can generate a key with the ssh-keygen tool: open a terminal and run ssh-keygen to generate a new key (see the example command at the end of this post). This generates a private SSH key file and a matching public SSH key with the following structure:
ssh-rsa [KEY_VALUE] [USERNAME]
where [KEY_VALUE] is the key value that you generated and [USERNAME] is the user that this key applies to.

Editing public SSH key metadata
Add or remove project-wide public SSH keys from the GCP Console. In the Google Cloud Platform Console, go to the metadata page for your project: click (1) the menu, (2) Compute Engine, (3) Metadata, (4) the SSH Keys tab, then click Edit, Add item, and then Save. To modify the project-wide public SSH keys:
To add a public SSH key, click Add item at the bottom of the page. This produces a text box; copy the contents of your public SSH key file and paste them into the text box. Repeat this process for each public SSH key that you want to add.
To remove a public SSH key, click the removal button next to it.

Region and zone
Decide in which region and zone you would like to launch Cloudbreak. You can launch Cloudbreak and provision your clusters in all regions supported by GCP. Clusters created via Cloudbreak can be in the same or a different region as Cloudbreak; when you launch a cluster, you select the region in which to launch it.
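For the SSH key pair section above, a typical ssh-keygen invocation looks like the following. The file name and comment are placeholders; substitute your own:

    # Generate a new RSA key pair for use with GCP
    ssh-keygen -t rsa -b 4096 -f ~/.ssh/gcp-cloudbreak -C your_username
    # The public key to paste into the GCP metadata page
    cat ~/.ssh/gcp-cloudbreak.pub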