Created on 09-01-2016 01:25 AM - edited 08-17-2019 10:29 AM
Are you eager to see how the new LLAP Hive feature in Hortonworks Data Platform 2.5 Tech Preview optimizes queries? Using Hortonworks Cloud, you can easily stand up a cluster in Amazon Web Services configured with LLAP, load some benchmark data, and compare query performance with and without LLAP.
1. You need a key pair to log into an AWS host. If you already have a key pair and its associated PEM file, skip to the next step. If this is the first time you are creating AWS instances in a region, or you want to create a new key pair, follow the steps in the AWS Key Pair documentation. When you download the key file, save it to a known location. You will not be able to log into your instances without it, and AWS will not give you another opportunity to download it. If you lose the key file, you will have to terminate your existing instances and launch new ones.
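If you prefer the command line, the key pair can also be created with the AWS CLI. The sketch below assumes the AWS CLI is installed and configured for the region you will launch in; the function name `create_key_pair` and the key name you pass to it are hypothetical placeholders, not part of the Hortonworks instructions. It refuses to overwrite an existing PEM file, because AWS returns the private key only once.

```shell
# Sketch: create an EC2 key pair with the AWS CLI and save the private key.
# Assumes the AWS CLI is installed and configured for the target region.
# create_key_pair and the key name argument are hypothetical placeholders.
create_key_pair() {
  local key_name="$1" pem_file material
  pem_file="${key_name}.pem"
  # AWS returns the private key only once, so never overwrite an existing copy.
  if [ -e "$pem_file" ]; then
    echo "refusing to overwrite $pem_file" >&2
    return 1
  fi
  # KeyMaterial is the PEM-encoded private key.
  material=$(aws ec2 create-key-pair --key-name "$key_name" \
    --query 'KeyMaterial' --output text) || return 1
  printf '%s\n' "$material" > "$pem_file"
  # ssh refuses private keys that other users can read.
  chmod 400 "$pem_file"
  echo "$pem_file"
}
```

For example, `create_key_pair my-hdc-key` writes `my-hdc-key.pem`; you would then select `my-hdc-key` as the SSH key in step 4b.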
2. Launch a Cloud Controller instance. The Cloud Controller provides a web interface where you can quickly spin up a cluster with Hive configured with LLAP. Open the latest Hortonworks Cloud Controller CloudFormation template (see the references at the end of this article) and click the green Launch the CloudFormation Template button.
3. Click the Next button.
4. Complete the required fields on the Specify Details form:
a. Enter your email address and a password. Note the email and password. You will need this password to log into the cloud controller before you can start launching clusters.
b. Select the name of the SSH Key created in the first step.
c. Enter a CIDR IP that specifies the range of network IPs allowed to access the instances in the cluster. Entering 0.0.0.0/0 allows any IP to log into the host with the key or to access the web URLs. To be more secure, enter a CIDR that restricts the range of IPs that can access the host. To find your own public IP address, visit any "what is my IP" web page in your browser.
Change the last number in the dotted quad to 0 and add /24 at the end. For example, if the browser shows 203.0.113.42, use the CIDR 203.0.113.0/24. This value restricts the IPs allowed to connect to the instances to the range 203.0.113.0 to 203.0.113.255.
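The /24 rule above is simple arithmetic on the last octet. The following POSIX shell sketch applies it; `ip_to_cidr24` is a hypothetical helper name, and it handles IPv4 addresses only.

```shell
# Sketch: turn a public IPv4 address into the /24 CIDR used in step 4c
# by replacing the last octet with 0. ip_to_cidr24 is a hypothetical name.
ip_to_cidr24() {
  local ip="$1" a b c d rest octet
  # Require exactly four non-empty, dot-separated decimal fields.
  case $ip in
    *[!0-9.]*|*..*|.*|*.|*.*.*.*.*) echo "not an IPv4 address: $ip" >&2; return 1 ;;
    *.*.*.*) ;;
    *) echo "not an IPv4 address: $ip" >&2; return 1 ;;
  esac
  # Split the dotted quad with parameter expansion.
  a=${ip%%.*}; rest=${ip#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  for octet in "$a" "$b" "$c" "$d"; do
    if [ "$octet" -gt 255 ]; then
      echo "octet out of range: $octet" >&2
      return 1
    fi
  done
  # A /24 covers a.b.c.0 through a.b.c.255.
  echo "$a.$b.$c.0/24"
}
```

For example, `ip_to_cidr24 203.0.113.42` prints `203.0.113.0/24`.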
5. Click the Next button to move on to the Options page. You can accept the defaults for this page.
6. Click Next to move on to the Review page. Scroll to the bottom of the form and check the Acknowledgement box.
7. Click Create.
8. AWS begins creating the CloudFormation stack. Select Services > CloudFormation at the top left of the AWS console. AWS is creating the HortonworksCloudController stack, which takes a few minutes to complete.
9. Click the HortonworksCloudController link to watch the progress of the stack.
10. When the stack status is CREATE_COMPLETE, expand the Outputs section. It shows the URL for accessing the Cloud Controller and the command for SSHing into the Cloud Controller instance. The SSH instructions are useful for troubleshooting.
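Steps 8 to 10 can also be followed from a terminal. This sketch assumes the AWS CLI is configured for the region where you launched the template and that the stack is named HortonworksCloudController, as shown in the console; `show_controller_outputs` and `STACK_NAME` are hypothetical names.

```shell
# Sketch: wait for the Cloud Controller stack and print its Outputs.
# Assumes the AWS CLI is configured for the launch region and that the
# stack is named HortonworksCloudController (override with STACK_NAME).
STACK_NAME=${STACK_NAME:-HortonworksCloudController}

show_controller_outputs() {
  # Blocks until CREATE_COMPLETE; fails if creation fails or the waiter gives up.
  aws cloudformation wait stack-create-complete --stack-name "$STACK_NAME" || return 1
  # The Outputs section holds the Cloud Controller URL and the SSH command.
  aws cloudformation describe-stacks --stack-name "$STACK_NAME" \
    --query 'Stacks[0].Outputs' --output table
}
```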
NOTE: If you shut down the cloud controller instance and start it up again, its DNS name will change and the URL displayed in the output section of Cloud Formation will no longer work. If this happens, go to the EC2 Dashboard and click on Instances. Click the instance called HortonworksCloudController-cbd. On the Description tab find the Public DNS field. Use the URL https://<HortonworksCloudController-cbd public DNS>
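The lookup in the note can be scripted as well. This is a sketch under two assumptions: the AWS CLI is configured for the right region, and the instance's Name tag is HortonworksCloudController-cbd, the name shown in the EC2 Instances list. `controller_url` is a hypothetical helper name, and it expects a single running match.

```shell
# Sketch: rebuild the Cloud Controller URL after a stop/start changes its DNS.
# Assumes the instance's Name tag is HortonworksCloudController-cbd and that
# exactly one such instance is running. controller_url is a hypothetical name.
controller_url() {
  local dns
  dns=$(aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=HortonworksCloudController-cbd" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PublicDnsName' \
    --output text) || return 1
  if [ -z "$dns" ]; then
    echo "no running HortonworksCloudController-cbd instance found" >&2
    return 1
  fi
  echo "https://$dns"
}
```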
11. Click the CloudUrl. The Cloud Controller uses a self-signed certificate for its SSL connection, so you will have to accept a certificate exception in your browser. Exceptions can be added easily in both Firefox and Safari.
12. The Hortonworks Cloud login screen appears. Enter the email and password specified in step 4a. Click the LOG IN button.
14. Click CREATE CLUSTER to begin creating an LLAP-enabled cluster.
15. The CREATE CLUSTER screen opens and you can begin to provision a new cluster.
a. Enter a cluster name. The host names of all instances in the new cluster will begin with this name.
b. Select HDP Version HDP 2.5.
c. Select Cluster Type EDW-Analytics: Apache Hive 2 LLAP, Apache Zeppelin.
d. If you want to shut down your cluster instances and restart them later to save costs, open SHOW ADVANCED OPTIONS in the HARDWARE & STORAGE section and select SSD disks.
Go to the Storage Per Instance section, select Storage Type General Purpose (SSD), and increase the Count to 2. General Purpose (SSD) volumes are EBS volumes, which keep their data when an instance is stopped.
e. In the NETWORK & SECURITY section select the SSH key used to log into the instances. See Step 1.
f. Enter the CIDR specifying the range of network IPs that can log into the instance. Use the same value as Step 4c or accept the default 0.0.0.0/0 to allow login from any IP address.
g. Enter the password for the Ambari admin user and enter the password again to confirm it. Take note of this password as you will need it to log into the Ambari management console for the cluster.
h. Click CREATE CLUSTER to launch provisioning of a four-node cluster configured to use Hive 2.0 with LLAP.
i. Click YES, CREATE CLUSTER from the CONFIRM CLUSTER_CREATE screen.
j. Hortonworks Cloud begins creating the cluster.
16. Click on the cluster to see the status of the cluster creation. It will take a few minutes for Hortonworks Cloud to create the instances and build the cluster.
17. When the cluster is complete, you will see "Ambari cluster built" at the top of the Event History.
18. Select Ambari Web from the Ambari drop-down. You will need to accept a certificate exception in your browser.
19. The Ambari login screen will appear. Enter admin for the user and the password entered in Step 15g. Click the Sign In button.
20. View the Ambari dashboard and verify that the cluster is operational with 0 alerts.
21. Select Hive from the left side of the Ambari dashboard. Click on the Config tab. View the Interactive Query section of the configuration. Verify that Enable Interactive Query (Tech Preview) is set to Yes. If you scroll down the Interactive Query configuration section, you can see the LLAP settings.
22. Load your data and start testing your queries. For LLAP, the data must be in ORC format and the execution engine must be Tez. If you don't have data ready, or it is not easy to load your data into the cloud, look at the article on how to use the Hive testbench. It is an easy way to generate test Hive tables in the correct format and run standard Hive benchmark queries.
To issue a query using LLAP, start beeline and connect to HiveServer2 Interactive (port 10500):
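If your data is already in Hive but not stored as ORC, one way to get an LLAP-ready copy is a CREATE TABLE ... STORED AS ORC AS SELECT statement run on Tez. This sketch assumes beeline is on the PATH of a cluster node; `HIVE_URL`, `orc_copy_sql`, `convert_to_orc` and the table names are hypothetical placeholders. Note that CTAS in Hive versions of this era cannot create a partitioned table, so the copy is unpartitioned.

```shell
# Sketch: copy an existing Hive table into an ORC table, running on Tez.
# Assumes beeline is on the PATH; HIVE_URL and the table names are
# hypothetical placeholders to replace with your own.
HIVE_URL=${HIVE_URL:-jdbc:hive2://localhost:10500/default}

# Print the HiveQL: switch to Tez, then create-table-as-select into ORC.
orc_copy_sql() {
  # $1 = existing source table, $2 = name of the new ORC table
  printf 'SET hive.execution.engine=tez;\nCREATE TABLE %s STORED AS ORC AS SELECT * FROM %s;\n' "$2" "$1"
}

convert_to_orc() {
  beeline -u "$HIVE_URL" -e "$(orc_copy_sql "$1" "$2")"
}
```

For example, `convert_to_orc web_logs web_logs_orc` would create `web_logs_orc` from a hypothetical `web_logs` table.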
beeline -i testbench.settings -u jdbc:hive2://localhost:10500/tpcds_bin_partitioned_orc_30
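Before comparing timings, it can help to confirm what a new session reports. In Hive, `SET name;` without a value prints the current setting, so this sketch asks HiveServer2 Interactive for the execution engine (expected to be tez) and the LLAP mode. It assumes beeline is on the PATH; `check_llap_session` is a hypothetical name, and its default URL is the one from the beeline command above.

```shell
# Sketch: print the execution engine and LLAP mode of a new session.
# Assumes beeline is on the PATH; check_llap_session is a hypothetical name.
check_llap_session() {
  local url="${1:-jdbc:hive2://localhost:10500/tpcds_bin_partitioned_orc_30}"
  # "SET name;" without a value prints the current value of that setting.
  beeline -u "$url" --silent=true \
    -e 'SET hive.execution.engine; SET hive.llap.execution.mode;'
}
```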
23. To try a query without LLAP, set hive.llap.execution.mode=none and run a query, for example benchmark query 55.
24. Now set hive.llap.execution.mode=all and run the same query again with LLAP.
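One way to run a testbench query with a chosen LLAP mode is to prepend the SET statement to the query file and hand the result to beeline with `-f`. This sketch assumes beeline is on the PATH and that you run it from the directory holding testbench.settings and the query file (for example query55.sql from the Hive testbench); `LLAP_URL`, `with_llap_mode` and `run_query` are hypothetical names.

```shell
# Sketch: run one testbench query with a chosen LLAP execution mode.
# Assumes beeline is on the PATH and the current directory holds
# testbench.settings and the query file (e.g. query55.sql).
# LLAP_URL, with_llap_mode and run_query are hypothetical names.
LLAP_URL=${LLAP_URL:-jdbc:hive2://localhost:10500/tpcds_bin_partitioned_orc_30}

# Print the query file with "SET hive.llap.execution.mode=<mode>;" prepended.
with_llap_mode() {
  printf 'SET hive.llap.execution.mode=%s;\n' "$1"
  cat "$2"
}

# Run the query through beeline; mode is e.g. none (no LLAP) or all.
run_query() {
  local mode="$1" query_file="$2" tmp status
  [ -r "$query_file" ] || { echo "cannot read $query_file" >&2; return 1; }
  tmp=$(mktemp) || return 1
  with_llap_mode "$mode" "$query_file" > "$tmp"
  beeline -i testbench.settings -u "$LLAP_URL" -f "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For the run without LLAP, call `run_query none query55.sql`.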
Try running the LLAP query multiple times and you should see incremental improvement as the cache populates.
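To watch the cache warm up, you can time several LLAP runs of the same query in a loop. This is a sketch assuming beeline is on the PATH and the current directory holds testbench.settings and the query file (for example query55.sql); `LLAP_URL` and `time_llap_runs` are hypothetical names. Times are whole seconds of wall clock and include beeline's own start-up, so they are coarse.

```shell
# Sketch: time repeated LLAP runs of one query to watch the cache warm up.
# Assumes beeline is on the PATH and the current directory holds
# testbench.settings and the query file (e.g. query55.sql).
# LLAP_URL and time_llap_runs are hypothetical names.
LLAP_URL=${LLAP_URL:-jdbc:hive2://localhost:10500/tpcds_bin_partitioned_orc_30}

time_llap_runs() {
  local query_file="$1" runs="${2:-3}" i=1 tmp start end
  [ -r "$query_file" ] || { echo "cannot read $query_file" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # Force LLAP for the statements in the query file.
  { printf 'SET hive.llap.execution.mode=all;\n'; cat "$query_file"; } > "$tmp"
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s)
    # Discard result rows; only the elapsed time matters here.
    if ! beeline -i testbench.settings -u "$LLAP_URL" -f "$tmp" > /dev/null; then
      rm -f "$tmp"
      return 1
    fi
    end=$(date +%s)
    # Whole seconds, including beeline's own start-up time.
    echo "run $i: $((end - start)) s"
    i=$((i + 1))
  done
  rm -f "$tmp"
}
```

For example, `time_llap_runs query55.sql 5` prints one line per run.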
Hortonworks Cloud and Cloudbreak References:
Hortonworks Cloud Controller template: