Member since: 08-02-2019
131 Posts · 93 Kudos Received · 13 Solutions
My Accepted Solutions

Title | Views | Posted
---|---|---
 | 3697 | 12-03-2018 09:33 PM
 | 4566 | 04-11-2018 02:26 PM
 | 2520 | 05-09-2017 09:35 PM
 | 1136 | 03-31-2017 12:59 PM
 | 2183 | 11-21-2016 08:58 PM
09-08-2016
06:04 PM
Thanks! Glad it was helpful!
09-07-2016
04:56 PM
Also, could you post any additional context from the stack trace? Are there additional exceptions?
09-07-2016
04:52 PM
Could you post your Spark code?
09-02-2016
02:27 PM
1 Kudo
Your zip file may be corrupted. How were the files imported into HDFS and Hive? Check out this article: https://community.hortonworks.com/questions/52722/when-try-to-run-simple-hive-query-select-count-for.html Here is a way to load compressed files into Hive: https://cwiki.apache.org/confluence/display/Hive/CompressedStorage
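Before re-importing, it may help to confirm the archive itself is intact. A minimal sketch (generic shell, not specific to the linked thread; `gzip` is shown, but `unzip -t file.zip` performs the same check for zip files):

```shell
# Create a small sample file and compress it, then verify integrity.
printf 'a,b,c\n1,2,3\n' > sample.csv
gzip -f sample.csv

# gzip -t reads the whole archive and reports any corruption.
if gzip -t sample.csv.gz; then
  echo "archive OK"
else
  echo "archive corrupted"
fi
```

If the integrity check fails, re-transfer the file before loading it into HDFS.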
09-01-2016
01:26 AM
12 Kudos
The hive-testbench consists of a data generator and a standard set of queries typically used for benchmarking Hive performance. This article describes how to generate data and run a query using Beeline and Hive 2.0, with and without LLAP. It also shows how to use explain to see the difference in query plans. If you don't have a cluster already configured for LLAP, you can provision one in AWS using Hortonworks Cloud. See this article for instructions on how to provision a 2.5 tech preview with LLAP enabled.

1. Log into the master node in your cluster where Hive is installed. If you used Hortonworks Cloud to create your instance, locate the node with a name ending in master. The ssh command is shown next to the master instance. If you are logging in from a Linux host, click the icon to the right of the ssh command to select and copy the command text. In the Linux shell, change to the directory containing your AWS key .pem file and then run the copied command. If you are logging in from Windows, consult the AWS user guide for instructions on how to log in using PuTTY with the user name cloudbreak, authenticating with the key file.

2. Sudo to the hdfs user to begin generating data, and change to the hdfs user's home directory:

```
sudo -u hdfs -s
cd /home/hdfs
```

3. Download the testbench utilities from GitHub and unzip them:

```
wget https://github.com/hortonworks/hive-testbench/archive/hive14.zip
unzip hive14.zip
```

4. Open the load-partitioned.sql file in an editor:

```
vi hive-testbench-hive14/settings/load-partitioned.sql
```

5. Correct the hive.tez.java.opts setting. Comment out the existing line by adding -- at the beginning of the line:

```
-- set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;
```

Then add the line below:

```
set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;
```

Save the file and exit.

6. Generate 30 GB of test data:

```
# In case gcc is not installed
yum install gcc

# If javac is not found
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
export PATH=$JAVA_HOME/bin:$PATH

cd hive-testbench-hive14/
sudo ./tpcds-build.sh
./tpcds-setup.sh 30
```

7. A MapReduce job runs to create the data and load it into Hive. This will take some time to complete. The last line printed by the script is:

```
Data loaded into database tpcds_bin_partitioned_orc_30.
```

8. Choose a query to run for benchmarking, for example query55.sql. Copy the query of your choice and make an explain version of it. The explain query will be helpful later on to see how Hive is planning the query.

```
cd sample-queries-tpcds
cp query55.sql explainquery55.sql
vi explainquery55.sql
```

Add the keyword explain before the query. For example, the first line of the explain version of query 55 becomes:

```
explain select i_brand_id brand_id, i_brand brand,
```

Save and quit out of the file.

9. You are now ready to issue a benchmark query. Start the Beeline HiveServer2 CLI:

```
beeline -i testbench.settings -u jdbc:hive2://localhost:10500/tpcds_bin_partitioned_orc_30
```

10. To try a query without LLAP, set hive.llap.execution.mode=none and run a query. For example, the commands below run benchmark query 55:

```
set hive.llap.execution.mode=none;
!run query55.sql
```

Note the completion time at the end of the query: 18.984 seconds without LLAP.

11. Now try the query with LLAP. Set hive.llap.execution.mode=all and run the query again:

```
set hive.llap.execution.mode=all;
!run query55.sql
```

12. Notice that the query with LLAP completes much more quickly. If you don't see a significant speedup at first, try the same query again: as the LLAP cache fills with data, queries respond more quickly. In this test, the second run of the same query with LLAP set to all returned in 8.455 seconds and a subsequent run in 2.745 seconds. If your cluster has been up and you have been running LLAP queries on this data, your performance may be in the 2-second range on the first try.

13. To see the difference between the query plans, use the explain query to show the plan for a query with LLAP disabled. Take note of the vectorized keyword in the plan output:

```
set hive.llap.execution.mode=none;
!run explainquery55.sql
```

14. Try the explain again with LLAP enabled:

```
set hive.llap.execution.mode=all;
!run explainquery55.sql
```

15. Notice that in the explain plan for the LLAP query, llap is shown after the vectorized keyword.

References:
- Beeline
- https://github.com/hortonworks/hive-testbench
- https://community.hortonworks.com/questions/51333/hive-testbench-error-when-generating-data.html
- https://community.hortonworks.com/questions/23988/not-able-to-run-hive-benchmark-test.html
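The explain-copy step (step 8) can also be scripted instead of edited by hand. A sketch, assuming GNU `sed` and demonstrated on a stand-in query file rather than the real testbench directory:

```shell
# Stand-in for sample-queries-tpcds/query55.sql; in the testbench, skip this
# line and run the commands below inside sample-queries-tpcds instead.
printf 'select i_brand_id brand_id, i_brand brand\nfrom item;\n' > query55.sql

# Copy the query and prepend "explain " to the first line of the copy.
cp query55.sql explainquery55.sql
sed -i '1s/^/explain /' explainquery55.sql

head -n 1 explainquery55.sql
```

The same two commands work for any of the sample queries, so generating explain variants of several benchmarks is a one-line loop away.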
09-01-2016
01:25 AM
17 Kudos
Are you eager to see how the new LLAP Hive feature in Hortonworks Data Platform 2.5 Tech Preview optimizes queries? Using Hortonworks Cloud, you can easily stand up a cluster in Amazon Web Services configured with LLAP, load up some benchmark data, and compare queries with and without LLAP.

1. A key pair is required when logging into an AWS host. If you already have a key pair and its associated PEM file, skip to the next step. If this is the first time you are creating AWS instances in a region, or you want to create a new key pair, follow the steps in the AWS Key Pair documentation. When you download the key file, make sure you save it to a known location: you will not be able to log into your instances without it, and AWS will not give you another opportunity to download it. If you lose the key file, you can terminate your existing instances and launch new ones.

2. Launch a CloudController instance. The CloudController provides a web interface where you can quickly spin up a cluster with Hive configured with LLAP. Click here to get the latest AWS CloudFormation template from Hortonworks. Click the green Launch the CloudFormation Template button.

3. Click the Next button.

4. Complete the required fields on the Specify Details form:
   a. Enter your email address and a password. Note the email and password; you will need them to log into the cloud controller before you can start launching clusters.
   b. Select the name of the SSH key created in the first step.
   c. Enter a CIDR that specifies the range of network IPs allowed to access the instances in the cluster. Entering 0.0.0.0/0 will allow any IP to log into the hosts with the key or access the web URLs. To be more secure, you can limit access by entering a CIDR that restricts the range of IPs that can access the hosts. Click here to open a page that shows your IP address. Change the last number in the dotted quad to a 0 and add a /24 at the end. For example, if the browser shows 1.2.3.4, use the CIDR 1.2.3.0/24. This value restricts the IPs allowed to connect to the instances to the range 1.2.3.0 to 1.2.3.255.

5. Click the Next button to move on to the Options page. You can accept the defaults for this page.
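The /24 CIDR suggested in step 4c can be derived mechanically from your public address. A small sketch (the address below is a placeholder; substitute the one your browser reports):

```shell
# Turn a public IP into the /24 CIDR covering its subnet (step 4c):
# replace the final octet with 0 and append /24.
MY_IP="1.2.3.4"   # placeholder; use your actual public IP
CIDR=$(echo "$MY_IP" | sed 's/\.[0-9]*$/.0\/24/')
echo "$CIDR"      # 1.2.3.0/24 for the placeholder address
```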
6. Click Next to move on to the Review page. Scroll to the bottom of the form and check the Acknowledgement box.

7. Click Create.

8. AWS begins to create the CloudFormation instance. Select Services > CloudFormation at the top left of the browser. AWS is creating the HortonworksCloudController; it takes a few minutes to complete.

9. Click on the HortonworksCloudController link to watch the progress of the instance.

10. When the instance status is CREATE_COMPLETE, expand the Outputs section. The Outputs section shows the URL to access the Cloud Controller. It also contains the command to use to SSH into the Cloud Controller instance; the SSH instructions are useful for troubleshooting. NOTE: If you shut down the cloud controller instance and start it up again, its DNS name will change and the URL displayed in the Outputs section of CloudFormation will no longer work. If this happens, go to the EC2 Dashboard and click on Instances. Click the instance called HortonworksCloudController-cbd. On the Description tab, find the Public DNS field and use the URL https://<HortonworksCloudController-cbd public DNS>

11. Click on the CloudUrl. AWS uses a self-signed certificate for its SSL connection, so you will have to accept a certificate exception in your browser. Exceptions can easily be added in both Firefox and Safari.

12. The Hortonworks Cloud login screen appears. Enter the email and password specified in step 4a and click the LOG IN button.

13. Check the I agree to the Terms of Use check box. Click the I AGREE button to accept the Terms of Use.

14. Click CREATE CLUSTER to begin creating an LLAP-enabled cluster.

15. The CREATE CLUSTER screen opens and you can begin to provision a new cluster:
   a. Enter a cluster name. All the hosts in the new cluster will begin with this name.
   b. Select HDP Version HDP 2.5.
   c. Select Cluster Type EDW-Analytics: Apache Hive 2 LLAP, Apache Zeppelin.
   d. If you want to be able to shut down your cluster instances and restart them again to save costs, use the HARDWARE & STORAGE > SHOW ADVANCED OPTIONS drop-down to select SSD disks: go to the Storage Per Instance section, select Storage Type General Purpose (SSD), and increase the Count to 2.
   e. In the NETWORK & SECURITY section, select the SSH key used to log into the instances (see step 1).
   f. Enter the CIDR specifying the range of network IPs that can log into the instances. Use the same value as step 4c, or accept the default 0.0.0.0/0 to allow login from any IP address.
   g. Enter the password for the Ambari admin user and enter it again to confirm. Take note of this password, as you will need it to log into the Ambari management console for the cluster.
   h. Click CREATE CLUSTER to launch provisioning of a four-node cluster configured to use Hive 2.0 with LLAP.
   i. Click YES, CREATE CLUSTER on the CONFIRM CLUSTER CREATE screen.
   j. Hortonworks Cloud begins creating the cluster.

16. Click on the cluster to see the status of the cluster creation. It will take a few minutes for Hortonworks Cloud to create the instances and build the cluster.

17. When the cluster is complete, you will see "Ambari cluster built" at the top of the Event History.

18. Select Ambari Web from the Ambari drop-down. You will need to accept a certificate exception in your browser.

19. The Ambari login screen will appear. Enter admin for the user and the password entered in step 15g, then click the Sign In button.

20. View the Ambari dashboard and verify that the cluster is operational with 0 alerts.

21. Select Hive from the left side of the Ambari dashboard and click on the Config tab. View the Interactive Query section of the configuration and verify that Enable Interactive Query (Tech Preview) is set to Yes. If you scroll down the Interactive Query configuration section, you can see the LLAP settings.

22. Load up your data and start testing your queries. For LLAP, data must be in ORC format and the execution engine must be Tez. If you don't have data ready for use, or it is not easy to load your data into the cloud, look at this article on how to use the hive-testbench: it is an easy way to generate test Hive tables in the correct format and execute standard Hive benchmarking queries. To issue a query using LLAP, start Beeline using the HiveServer2 Interactive interface (port 10500):

```
beeline -i testbench.settings -u jdbc:hive2://localhost:10500/tpcds_bin_partitioned_orc_30
```

23. To try a query without LLAP, set hive.llap.execution.mode=none and run a query. For example, the commands below run benchmark query 55:

```
set hive.llap.execution.mode=none;
!run query55.sql
```

24. Now try the query with LLAP. Set hive.llap.execution.mode=all and run the query again:

```
set hive.llap.execution.mode=all;
!run query55.sql
```

Try running the LLAP query multiple times; you should see incremental improvement as the cache populates.
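To observe the cache warm-up across repeated runs more systematically, a rough timing wrapper can help. A sketch, assuming GNU `date`; `run_query` is a stand-in function whose body you would replace with your real beeline invocation:

```shell
# Time three consecutive runs of a query and print elapsed milliseconds.
# run_query is a placeholder; in practice substitute something like:
#   beeline -u jdbc:hive2://localhost:10500/tpcds_bin_partitioned_orc_30 -f query55.sql
run_query() { sleep 1; }   # stand-in for the real beeline command

for i in 1 2 3; do
  start=$(date +%s%N)                 # nanoseconds since epoch (GNU date)
  run_query
  end=$(date +%s%N)
  echo "run $i: $(( (end - start) / 1000000 )) ms"
done
```

With LLAP enabled, the per-run times should drop noticeably as the cache fills.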
LLAP References:
- http://hortonworks.com/blog/announcing-apache-hive-2-1-25x-faster-queries-much/
- http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/
- http://www.slideshare.net/HadoopSummit/llap-subsecond-analytical-queries-in-hive
- https://www.youtube.com/watch?v=msRFRckv73Y
- https://www.youtube.com/watch?v=3xIo6lyGYeM

Hortonworks Cloud and Cloudbreak References:
- http://hortonworks.com/blog/quickly-launch-hortonworks-data-platform-amazon-web-services/
- http://hortonworks.github.io/hdp-aws/
- Hortonworks Cloud Controller template: http://hortonworks.github.io/hdp-aws/launch/
08-19-2016
09:31 PM
Here is an example of the explain output with LLAP enabled. Note the llap after Reducer 4:

```
| Stage-0
| Fetch Operator
| limit:100
| Stage-1
| Reducer 4 vectorized, llap
| File Output Operator [FS_58]
```

And below with LLAP set to none. There is no llap after Reducer 4:

```
| Stage-0
| Fetch Operator
| limit:100
| Stage-1
| Reducer 4 vectorized
```
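Since the only difference between the two plans is the `llap` annotation, a quick `grep` over saved explain output can classify a plan. A minimal sketch using a stand-in plan file built from the snippet above:

```shell
# Stand-in explain output; in practice, redirect your explain query's output
# to a file and point grep at that instead.
cat > plan.txt <<'EOF'
| Stage-1
| Reducer 4 vectorized, llap
| File Output Operator [FS_58]
EOF

# With LLAP enabled, operators are annotated "vectorized, llap";
# without it, only "vectorized" appears.
if grep -q 'llap' plan.txt; then
  echo "plan uses LLAP"
else
  echo "plan does not use LLAP"
fi
```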
08-19-2016
09:00 PM
@Timothy Spann I was able to work around the problem by using the Beeline !run command:

```
!run query55.sql
```

Another alternative is to use the -f option to Beeline and avoid the interactive shell altogether:

```
beeline -i my.settings -u jdbc:hive2://host:10500/my_table -f query.sql
```
08-16-2016
06:32 PM
@Timothy Spann How do I change the property names in Ambari? I see how to add a new custom property but I need to remove the one that is incorrect.