Member since: 08-02-2019
131 Posts · 93 Kudos Received · 13 Solutions
My Accepted Solutions

Title | Views | Posted
---|---|---
 | 3697 | 12-03-2018 09:33 PM
 | 4566 | 04-11-2018 02:26 PM
 | 2520 | 05-09-2017 09:35 PM
 | 1136 | 03-31-2017 12:59 PM
 | 2183 | 11-21-2016 08:58 PM
09-08-2016
06:04 PM
Thanks! Glad it was helpful!
09-07-2016
04:56 PM
Also, could you post any additional context from the stack trace? Are there additional exceptions?
09-07-2016
04:52 PM
Could you post your Spark code?
09-02-2016
02:27 PM
1 Kudo
Your zip file may be corrupted. How were the files imported into HDFS and Hive? Check out this article: https://community.hortonworks.com/questions/52722/when-try-to-run-simple-hive-query-select-count-for.html Here is a way to load compressed files into Hive: https://cwiki.apache.org/confluence/display/Hive/CompressedStorage
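Before re-importing, it may help to confirm the archive itself is intact. A minimal sketch (generic shell, not specific to the linked thread; `gzip` is shown, but `unzip -t file.zip` performs the same check for zip files):

```shell
# Create a small sample file and compress it, then verify integrity.
printf 'a,b,c\n1,2,3\n' > sample.csv
gzip -f sample.csv

# gzip -t reads the whole archive and reports any corruption.
if gzip -t sample.csv.gz; then
  echo "archive OK"
else
  echo "archive corrupted"
fi
```

If the integrity check fails, re-transfer the file before loading it into HDFS.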
09-01-2016
01:26 AM
12 Kudos
The hive-testbench consists of a data generator and a standard set of queries typically used for benchmarking Hive performance. This article describes how to generate data and run a query using Beeline and Hive 2.0, with and without LLAP. It also shows how to use explain to see the difference in query plans. If you don't have a cluster already configured for LLAP, you can provision one in AWS using Hortonworks Cloud. See this article for instructions on how to provision a 2.5 tech preview with LLAP enabled.

1. Log into the master node in your cluster where Hive is installed. If you used Hortonworks Cloud to create your instance, locate the node with a name ending in master. The ssh command is shown next to the master instance. If you are logging in from a Linux host, click the icon to the right of the ssh command to select and copy the command text. In the Linux shell, change to the directory containing your AWS key .pem file and then run the copied command. If you are logging in from Windows, consult the AWS user guide for instructions on how to log in using PuTTY with the user name cloudbreak, authenticating with the key file.

2. Sudo to the hdfs user to begin generating data, and change to the hdfs user's home directory:

```
sudo -u hdfs -s
cd /home/hdfs
```

3. Download the testbench utilities from GitHub and unzip them:

```
wget https://github.com/hortonworks/hive-testbench/archive/hive14.zip
unzip hive14.zip
```

4. Open the load-partitioned.sql file in an editor:

```
vi hive-testbench-hive14/settings/load-partitioned.sql
```

5. Correct the hive.tez.java.opts setting. Comment out the existing line by adding -- at the beginning of the line:

```
-- set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;
```

Then add the line below:

```
set hive.tez.java.opts=-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/;
```

Save the file and exit.

6. Generate 30 GB of test data:

```
# In case gcc is not installed
yum install gcc

# If javac is not found
export JAVA_HOME=/usr/jdk64/jdk1.8.0_77
export PATH=$JAVA_HOME/bin:$PATH

cd hive-testbench-hive14/
sudo ./tpcds-build.sh
./tpcds-setup.sh 30
```

7. A MapReduce job runs to create the data and load it into Hive. This will take some time to complete. The last line printed by the script is:

```
Data loaded into database tpcds_bin_partitioned_orc_30.
```

8. Choose a query to run for benchmarking, for example query55.sql. Copy the query of your choice and make an explain version of it. The explain query will be helpful later on to see how Hive is planning the query.

```
cd sample-queries-tpcds
cp query55.sql explainquery55.sql
vi explainquery55.sql
```

Add the keyword explain before the query. For example, the first line of the explain version of query 55 becomes:

```
explain select i_brand_id brand_id, i_brand brand,
```

Save and quit out of the file.

9. You are now ready to issue a benchmark query. Start the Beeline HiveServer2 CLI:

```
beeline -i testbench.settings -u jdbc:hive2://localhost:10500/tpcds_bin_partitioned_orc_30
```

10. To try a query without LLAP, set hive.llap.execution.mode=none and run a query. For example, the commands below run benchmark query 55:

```
set hive.llap.execution.mode=none;
!run query55.sql
```

Note the completion time at the end of the query: 18.984 seconds without LLAP.

11. Now try the query with LLAP. Set hive.llap.execution.mode=all and run the query again:

```
set hive.llap.execution.mode=all;
!run query55.sql
```

12. Notice that the query with LLAP completes much more quickly. If you don't see a significant speedup at first, try the same query again: as the LLAP cache fills with data, queries respond more quickly. In this test, the second run of the same query with LLAP set to all returned in 8.455 seconds and a subsequent run in 2.745 seconds. If your cluster has been up and you have been running LLAP queries on this data, your performance may be in the 2-second range on the first try.

13. To see the difference between the query plans, use the explain query to show the plan for a query with LLAP disabled. Take note of the vectorized keyword in the plan output:

```
set hive.llap.execution.mode=none;
!run explainquery55.sql
```

14. Try the explain again with LLAP enabled:

```
set hive.llap.execution.mode=all;
!run explainquery55.sql
```

15. Notice that in the explain plan for the LLAP query, llap is shown after the vectorized keyword.

References:
- Beeline
- https://github.com/hortonworks/hive-testbench
- https://community.hortonworks.com/questions/51333/hive-testbench-error-when-generating-data.html
- https://community.hortonworks.com/questions/23988/not-able-to-run-hive-benchmark-test.html
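The explain-copy step (step 8) can also be scripted instead of edited by hand. A sketch, assuming GNU `sed` and demonstrated on a stand-in query file rather than the real testbench directory:

```shell
# Stand-in for sample-queries-tpcds/query55.sql; in the testbench, skip this
# line and run the commands below inside sample-queries-tpcds instead.
printf 'select i_brand_id brand_id, i_brand brand\nfrom item;\n' > query55.sql

# Copy the query and prepend "explain " to the first line of the copy.
cp query55.sql explainquery55.sql
sed -i '1s/^/explain /' explainquery55.sql

head -n 1 explainquery55.sql
```

The same two commands work for any of the sample queries, so generating explain variants of several benchmarks is a one-line loop away.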
09-01-2016
01:25 AM
17 Kudos
Are you eager to see how the new LLAP Hive feature in Hortonworks Data Platform 2.5 Tech Preview optimizes queries? Using Hortonworks Cloud, you can easily stand up a cluster in Amazon Web Services configured with LLAP, load up some benchmark data, and compare queries with and without LLAP.

1. A key pair is required when logging into an AWS host. If you already have a key pair and its associated PEM file, skip to the next step. If this is the first time you are creating AWS instances in a region, or you want to create a new key pair, follow the steps in the AWS Key Pair documentation. When you download the key file, make sure you save it to a known location: you will not be able to log into your instances without it, and AWS will not give you another opportunity to download it. If you lose the key file, you can terminate your existing instances and launch new ones.

2. Launch a CloudController instance. The CloudController provides a web interface where you can quickly spin up a cluster with Hive configured with LLAP. Click here to get the latest AWS CloudFormation template from Hortonworks. Click the green Launch the CloudFormation Template button.

3. Click the Next button.

4. Complete the required fields on the Specify Details form:
   a. Enter your email address and a password. Note the email and password; you will need them to log into the cloud controller before you can start launching clusters.
   b. Select the name of the SSH key created in the first step.
   c. Enter a CIDR that specifies the range of network IPs allowed to access the instances in the cluster. Entering 0.0.0.0/0 will allow any IP to log into the hosts with the key or access the web URLs. To be more secure, you can limit access by entering a CIDR that restricts the range of IPs that can access the hosts. Click here to open a page that shows your IP address. Change the last number in the dotted quad to a 0 and add a /24 at the end. For example, if the browser shows 1.2.3.4, use the CIDR 1.2.3.0/24. This value restricts the IPs allowed to connect to the instances to the range 1.2.3.0 to 1.2.3.255.

5. Click the Next button to move on to the Options page. You can accept the defaults for this page.
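The /24 CIDR suggested in step 4c can be derived mechanically from your public address. A small sketch (the address below is a placeholder; substitute the one your browser reports):

```shell
# Turn a public IP into the /24 CIDR covering its subnet (step 4c):
# replace the final octet with 0 and append /24.
MY_IP="1.2.3.4"   # placeholder; use your actual public IP
CIDR=$(echo "$MY_IP" | sed 's/\.[0-9]*$/.0\/24/')
echo "$CIDR"      # 1.2.3.0/24 for the placeholder address
```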
6. Click Next to move on to the Review page. Scroll to the bottom of the form and check the Acknowledgement box.

7. Click Create.

8. AWS begins to create the CloudFormation instance. Select Services > CloudFormation at the top left of the browser. AWS is creating the HortonworksCloudController; it takes a few minutes to complete.

9. Click on the HortonworksCloudController link to watch the progress of the instance.

10. When the instance status is CREATE_COMPLETE, expand the Outputs section. The Outputs section shows the URL to access the Cloud Controller. It also contains the command to use to SSH into the Cloud Controller instance; the SSH instructions are useful for troubleshooting. NOTE: If you shut down the cloud controller instance and start it up again, its DNS name will change and the URL displayed in the Outputs section of CloudFormation will no longer work. If this happens, go to the EC2 Dashboard and click on Instances. Click the instance called HortonworksCloudController-cbd. On the Description tab, find the Public DNS field and use the URL https://<HortonworksCloudController-cbd public DNS>

11. Click on the CloudUrl. AWS uses a self-signed certificate for its SSL connection, so you will have to accept a certificate exception in your browser. Exceptions can easily be added in both Firefox and Safari.

12. The Hortonworks Cloud login screen appears. Enter the email and password specified in step 4a and click the LOG IN button.

13. Check the I agree to the Terms of Use check box. Click the I AGREE button to accept the Terms of Use.

14. Click CREATE CLUSTER to begin creating an LLAP-enabled cluster.

15. The CREATE CLUSTER screen opens and you can begin to provision a new cluster:
   a. Enter a cluster name. All the hosts in the new cluster will begin with this name.
   b. Select HDP Version HDP 2.5.
   c. Select Cluster Type EDW-Analytics: Apache Hive 2 LLAP, Apache Zeppelin.
   d. If you want to be able to shut down your cluster instances and restart them again to save costs, use the HARDWARE & STORAGE > SHOW ADVANCED OPTIONS drop-down to select SSD disks: go to the Storage Per Instance section, select Storage Type General Purpose (SSD), and increase the Count to 2.
   e. In the NETWORK & SECURITY section, select the SSH key used to log into the instances (see step 1).
   f. Enter the CIDR specifying the range of network IPs that can log into the instances. Use the same value as step 4c, or accept the default 0.0.0.0/0 to allow login from any IP address.
   g. Enter the password for the Ambari admin user and enter it again to confirm. Take note of this password, as you will need it to log into the Ambari management console for the cluster.
   h. Click CREATE CLUSTER to launch provisioning of a four-node cluster configured to use Hive 2.0 with LLAP.
   i. Click YES, CREATE CLUSTER on the CONFIRM CLUSTER CREATE screen.
   j. Hortonworks Cloud begins creating the cluster.

16. Click on the cluster to see the status of the cluster creation. It will take a few minutes for Hortonworks Cloud to create the instances and build the cluster.

17. When the cluster is complete, you will see "Ambari cluster built" at the top of the Event History.

18. Select Ambari Web from the Ambari drop-down. You will need to accept a certificate exception in your browser.

19. The Ambari login screen will appear. Enter admin for the user and the password entered in step 15g, then click the Sign In button.

20. View the Ambari dashboard and verify that the cluster is operational with 0 alerts.

21. Select Hive from the left side of the Ambari dashboard and click on the Config tab. View the Interactive Query section of the configuration and verify that Enable Interactive Query (Tech Preview) is set to Yes. If you scroll down the Interactive Query configuration section, you can see the LLAP settings.

22. Load up your data and start testing your queries. For LLAP, data must be in ORC format and the execution engine must be Tez. If you don't have data ready for use, or it is not easy to load your data into the cloud, look at this article on how to use the hive-testbench: it is an easy way to generate test Hive tables in the correct format and execute standard Hive benchmarking queries. To issue a query using LLAP, start Beeline using the HiveServer2 Interactive interface (port 10500):

```
beeline -i testbench.settings -u jdbc:hive2://localhost:10500/tpcds_bin_partitioned_orc_30
```

23. To try a query without LLAP, set hive.llap.execution.mode=none and run a query. For example, the commands below run benchmark query 55:

```
set hive.llap.execution.mode=none;
!run query55.sql
```

24. Now try the query with LLAP. Set hive.llap.execution.mode=all and run the query again:

```
set hive.llap.execution.mode=all;
!run query55.sql
```

Try running the LLAP query multiple times; you should see incremental improvement as the cache populates.
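To observe the cache warm-up across repeated runs more systematically, a rough timing wrapper can help. A sketch, assuming GNU `date`; `run_query` is a stand-in function whose body you would replace with your real beeline invocation:

```shell
# Time three consecutive runs of a query and print elapsed milliseconds.
# run_query is a placeholder; in practice substitute something like:
#   beeline -u jdbc:hive2://localhost:10500/tpcds_bin_partitioned_orc_30 -f query55.sql
run_query() { sleep 1; }   # stand-in for the real beeline command

for i in 1 2 3; do
  start=$(date +%s%N)                 # nanoseconds since epoch (GNU date)
  run_query
  end=$(date +%s%N)
  echo "run $i: $(( (end - start) / 1000000 )) ms"
done
```

With LLAP enabled, the per-run times should drop noticeably as the cache fills.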
LLAP References:
- http://hortonworks.com/blog/announcing-apache-hive-2-1-25x-faster-queries-much/
- http://hortonworks.com/blog/llap-enables-sub-second-sql-hadoop/
- http://www.slideshare.net/HadoopSummit/llap-subsecond-analytical-queries-in-hive
- https://www.youtube.com/watch?v=msRFRckv73Y
- https://www.youtube.com/watch?v=3xIo6lyGYeM

Hortonworks Cloud and Cloudbreak References:
- http://hortonworks.com/blog/quickly-launch-hortonworks-data-platform-amazon-web-services/
- http://hortonworks.github.io/hdp-aws/
- Hortonworks Cloud Controller template: http://hortonworks.github.io/hdp-aws/launch/
08-19-2016
09:31 PM
Here is an example of the explain output with LLAP enabled. Note the llap after Reducer 4:

```
| Stage-0
| Fetch Operator
| limit:100
| Stage-1
| Reducer 4 vectorized, llap
| File Output Operator [FS_58]
```

And below with LLAP set to none. There is no llap after Reducer 4:

```
| Stage-0
| Fetch Operator
| limit:100
| Stage-1
| Reducer 4 vectorized
```
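Since the only difference between the two plans is the `llap` annotation, a quick `grep` over saved explain output can classify a plan. A minimal sketch using a stand-in plan file built from the snippet above:

```shell
# Stand-in explain output; in practice, redirect your explain query's output
# to a file and point grep at that instead.
cat > plan.txt <<'EOF'
| Stage-1
| Reducer 4 vectorized, llap
| File Output Operator [FS_58]
EOF

# With LLAP enabled, operators are annotated "vectorized, llap";
# without it, only "vectorized" appears.
if grep -q 'llap' plan.txt; then
  echo "plan uses LLAP"
else
  echo "plan does not use LLAP"
fi
```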
08-19-2016
09:00 PM
@Timothy Spann I was able to work around the problem by using the Beeline !run command:

```
!run query55.sql
```

Another alternative is to use the -f option to Beeline and avoid the interactive shell altogether:

```
beeline -i my.settings -u jdbc:hive2://host:10500/my_table -f query.sql
```
08-16-2016
06:32 PM
@Timothy Spann How do I change the property names in Ambari? I see how to add a new custom property but I need to remove the one that is incorrect.