
Introduction

The term “Load Testing” has evolved over the years, but its core meaning still comes down to making sure that your system can handle a predefined number of users at the same time. A load test is usually run over an extended period in a staggered manner, slowly ramping up the number of users until you hit a predefined maximum, which is usually based on projected usage levels extrapolated from access logs, with a buffer for surges. The goal of load testing is to determine the number of users that the system can typically cope with. This is called the system’s “concurrency level”, and it gives you a hard number to work with when dealing with capacity planning, performance and optimization, stability, SLAs, etc.
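As a rough illustration of the staggered ramp-up described above, the following Python sketch computes how many users are active after each step. The function name ramp_schedule and the figures used (50 users, 5 steps) are illustrative assumptions, not JMeter settings.

```python
from typing import List


def ramp_schedule(max_users: int, steps: int) -> List[int]:
    """Return the number of active users after each ramp-up step.

    Users are added in roughly equal increments until max_users is reached.
    """
    if max_users < 1 or steps < 1:
        raise ValueError("max_users and steps must be positive")
    # Ceiling division with integers: step i (1-based) runs
    # ceil(i * max_users / steps) users.
    return [-(-i * max_users // steps) for i in range(1, steps + 1)]


# Example: ramp to a hypothetical maximum of 50 users in 5 steps.
print(ramp_schedule(50, 5))
```

In JMeter itself, the equivalent knobs are the number of threads and the ramp-up period of a Thread Group.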

JMeter was originally designed for testing web applications but has since expanded to other test functions, including databases via JDBC. As such, it can execute SQL queries through a given JDBC driver, which lets you define queries to run against a Hive table.

An instance of JMeter can run multiple threads of queries in parallel, with multiple instances of JMeter capable of spreading clients across many nodes. The queries can also be parameterized with pseudo-random data in order to simulate all types of queries to a table.
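The idea of parameterizing queries with pseudo-random data can be sketched outside JMeter as follows. This is only an illustration: the table web_logs, the column user_id and the value range 1 to 1000 are hypothetical, and in a real test plan the substitution would be done by JMeter variables or functions inside the JDBC request.

```python
import random
from typing import List


def random_queries(count: int, seed: int = 42) -> List[str]:
    """Generate `count` SELECT statements with pseudo-random predicates.

    A seeded generator keeps the sequence reproducible between runs.
    """
    rng = random.Random(seed)
    queries = []
    for _ in range(count):
        # Hypothetical table and column; the id range is an assumption.
        user_id = rng.randint(1, 1000)
        queries.append(
            f"SELECT COUNT(*) FROM web_logs WHERE user_id = {user_id}"
        )
    return queries


for query in random_queries(3):
    print(query)
```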

JMeter automates the execution of the queries in parallel. The results of the queries that JMeter ran are also aggregated and analyzed together to provide an overall view into the performance. Mean and median are provided for a simple insight, as well as 90th, 95th, 99th and 99.9th percentiles to understand the execution tail.
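To make the statistics above concrete, here is a minimal Python sketch that summarizes a list of response times. It uses the nearest-rank definition of a percentile, which is an assumption made for illustration; JMeter's own reports may compute percentiles slightly differently.

```python
import math
from statistics import mean, median
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Mean, median and tail percentiles of response times."""
    summary = {"mean": mean(samples), "median": median(samples)}
    for pct in (90, 95, 99, 99.9):
        summary[f"p{pct}"] = percentile(samples, pct)
    return summary


# Example with made-up response times of 1..100 ms.
print(summarize(list(range(1, 101))))
```

The gap between the median and the high percentiles shows how long the execution tail is.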

This approach is extremely useful for exercising read-heavy workloads.

JMeter Setup for Hive Load Testing

These steps have been tested on HDP 2.4.2 and OSX and should work similarly on other Unix-like systems.

Step 1: Download, Install and Set Up JMeter

Step 2: Building a Database Test Plan

To build a database test plan, consult Instructions to build a database test plan. That example uses the MySQL driver; however, the same approach is applicable to Hive. You will have to provide the Hive database URL. Note: If you use HiveServer2, the URL begins with jdbc:hive2:// instead of jdbc:hive://. The plan includes adding users, JDBC requests and a listener to view and store the test results.
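The difference between the two URL forms can be sketched with a small Python helper. The host name hive.example.com is hypothetical, and the port 10000 and database default are the usual defaults rather than values taken from this article. The driver class names in the comments are the standard Apache Hive JDBC drivers.

```python
def hive_jdbc_url(host: str, port: int = 10000,
                  database: str = "default",
                  hiveserver2: bool = True) -> str:
    """Build the database URL for a JMeter JDBC Connection Configuration.

    HiveServer2 uses the hive2 scheme (driver org.apache.hive.jdbc.HiveDriver);
    the original HiveServer uses hive
    (driver org.apache.hadoop.hive.jdbc.HiveDriver).
    """
    scheme = "hive2" if hiveserver2 else "hive"
    return f"jdbc:{scheme}://{host}:{port}/{database}"


# Hypothetical host, for illustration only.
print(hive_jdbc_url("hive.example.com"))
```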

Step 3: Build a JMeter Dashboard

JMeter supports dashboard report generation to get graphs and statistics from a test plan. To build a dashboard follow Instructions to build a JMeter dashboard.

This dashboard should include: a request summary graph showing the percentage of successful and failed transactions; a statistics table summarizing all metrics per transaction, including 3 configurable percentiles; an error table summarizing all errors and their proportion of the total requests; and zoomable charts, in which you can check or uncheck each transaction to show or hide it. The charts cover response times over time, bytes throughput over time, latencies over time, hits per second, response codes per second, transactions per second, response time vs. requests per second, latency vs. requests per second, response time percentiles, active threads over time, times vs. threads, and response time distribution.

Step 4: Run JMeter

To run JMeter, run the jmeter script (for Unix), which is found in the bin directory. There are some additional scripts in the bin directory that you may find useful:

jmeter - run JMeter (in GUI mode by default). Defines some JVM settings which may not work for all JVMs.

jmeter-server - start JMeter in server mode (calls jmeter script with appropriate parameters)

jmeter.sh - very basic JMeter script (You may need to adapt JVM options like memory settings).

mirror-server.sh - runs the JMeter Mirror Server in non-GUI mode

shutdown.sh - run the Shutdown client to stop a non-GUI instance gracefully

stoptest.sh - run the Shutdown client to stop a non-GUI instance abruptly

It may be necessary to edit the jmeter shell script if some of the JVM options are not supported by the JVM you are using. The JVM_ARGS environment variable can be used to override or set additional JVM options, which will override the HEAP settings in the script. For example:

JVM_ARGS="-Xms1024m -Xmx1024m" jmeter -t test.jmx [etc.]
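For an actual load run, JMeter is normally started in non-GUI mode. The Python sketch below assembles such a command line together with the JVM_ARGS override shown above. The options -n (non-GUI), -t (test plan), -l (results file), -e (generate the dashboard after the run) and -o (dashboard output folder) are standard JMeter options; the file names and the helper names jmeter_command and jmeter_env are hypothetical.

```python
import os
from typing import Dict, List


def jmeter_command(test_plan: str, results_file: str,
                   report_dir: str) -> List[str]:
    """Command line for a non-GUI run that also builds the dashboard."""
    return ["jmeter", "-n", "-t", test_plan,
            "-l", results_file, "-e", "-o", report_dir]


def jmeter_env(heap: str = "1024m") -> Dict[str, str]:
    """Copy of the current environment with JVM_ARGS set to a fixed heap."""
    env = dict(os.environ)
    env["JVM_ARGS"] = f"-Xms{heap} -Xmx{heap}"
    return env


# Hypothetical file names, for illustration only.
print(" ".join(jmeter_command("test.jmx", "results.jtl", "dashboard")))
```

To launch the test, the two helpers could be passed to subprocess.run(jmeter_command(...), env=jmeter_env(), check=True), assuming the jmeter script is on the PATH. JMeter expects the dashboard output folder to be empty or absent before the run.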

Findings

1) I recently executed a Hive load test with JMeter and learned that a single YARN queue allowed a maximum of 10 connections. That was a big eye-opener, since the requirement was for 50 concurrent connections. Multiple queries can of course be submitted per connection, and, as resources allow, multiple YARN queues can be created and used to increase the number of connections.

2) Creating multiple queues to meet the requirement of 50 concurrent connections led to another finding: response time and scalability were impacted dramatically. Assume N connections and M concurrent executions per connection. The more connections there are, the more overhead there is, so more resources are spent doing less work, more slowly. For example, N = 10 and M = 1,000 gives 10,000 concurrent queries, and so does N = 1 and M = 10,000. Yet the same queries, with the same overall resources allocated, had a significantly better response time with fewer queues. As such, unless there is another reason for multiple queues, my advice is to use one queue per tenant application and to limit the number of connections per queue, so that an already open connection is reused and existing resources are put to better use.

3) Always question requirements for so many open connections. There is always a proxy application that can use a single connection to serve multiple requests from multiple users.
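The arithmetic behind findings 1 and 2 can be written out in a few lines of Python. The limit of 10 connections per queue is the value observed in this particular test, not a general YARN constant, and the function names are illustrative.

```python
def queues_needed(required_connections: int,
                  connections_per_queue: int = 10) -> int:
    """Number of queues needed to reach the required connection count."""
    # Ceiling division, kept in integers.
    return -(-required_connections // connections_per_queue)


def total_concurrent_queries(connections: int,
                             executions_per_connection: int) -> int:
    """N connections times M concurrent executions per connection."""
    return connections * executions_per_connection


print(queues_needed(50))                    # queues for 50 connections
print(total_concurrent_queries(10, 1000))   # N = 10, M = 1,000
print(total_concurrent_queries(1, 10000))   # N = 1,  M = 10,000
```

Both configurations produce the same total number of concurrent queries, which is why the difference in response time can be attributed to the per-connection and per-queue overhead.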

Comments

Does this work for a Kerberized layer too?