Created on 09-24-2017 01:35 PM - edited 08-17-2019 11:05 AM
The Spark load testing framework is built on a number of distributed
technologies, including Gatling, Livy, Akka, and HDP. Using an Akka server
powered by Livy (Spark as a Service) provides the following benefits:
- REST friendly and Docker friendly
- Low latency execution
- Sharing cache across jobs
- Separation of concerns
- Multi-tenancy
- Direct Spark SQL execution
- Configuration in one place
- Auditing and logging
- Complete statement history and metrics
Livy Server
Livy is an open source REST interface for interacting with
Apache Spark from anywhere. It supports executing snippets of code or programs
in a Spark context that runs locally or in Apache Hadoop YARN.
Livy offers three modes to run Spark jobs:
- Using the programmatic API
- Running interactive statements through the REST API
- Submitting batch applications with the REST API
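To make the two REST-based modes concrete, here is a minimal sketch of the requests involved, written as plain data so it runs without a Livy server. The endpoints (`POST /sessions`, `POST /sessions/{id}/statements`, `POST /batches`) and the `kind`, `code`, `file`, and `className` fields come from Livy's REST API. The `LivyRequests` object, its helper names, and the sample JAR path and class name are hypothetical, and actually sending the requests over HTTP is left out.

```scala
// Hypothetical helper: describes Livy REST calls as data; nothing is sent.
object LivyRequests {
  final case class Request(method: String, path: String, jsonBody: String)

  // Minimal JSON string escaping (backslash, double quote, newline only).
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\""

  // Interactive mode, step 1: create a session ("spark", "pyspark", "sparkr", ...).
  def createSession(kind: String): Request =
    Request("POST", "/sessions", s"""{"kind": ${quote(kind)}}""")

  // Interactive mode, step 2: run a code snippet inside an existing session.
  def runStatement(sessionId: Int, code: String): Request =
    Request("POST", s"/sessions/$sessionId/statements", s"""{"code": ${quote(code)}}""")

  // Batch mode: submit an application JAR that the cluster can already read.
  def submitBatch(file: String, className: String): Request =
    Request("POST", "/batches",
      s"""{"file": ${quote(file)}, "className": ${quote(className)}}""")
}

object LivyRequestsDemo extends App {
  println(LivyRequests.createSession("spark"))
  println(LivyRequests.runStatement(0, "sparkSession.sql(\"select 1\").show"))
  // The path and class name below are placeholders, not values from this article.
  println(LivyRequests.submitBatch("/tmp/example-job.jar", "com.example.ExampleJob"))
}
```

The programmatic API mode uses Livy's client library rather than raw REST calls and is not shown here.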
Livy provides the following features:
- Interactive Scala, Python, and R shells
- Batch submissions in Scala, Java, and Python
- Multiple users can share the same server (impersonation support)
- Can be used for submitting jobs from anywhere with REST
- Does not require any code change to your programs
- Supports Spark 1 and Spark 2, and Scala 2.10 and 2.11, within one build
Livy provides the following advantages:
- Programmatically upload a JAR file and run a job. Additional applications
  can connect to the same cluster and upload a JAR with the next job. With
  spark-submit, you must manually upload the JAR file to the cluster and run
  the command, so everything must be prepared before the run.
- Use Spark in interactive mode, which is hard to do with spark-submit or
  Thrift Server at scale.
- Security. Reduce exposure of the cluster to the outside world.
- Stability. Spark is a complex framework, and there are many factors that can
  affect its long-term performance and stability. Decoupling the Spark context
  from the application allows Spark issues to be handled gracefully, without
  full downtime of the application.
Gatling Server
Gatling is a highly capable load testing tool. It is designed
for ease of use, maintainability and high performance. Gatling server provides
the following benefits:
- Powerful scripting using Scala
- Akka + Netty
- Run multiple scenarios in one simulation
- Scenarios = code + DSL
- Graphical reports with clear & concise graphs
Gatling’s architecture is asynchronous as long as the underlying protocol,
such as HTTP, can be implemented in a non-blocking way. This kind of
architecture lets us implement virtual users as messages instead of dedicated
threads, which makes them very cheap in terms of resources. Thus, running
thousands of concurrent virtual users is not an issue.
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

val theScenarioBuilder =
  scenario("Interactive Spark Command Scenario Using Livy REST Services").exec(
    // The string passed to http(...) names the request in the reports.
    // ${sessionId} and ${bbox} are read from each virtual user's session.
    http("Interactive Spark Command Simulation")
      .get("/insrun?sessionId=${sessionId}&statement=sparkSession.sql(%22%20select%20event.site_id%20from%20siteexposure_event%20as%20event%20where%20st_intersects(st_makeBBOX(${bbox})%2C%20geom)%20limit%205%20%22).show").check()
  ).pause(4 seconds)
So, this is great: we can load test our Spark interactive command with one
user! Let’s increase the number of users.
To increase the number of simulated users, all you have to do is change the
configuration of the simulation as follows:
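A minimal sketch of such a configuration, assuming the Gatling 2.x DSL used elsewhere in this article and its `atOnceUsers` injection step, which starts all virtual users simultaneously. The class name, base URL, placeholder scenario, and user count of 10 are illustrative assumptions rather than values from the original setup.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Hypothetical simulation class; requires Gatling 2.x on the classpath.
class AtOnceUsersSimulation extends Simulation {

  // Assumption: address of the Akka/Livy front end. Replace with the real host.
  // (Gatling 2.x spells this baseURL; Gatling 3 renamed it to baseUrl.)
  val theHttpProtocolBuilder = http.baseURL("http://localhost:8080")

  // Placeholder standing in for the interactive Spark command scenario.
  val theScenarioBuilder =
    scenario("Placeholder scenario").exec(http("Placeholder request").get("/"))

  // atOnceUsers(10) injects 10 virtual users at the same instant.
  setUp(
    theScenarioBuilder.inject(atOnceUsers(10))
  ).protocols(theHttpProtocolBuilder)
}
```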
If you want to simulate 3000 users, you might not want them all to start at
the same time. Indeed, real users are more likely to connect to your web
application gradually.
Gatling provides rampUsers to implement this behavior. The value of the ramp
indicates the duration over which the users will be started linearly. In our
scenario, let’s have 10 regular users and ramp them up over 10 seconds so we
don’t hammer the Livy server:
setUp(
  theScenarioBuilder.inject(rampUsers(10) over (10 seconds))
).protocols(theHttpProtocolBuilder)
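To show what a linear ramp means in numbers, here is a small sketch in plain Scala with no Gatling dependency. It computes start offsets for a given number of users spread evenly over a time window. The `RampSchedule` object is hypothetical, and the sketch assumes the first user starts at time zero with equal spacing after that; Gatling's own scheduler may differ in detail.

```scala
// Hypothetical helper: illustrates linear ramp-up arithmetic only.
object RampSchedule {
  // Start offsets in milliseconds for `users` virtual users spread evenly
  // over `windowMillis`, with the first user starting at offset 0.
  def startOffsetsMillis(users: Int, windowMillis: Long): Seq[Long] = {
    require(users > 0, "users must be positive")
    val interval = windowMillis / users
    (0 until users).map(i => i * interval)
  }
}

object RampScheduleDemo extends App {
  // 10 users over 10 seconds: one new user every 1000 ms.
  println(RampSchedule.startOffsetsMillis(10, 10000L).mkString(", "))
}
```

With 10 users and a 10 000 ms window, the offsets are 0, 1000, 2000, and so on up to 9000, so the Livy server sees roughly one new user per second instead of ten at once.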