This Spark load-testing framework is built on a number of distributed technologies, including Gatling, Livy, Akka, and HDP. Using an Akka server powered by Livy (Spark as a Service) provides the following benefits:

  • REST-friendly and Docker-friendly
  • Low-latency execution
  • Cache sharing across jobs
  • Separation of concerns
  • Multi-tenancy
  • Direct Spark SQL execution
  • Configuration in one place
  • Auditing and logging
  • Complete statement history and metrics

[Figure: 39502-load-test.png]

Livy Server

Livy is an open source REST interface for interacting with Apache Spark from anywhere. It supports executing snippets of code or programs in a Spark context that runs locally or in Apache Hadoop YARN.

Livy offers three modes to run Spark jobs:

  • Using programmatic API
  • Running interactive statements through REST API
  • Submitting batch applications with REST API
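
As an illustration of the second mode, the sketch below builds the JSON bodies that Livy's REST API expects for creating an interactive session (POST /sessions) and running a statement (POST /sessions/{id}/statements). The host in livyUrl and the helper names (jsonString, statementBody, statementsUrl, postJson) are assumptions made for this example and are not part of Livy; only the endpoints and the "kind" and "code" fields come from the Livy REST API.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object LivyRestSketch {
  // Assumed endpoint: 8998 is Livy's default port, the host is a placeholder.
  val livyUrl = "http://localhost:8998"

  // Minimal JSON string escaping (quotes, backslashes, common whitespace escapes).
  def jsonString(s: String): String = {
    val escaped = s.map {
      case '"'  => "\\\""
      case '\\' => "\\\\"
      case '\n' => "\\n"
      case '\r' => "\\r"
      case '\t' => "\\t"
      case c    => c.toString
    }.mkString
    "\"" + escaped + "\""
  }

  // Body for POST /sessions: asks Livy for an interactive Scala session.
  val createSessionBody: String = """{"kind": "spark"}"""

  // Body for POST /sessions/{id}/statements: one code snippet to execute.
  def statementBody(code: String): String =
    s"""{"code": ${jsonString(code)}}"""

  // URL of the statements resource for a given session id.
  def statementsUrl(sessionId: Int): String =
    s"$livyUrl/sessions/$sessionId/statements"

  // POST a JSON body and return the raw response text (needs a running server).
  def postJson(url: String, body: String): String = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    val out = conn.getOutputStream
    try out.write(body.getBytes(StandardCharsets.UTF_8)) finally out.close()
    val in = conn.getInputStream
    try Source.fromInputStream(in, "UTF-8").mkString finally in.close()
  }
}
```

A client would typically POST createSessionBody to /sessions, read the session id from the response, POST statementBody(...) to statementsUrl(id), and then poll GET /sessions/{id}/statements/{statementId} until the statement's state is "available".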

Livy provides the following features:

  • Interactive Scala, Python, and R shells
  • Batch submissions in Scala, Java, Python
  • Multiple users can share the same server (impersonation support)
  • Can be used for submitting jobs from anywhere with REST
  • Does not require any code change to your programs
  • Supports Spark 1 and Spark 2, and Scala 2.10 and 2.11, within one build

Livy provides the following advantages:

  • Upload a JAR file and run a job programmatically. Additional applications can connect to the same cluster and upload their own JARs for subsequent jobs. With spark-submit, you must manually upload the JAR file to the cluster and run the command, so everything must be prepared before the run.
  • Use Spark in interactive mode, which is hard to do at scale with spark-submit or the Thrift Server.
  • Security: reduce the cluster’s exposure to the outside world.
  • Stability: Spark is a complex framework, and there are many factors that can affect its long-term performance and stability. Decoupling the Spark context from the application lets you handle Spark issues gracefully, without full downtime of the application.

Gatling Server

Gatling is a highly capable load testing tool. It is designed for ease of use, maintainability, and high performance. The Gatling server provides the following benefits:

  • Powerful scripting using Scala
  • Akka + Netty
  • Run multiple scenarios in one simulation
  • Scenarios = code + DSL
  • Graphical reports with clear & concise graphs

Gatling’s architecture is asynchronous as long as the underlying protocol, such as HTTP, can be implemented in a non-blocking way. This architecture lets Gatling implement virtual users as messages instead of dedicated threads, which makes them very cheap in resources. As a result, running thousands of concurrent virtual users is not an issue.
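
To make the "messages instead of threads" idea concrete, here is a minimal sketch in plain Scala. It is not Gatling's implementation (Gatling relies on Akka and Netty); it only illustrates, under that simplification, how many lightweight tasks can share a small thread pool. The object name VirtualUsersSketch and the run helper are hypothetical.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object VirtualUsersSketch {
  // Run `users` simulated users on a pool of only `threads` threads and
  // return how many of them completed.
  def run(users: Int, threads: Int): Int = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val completed = new AtomicInteger(0)
    try {
      // Each virtual user is a small task queued on the pool, not a thread.
      val tasks = (1 to users).toList.map { _ =>
        Future { completed.incrementAndGet() }
      }
      Await.result(Future.sequence(tasks), 30.seconds)
      completed.get()
    } finally pool.shutdown()
  }
}
```

For example, VirtualUsersSketch.run(10000, 4) completes ten thousand simulated users on four threads, which would be far more expensive with one dedicated thread per user.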

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._
val theScenarioBuilder =
  scenario("Interactive Spark Command Scenario Using LIVY Rest Services $sessionId")
    .exec(
      // The string passed to http(...) names the request in the Gatling report;
      // ${sessionId} and ${bbox} are resolved from each virtual user's session.
      http("Interactive Spark Command Simulation")
        .get("/insrun?sessionId=${sessionId}&statement=sparkSession.sql(%22%20select%20event.site_id%20from%20siteexposure_event%20as%20event%20where%20st_intersects(st_makeBBOX(${bbox})%2C%20geom)%20limit%205%20%22).show")
    )
    .pause(4.seconds)
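
The request above refers to two session attributes, ${sessionId} and ${bbox}, which have to be supplied to each virtual user. One common way to do that in Gatling is a feeder, which is an Iterator[Map[String, T]]. The sketch below is an assumption about how this could be wired up, not the author's actual setup: the bounding-box values are made-up sample data (assumed to be four comma-separated coordinates), and LivyFeederSketch is a hypothetical name.

```scala
import scala.util.Random

object LivyFeederSketch {
  // Hypothetical bounding boxes; replace with real test data.
  val bboxes: Vector[String] = Vector(
    "-74.1,40.6,-73.8,40.9",
    "-0.5,51.3,0.3,51.7"
  )

  // Endless iterator of records; each key becomes a session attribute that
  // the request can reference as ${sessionId} or ${bbox}.
  def feeder(sessionId: Int, rnd: Random = new Random()): Iterator[Map[String, Any]] =
    Iterator.continually(
      Map(
        "sessionId" -> sessionId,
        "bbox"      -> bboxes(rnd.nextInt(bboxes.size))
      )
    )
}
```

In a simulation, the feeder would be attached before the request, for example scenario(...).feed(LivyFeederSketch.feeder(0)).exec(...), so that every virtual user picks up a record before issuing the call.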

So, this is great: we can load test our Spark interactive command with one user! Let’s increase the number of users.

To increase the number of simulated users, all you have to do is to change the configuration of the simulation as follows:

setUp(
    theScenarioBuilder.inject(atOnceUsers(10))
).protocols(theHttpProtocolBuilder)

If you want to simulate 3000 users, you might not want them to start at the same time. Indeed, real users are more likely to connect to your web application gradually.

Gatling provides rampUsers to implement this behavior. The value of the ramp indicates the duration over which the users will be started linearly. In our scenario, let’s ramp 10 regular users over 10 seconds so we don’t hammer the Livy server:

setUp(
    theScenarioBuilder.inject(rampUsers(10) over (10 seconds))
).protocols(theHttpProtocolBuilder)
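
To see what a linear ramp means in numbers, the following sketch computes evenly spaced start offsets for a given user count and duration. It approximates the idea of a linear ramp and is not taken from Gatling's scheduler, whose exact timing may differ; RampSketch and startOffsets are hypothetical names.

```scala
import scala.concurrent.duration._

object RampSketch {
  // Evenly spaced start offsets for `users` virtual users across `over`.
  def startOffsets(users: Int, over: FiniteDuration): Seq[FiniteDuration] = {
    require(users > 0, "users must be positive")
    val interval = over / users.toLong
    (0 until users).map(i => interval * i.toLong)
  }
}
```

With 10 users over 10 seconds, the interval is one second, so under this approximation a new user starts every second instead of all ten hitting the Livy server at once.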