Is sharing a Spark RDD or context supported in HDP 2.5?

Super Guru

Is sharing a Spark RDD or context supported in HDP 2.5 via Livy server? Everything I see goes through the Zeppelin interpreter to Livy. I want to know whether sharing an RDD or context is supported when using Spark directly, without Zeppelin.

1 ACCEPTED SOLUTION

@Sunile Manjee It is not supported in HDP 2.5. I confirmed that yesterday with @vshukla.

13 REPLIES

Super Guru

@azeltov Does that include sharing via Livy server?

Correct. Livy server is only supported through the Zeppelin integration, not through direct REST API calls to Livy.

@Sunile Manjee

Consider how Spark applications run: a driver runs either on the client or in a YARN container. If multiple users want the same Spark application instance to do multiple things, they need an interface for communicating those requests to the driver.

Livy is the out-of-the-box REST interface for this: it shares a single Spark application by exposing a control interface to external users.
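To make the idea concrete, here is a rough sketch of what sharing a context through Livy's REST API could look like: one interactive session is created, and every statement posted to that session runs in the same Spark application. The host, port, and session id are assumptions for illustration (8998 is Livy's default port), and the requests are only built, not sent. Keep in mind that, per the replies above, direct REST calls to Livy were not a supported configuration in HDP 2.5.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: Livy on its default port, local host


def livy_request(path, payload):
    """Build (but do not send) a JSON POST request for the Livy REST API."""
    return urllib.request.Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One interactive session corresponds to one shared Spark application/context.
create_session = livy_request("/sessions", {"kind": "pyspark"})

# Any client that knows the session id can submit code to that same context.
session_id = 0  # hypothetical id, normally read from the create-session response
cache_rdd = livy_request(
    "/sessions/%d/statements" % session_id,
    {"code": "shared = sc.parallelize(range(100)).cache()"},
)
reuse_rdd = livy_request(
    "/sessions/%d/statements" % session_id,
    {"code": "shared.count()"},
)
```

Each request could then be sent with `urllib.request.urlopen(...)`. The second statement sees the `shared` variable defined by the first because both run in the same session.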

If you do not want to use Livy, but still want to share a Spark context, you need to build an external means of communicating with the shared driver. One solution might be to have the driver periodically poll for new queries from a database or from files on disk. This functionality is not built into Spark, but it could be implemented with a while loop and a sleep statement.
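A minimal sketch of that while-loop-and-sleep approach, assuming queries arrive as `.sql` files dropped into a directory. The function name, its parameters, and the file convention are all hypothetical; `run_query` stands in for whatever the long-running driver uses to execute a query against its shared context.

```python
import os
import time


def poll_queries(query_dir, run_query, interval_s=5.0, max_polls=None):
    """Run each new *.sql file found in query_dir through run_query.

    max_polls=None loops forever, which is what a long-running driver
    would do; a finite value is handy for trying the loop out.
    """
    seen = set()      # file names that were already executed
    results = {}      # file name -> whatever run_query returned
    polls = 0
    while max_polls is None or polls < max_polls:
        for name in sorted(os.listdir(query_dir)):
            if name.endswith(".sql") and name not in seen:
                seen.add(name)
                with open(os.path.join(query_dir, name)) as f:
                    results[name] = run_query(f.read())
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)  # wait before checking for new files
    return results
```

Inside a real driver, `run_query` might be something like `lambda q: sqlContext.sql(q).collect()`, so that every submitted query runs against the same context and its cached data. A production version would also need error handling and some way to return results to the submitting user.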

*Edit* Realistically, questions about shared SparkContexts are often about:

1. Making shared use of cached DataFrames/Datasets.

Livy and the Spark Thrift JDBC/ODBC server are decent initial solutions. Keep an eye on the Spark-LLAP integration, which will be better all around (security, efficiency, etc.).

2. Problems with Spark applications consuming all of a cluster's resources.

Spark's ability to spin executor instances up and down dynamically based on utilization (dynamic allocation) is probably a better solution to this problem than sharing a single Spark context.
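For the second case, a sketch of the settings involved in dynamic allocation, expressed as a small helper that turns them into `spark-submit` arguments. The property names are Spark's own; the helper functions and the default values are hypothetical and would need tuning for a real cluster.

```python
def dynamic_allocation_conf(min_executors=1, max_executors=20, idle_timeout_s=60):
    """Return Spark properties that enable dynamic executor allocation."""
    return {
        "spark.dynamicAllocation.enabled": "true",
        # The external shuffle service lets executors be removed safely on YARN.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # Idle executors are released after this long.
        "spark.dynamicAllocation.executorIdleTimeout": "%ds" % idle_timeout_s,
    }


def to_submit_args(conf):
    """Flatten a property dict into spark-submit style --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", "%s=%s" % (key, conf[key])]
    return args
```

With this in place each user can run their own application, and idle applications hand their executors back to YARN instead of holding on to cluster resources.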

Expert Contributor

Does the Thrift Server in HDP support sharing RDDs today?

Super Guru

@Randy Gelhausen Is Spark RDD and context sharing supported in 2.5 via Livy server?

Yes.

@Sunile Manjee

Without Livy, no. With Livy, yes (@vshukla). However, for now it is exposed only to Zeppelin.

Code examples: https://github.com/romainr/hadoop-tutorials-examples/tree/master/notebook/shared_rdd