Created 09-22-2016 04:12 PM
Is sharing a Spark RDD or context supported in HDP 2.5 via the Livy server? Everything I see goes through the Zeppelin interpreter to Livy. I want to know whether sharing an RDD or context is supported when using Spark directly, without Zeppelin.
Created 09-22-2016 04:24 PM
@Sunile Manjee It is not supported by HDP 2.5; I confirmed that yesterday with @vshukla.
Created 09-23-2016 01:25 AM
@azeltov Does that include going through the Livy server?
Created 09-26-2016 03:49 PM
Correct. The Livy server is supported only through the Zeppelin integration, not through direct REST API calls to Livy.
Created 09-22-2016 06:27 PM
Consider how Spark applications run: a driver runs either on the client, or in a YARN container. If multiple users will ask the same Spark application instance to do multiple things, they need an interface to communicate that to the Driver.
Livy is the out-of-the-box REST interface that shares a single Spark application by exposing its control interface to external users.
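As a rough illustration of what that control interface looks like, here is a minimal sketch of the two Livy REST calls involved: creating a session (one long-lived Spark application) and posting a statement to it. The host name is hypothetical, the port assumes Livy's default of 8998, and the helper names are made up for this example. Note that, per the replies in this thread, calling Livy directly like this is not a supported configuration on HDP 2.5.

```python
import json
import urllib.request

# Assumption: hypothetical host, Livy's default port 8998.
LIVY_URL = "http://livy-host.example.com:8998"


def _post(url, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def new_session_request(base_url=LIVY_URL, kind="pyspark"):
    """Request asking Livy to start one long-lived Spark application."""
    return _post(base_url + "/sessions", {"kind": kind})


def statement_request(session_id, code, base_url=LIVY_URL):
    """Request that runs a code snippet inside an existing session."""
    url = "{}/sessions/{}/statements".format(base_url, session_id)
    return _post(url, {"code": code})


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Under these assumptions, two clients that post statements to the same session id run against the same Spark application, so a variable defined by one statement (for example a cached RDD) should be visible to later statements in that session.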
If you do not want to use Livy but still want to share a Spark context, you need to build an external means of communicating with the shared driver. One solution might be to have the driver periodically pull new queries from a database or from files on disk. This functionality is not built into Spark, but it could be implemented with a while loop and a sleep statement.
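A minimal sketch of that while-loop-and-sleep idea, assuming queries arrive as `*.sql` files in a directory. The function name, the file-naming scheme, and the `run_query` callback are all hypothetical; inside a real driver the callback could be something like `spark.sql`, but the loop itself does not depend on Spark.

```python
import glob
import os
import time


def poll_queries(query_dir, run_query, poll_seconds=5.0, max_polls=None):
    """Run every *.sql file dropped into query_dir, then sleep and repeat.

    run_query is any callable that takes the query text.
    max_polls=None means loop forever, as a long-lived driver would.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in sorted(glob.glob(os.path.join(query_dir, "*.sql"))):
            with open(path) as handle:
                query = handle.read().strip()
            try:
                result = run_query(query)
                with open(path + ".out", "w") as out:
                    out.write(str(result))
            except Exception as error:
                # One bad query should not kill the shared driver.
                with open(path + ".err", "w") as out:
                    out.write(repr(error))
            # Rename so the file is not picked up on the next poll.
            os.rename(path, path + ".done")
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

Every query runs sequentially on the driver thread here, so one slow query blocks the rest; a real implementation would also need some thought about authentication and about who may write into the directory.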
Created 09-22-2016 07:33 PM
*Edit* Realistically, questions about shared SparkContexts are often about
1. Making shared use of cached DataFrames/Datasets
Livy and the Spark Thrift JDBC/ODBC server are decent initial solutions. Keep an eye on Spark-LLAP integration, which will be better all around (security, efficiency, etc.).
2. Problems with Spark applications consuming all of a cluster's resources.
Spark's ability to spin up and spin down executor instances dynamically based on utilization is probably a better solution to this problem than sharing a single spark context.
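For the second point, a sketch of the dynamic allocation settings, written as a small helper that turns them into `spark-submit` arguments. The property names are standard Spark configuration keys; the values, the helper name, and the application file name are hypothetical and would need tuning per cluster. On YARN with the Spark versions of this era, dynamic allocation also relies on the external shuffle service being enabled.

```python
# Hypothetical values; tune these for the cluster.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    # External shuffle service, so executors can be removed safely.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}


def spark_submit_args(conf, app):
    """Build a spark-submit command line with one --conf flag per setting."""
    args = ["spark-submit", "--master", "yarn"]
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    return args + [app]


if __name__ == "__main__":
    # my_app.py is a placeholder application name.
    print(" ".join(spark_submit_args(DYNAMIC_ALLOCATION_CONF, "my_app.py")))
```

With settings like these, an idle application gives executors back to YARN instead of holding them, which addresses the resource-hogging concern without requiring users to share one context.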
Created 10-31-2016 06:02 AM
Does the Thrift Server in HDP support sharing RDDs today?
Created 09-23-2016 01:27 AM
@Randy Gelhausen Is Spark RDD and context sharing supported in 2.5 via the Livy server?
Created 09-30-2016 07:23 PM
yes
Created 09-23-2016 05:05 PM
Not without Livy. With Livy, yes (@vshukla). However, it is exposed only to Zeppelin for now.
Code examples: https://github.com/romainr/hadoop-tutorials-examples/tree/master/notebook/shared_rdd