Is sharing a Spark RDD or context supported in HDP 2.5?
Labels: Apache Spark
Created 09-22-2016 04:12 PM
Is sharing a Spark RDD or context supported in HDP 2.5 via the Livy server? Everything I see goes through the Zeppelin interpreter to Livy. I want to know whether sharing a Spark RDD or context is supported when using Spark directly (not through Zeppelin).
Created 09-22-2016 04:24 PM
@Sunile Manjee It is not supported in HDP 2.5; I confirmed that yesterday with @vshukla.
Created 09-23-2016 01:25 AM
@azeltov Does that include the Livy server?
Created 09-26-2016 03:49 PM
Correct. The Livy server is only supported as a Zeppelin integration, not via direct REST API calls to Livy.
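For illustration, the supported path is a Zeppelin paragraph bound to the Livy interpreter, roughly as sketched below (the dataset path and table name are hypothetical):

```
%livy.pyspark
# Runs in the Livy-managed Spark session rather than a Zeppelin-local one,
# so the cached table belongs to that shared session.
df = sqlContext.read.json("/data/events.json")
df.cache()
df.registerTempTable("events")
```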
Created 09-22-2016 06:27 PM
Consider how Spark applications run: the driver runs either on the client or in a YARN container. If multiple users want to ask the same Spark application instance to do multiple things, they need an interface for communicating that to the driver.
Livy is the out-of-the-box REST interface that shares a single Spark application by exposing its control interface to external users.
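For context, Livy's REST workflow looks roughly like the sketch below, using Python's requests library; the host, port, and submitted code are illustrative. Note that, per the answers above, calling this API directly is not a supported path in HDP 2.5.

```python
import json

import requests

livy = "http://livy-host:8998"   # assumed Livy endpoint
headers = {"Content-Type": "application/json"}

# Create one interactive session; its SparkContext outlives any single client.
r = requests.post(livy + "/sessions",
                  data=json.dumps({"kind": "pyspark"}), headers=headers)
session_id = r.json()["id"]

# Any client that knows the session id can submit statements against the
# same context, which is what makes the RDDs/context "shared".
requests.post(livy + "/sessions/%d/statements" % session_id,
              data=json.dumps({"code": "sc.parallelize(range(100)).count()"}),
              headers=headers)
```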
If you do not want to use Livy but still want to share a Spark context, you need to build an external means of communicating with the shared driver. One solution might be to have the driver periodically pull new queries from a database or from files on disk. This functionality is not built into Spark, but it could be implemented with a while loop and a sleep statement.
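To make that concrete, here is a minimal, hypothetical sketch of such a driver in PySpark: it caches a table once, then loops forever picking up new query files from a drop-off directory. The paths, table name, and poll interval are all assumptions for illustration, not anything HDP provides.

```python
import os
import time

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="shared-context-driver")
sqlContext = SQLContext(sc)

# Cache a table once; every query polled below reuses the same copy.
df = sqlContext.read.json("/data/events.json")   # hypothetical dataset
df.cache()
df.registerTempTable("events")

query_dir = "/tmp/incoming_queries"              # hypothetical drop-off directory
seen = set()

# The "external means of communicating with the shared driver": a while
# loop that sleeps, then runs any new *.sql files it finds.
while True:
    for name in os.listdir(query_dir):
        path = os.path.join(query_dir, name)
        if path in seen or not name.endswith(".sql"):
            continue
        with open(path) as f:
            query = f.read()
        for row in sqlContext.sql(query).collect():
            print(row)
        seen.add(path)
    time.sleep(5)
```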
Created 09-22-2016 07:33 PM
*Edit* Realistically, questions about shared SparkContexts are usually about one of two things:
1. Making shared use of cached DataFrames/Datasets. Livy and the Spark Thrift JDBC/ODBC server are decent initial solutions. Keep an eye on the Spark-LLAP integration, which will be better all around (security, efficiency, etc.).
2. Spark applications consuming all of a cluster's resources. Spark's ability to spin executor instances up and down dynamically based on utilization is probably a better solution to this problem than sharing a single SparkContext; a configuration sketch follows below.
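For the second case, here is a hedged sketch of enabling dynamic allocation from PySpark. The app name, executor counts, and timeout are illustrative; dynamic allocation also requires the external shuffle service to be running on the YARN NodeManagers.

```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("elastic-app")
        # Let Spark grow and shrink the executor pool with demand.
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true")   # required on YARN
        .set("spark.dynamicAllocation.minExecutors", "1")
        .set("spark.dynamicAllocation.maxExecutors", "20")
        # Release executors that sit idle for a minute.
        .set("spark.dynamicAllocation.executorIdleTimeout", "60s"))

sc = SparkContext(conf=conf)
```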
Created 10-31-2016 06:02 AM
Does the Spark Thrift Server in HDP support sharing RDDs today?
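For reference, sharing through the Thrift Server works because all JDBC/ODBC sessions talk to its single SparkContext, so a table cached by one session is visible to the others. A hedged sketch using the pyhive client follows; the host and port are assumptions (HDP's Spark Thrift Server commonly listens on 10015).

```python
from pyhive import hive

# The Spark Thrift Server speaks the HiveServer2 protocol.
conn = hive.Connection(host="thrift-host", port=10015)   # assumed endpoint
cur = conn.cursor()

# CACHE TABLE pins the data inside the Thrift Server's single SparkContext;
# every other session connected to this server now reads the cached copy.
cur.execute("CACHE TABLE events")
cur.execute("SELECT COUNT(*) FROM events")
print(cur.fetchall())
```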
Created 09-23-2016 01:27 AM
@Randy Gelhausen Is Spark RDD and context sharing supported in 2.5 via the Livy server?
Created 09-30-2016 07:23 PM
Yes.
Created 09-23-2016 05:05 PM
Not without Livy; yes with Livy (per @vshukla). However, it is exposed only to Zeppelin for now.
Code examples: https://github.com/romainr/hadoop-tutorials-examples/tree/master/notebook/shared_rdd
