Support Questions

Find answers, ask questions, and share your expertise

How does context sharing work in Spark and Zeppelin?

Master Mentor

How does context pass from paragraph to paragraph? Think of a Hive context shared with Spark, then Phoenix, etc. Also, is context sharing enabled for multi-user?

1 ACCEPTED SOLUTION


@Artem Ervits you do not need to think of context passing between Spark, Phoenix, and Hive. You would load data as a DataFrame/Dataset into a local variable from your data source, and you would do this for every data source. Example:

val mysqlTableDF = hiveContext.read.format("jdbc")....load() // load a MySQL table
val csvDF = hiveContext.read.format("com.databricks.spark.csv")....load() // load a CSV file

and then you would work with those DataFrames and do joins, filters, etc. For example:

val joinedDF = mysqlTableDF.join(csvDF, "key") // join the two DataFrames on a shared key column
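
To make that concrete, here is a minimal end-to-end sketch. The JDBC URL, credentials, table name, CSV path, and column names below are placeholders for illustration, not details from the original post:

// Hypothetical example: load one table over JDBC and one CSV file,
// then join them. All connection details are placeholders.
val mysqlTableDF = hiveContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/sales") // placeholder URL
  .option("dbtable", "fact_sales")                 // placeholder table
  .option("user", "etl")
  .option("password", "secret")
  .load()

val csvDF = hiveContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // first line contains column names
  .option("inferSchema", "true") // let spark-csv guess column types
  .load("/data/products.csv")    // placeholder path

// Everything after loading is plain DataFrame work: joins, filters, etc.
val joinedDF = mysqlTableDF.join(csvDF, "key")
joinedDF.filter(joinedDF("amount") > 100).show()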

For context sharing, Sunile is right; Vadim created an article on HCC that gives more details. The short version, if you want to share context:

  • Log into Ambari as admin
  • Click on the Spark service in the left-hand pane
  • Click on Configs
  • Click on "Custom spark-defaults"
  • Add a custom property: key = spark.sql.hive.thriftServer.singleSession, value = true

Note this is only required in Spark 1.6; in 1.5 you had automatic context sharing.
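
If you manage the configuration file by hand instead of through Ambari, the equivalent entry is the single line below in spark-defaults.conf (a sketch; the file's location depends on your install):

spark.sql.hive.thriftServer.singleSession true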


3 REPLIES

Master Guru

@Artem Ervits A nice article by Vadim was created on HCC here. Hope this helps.


New Contributor

@Artem Ervits, context sharing in Spark just got better with the latest tech preview of Zeppelin, which is Livy-integrated: https://hortonworks.com/hadoop-tutorial/apache-zeppelin-hdp-2-4-2/. Livy acts as a job server and also enables multi-user scenarios, allowing users to latch on to an existing session.
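
As a sketch of what that looks like in a notebook (assuming the %livy interpreter from that tech preview; the interpreter name and bindings vary by Zeppelin version), a paragraph such as the following runs through Livy, which creates a session per user or re-attaches to an existing one:

%livy
// This code executes inside a Livy-managed Spark session rather than
// a Spark context embedded in the Zeppelin JVM.
val df = sqlContext.sql("SELECT 1 AS probe")
df.show()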