Support Questions

Find answers, ask questions, and share your expertise

How does context sharing work in Spark and Zeppelin?

Master Mentor

How does context pass from paragraph to paragraph? Think of a Hive context shared with Spark, then Phoenix, etc. Also, is context sharing enabled for multi-user?

1 ACCEPTED SOLUTION


@Artem Ervits you do not need to think of context passing between Spark, Phoenix, and Hive. You would load data as a DataFrame/Dataset into a local variable from your data source, and you would do this for every data source. Example:

val mysqlTableDF = hiveContext.read.format("jdbc")....load() // load a MySQL table
val csvDF = hiveContext.read.format("com.databricks.spark.csv")....load() // load a CSV file

and then you would work with those DataFrames and do joins, filters, etc. For example:

val joinedDF = mysqlTableDF.join(csvDF, "key") // join the two DataFrames on a shared key column
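
To make that concrete, here is a minimal end-to-end sketch. The JDBC URL, credentials, table name, CSV path, and column names below are placeholders for illustration, not details from the original post:

// Hypothetical example: load one table over JDBC and one CSV file,
// then join them. All connection details are placeholders.
val mysqlTableDF = hiveContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/sales") // placeholder URL
  .option("dbtable", "fact_sales")                 // placeholder table
  .option("user", "etl")
  .option("password", "secret")
  .load()

val csvDF = hiveContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // first line contains column names
  .option("inferSchema", "true") // let spark-csv guess column types
  .load("/data/products.csv")    // placeholder path

// Everything after loading is plain DataFrame work: joins, filters, etc.
val joinedDF = mysqlTableDF.join(csvDF, "key")
joinedDF.filter(joinedDF("amount") > 100).show()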

For context sharing, Sunile is right; Vadim created an article on HCC that gives more details. The short version, if you want to share context:

  • Log into Ambari as admin
  • Click on the Spark service in the left-hand pane
  • Click on Configs
  • Click on "Custom spark-defaults"
  • Add a custom property: key = spark.sql.hive.thriftServer.singleSession, value = true

Note this is only required in Spark 1.6; in 1.5 you had automatic context sharing.
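
If you manage the configuration file by hand instead of through Ambari, the equivalent entry is the single line below in spark-defaults.conf (a sketch; the file's location depends on your install):

spark.sql.hive.thriftServer.singleSession true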


3 REPLIES

Master Guru

@Artem Ervits A nice article by Vadim was created on HCC here. Hope this helps.


New Contributor

@Artem Ervits, context sharing in Spark just got better with the latest tech preview of Zeppelin, which is Livy-integrated: https://hortonworks.com/hadoop-tutorial/apache-zeppelin-hdp-2-4-2/. Livy acts as a job server and also enables multi-user scenarios, allowing users to latch on to an existing session.
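
As a sketch of what that looks like in a notebook (assuming the %livy interpreter from that tech preview; the interpreter name and bindings vary by Zeppelin version), a paragraph such as the following runs through Livy, which creates a session per user or re-attaches to an existing one:

%livy
// This code executes inside a Livy-managed Spark session rather than
// a Spark context embedded in the Zeppelin JVM.
val df = sqlContext.sql("SELECT 1 AS probe")
df.show()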