Created 05-02-2016 07:52 PM
How does context pass from paragraph to paragraph? Think Hive context shared with Spark, then Phoenix, etc. Also, is context sharing enabled for multi-user?
Created 05-02-2016 08:11 PM
@Artem Ervits you do not think of it as context passing between Spark, Phoenix, and Hive. You load the data from each datasource into a local variable as a DataFrame/Dataset, and you do this for every datasource. Example:
val mysqlTableDF = hiveContext.read.format("jdbc")....load()  // load a MySQL table
val csvDF = hiveContext.read.format("com.databricks.spark.csv")....load()  // load a CSV file
and then you would work with those DataFrames and do joins, filters, etc. For example:
val joined_df = hiveTablesDF.join(factsales,"key")
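For a slightly fuller picture, here is a minimal sketch of that pattern on a Spark 1.x HiveContext. The JDBC URL, table names, CSV path, and the "key" and "amount" columns are placeholders, not anything from your cluster:

// Minimal sketch (Spark 1.x HiveContext); URLs, table names, and paths are placeholders.
val mysqlTableDF = hiveContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://mysql-host:3306/sales")  // placeholder JDBC URL
  .option("dbtable", "fact_sales")                       // placeholder table name
  .option("user", "etl_user")
  .option("password", "etl_password")
  .load()

val csvDF = hiveContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")        // first line is the header
  .option("inferSchema", "true")   // infer column types from the data
  .load("/data/dim_product.csv")   // placeholder HDFS path

// Join on a shared column, then filter/aggregate as needed.
val joined_df = mysqlTableDF.join(csvDF, "key")
joined_df.filter(joined_df("amount") > 100).show()  // "amount" is a placeholder column

The point is that each source just becomes a DataFrame in the same SQLContext/HiveContext, so joining Hive, MySQL, and CSV data is no different from joining two Hive tables.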
For context sharing, Sunile is right; Vadim created an article on HCC that gives more details. But here is the short version if you want to share context:
Log into Ambari as admin
Click on the Spark service in the left-hand pane
Note this is only required in Spark 1.6; in 1.5 you had automatic context sharing.
Created 05-02-2016 07:55 PM
@Artem Ervits A nice article by Vadim was created on HCC here. Hope this helps.
Created 06-02-2016 01:37 PM
@Artem Ervits, context sharing in Spark just got better with the latest tech preview of Zeppelin, which is Livy-integrated - https://hortonworks.com/hadoop-tutorial/apache-zeppelin-hdp-2-4-2/. Livy acts as a job server and also enables multi-user scenarios, allowing users to attach to an existing session.
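To make the session-sharing idea concrete, here is a minimal sketch against Livy's REST API (POST /sessions to create an interactive session, POST /sessions/{id}/statements to run code in it). The host name, port, and the session id 0 are placeholders; this is just an illustration of the mechanism, not the Zeppelin integration itself:

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

object LivySessionSketch {
  // Placeholder endpoint; 8998 is Livy's default port.
  val livyUrl = "http://livy-host:8998"

  def post(path: String, json: String): String = {
    val conn = new URL(livyUrl + path).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(json.getBytes(StandardCharsets.UTF_8))
    val body = scala.io.Source.fromInputStream(conn.getInputStream).mkString
    conn.disconnect()
    body
  }

  def main(args: Array[String]): Unit = {
    // Create an interactive Spark session managed by Livy.
    val session = post("/sessions", """{"kind": "spark"}""")
    println(session)  // the response JSON contains the new session id

    // Another user (or another notebook paragraph) can submit statements to the
    // same session id and reuse its SparkContext; 0 below is a placeholder id
    // taken from the response above.
    val result = post("/sessions/0/statements", """{"code": "sc.parallelize(1 to 10).sum()"}""")
    println(result)
  }
}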