03-02-2018 01:03 AM
I notice that the CCA175 exam will use Spark version 1.6. One of the main topics of the exam is data analysis using Spark SQL. I notice that the functionality for loading a DataFrame into a form that can be queried with SQL only exists in Spark versions later than 1.6 (e.g. registerTempTable or createOrReplaceTempView).
Any thoughts on this? I am surprised that such an outdated version of Spark is used for the exam.
Best to all!
03-03-2018 11:22 AM
You may have an incorrect understanding when you say:
"I notice that the functionality for loading a DataFrame into a form that can be queried with SQL only exists in Spark versions later than 1.6"
Can you expand on it with an example?
03-05-2018 12:06 AM
I was referring to the following, which is not yet available in Spark 1.6:
1) Create a DF
2) Create a table to write direct SQL queries on: df.createGlobalTempView("people")
3) Query this table: spark.sql("SELECT * FROM global_temp.people")
But I think what is required for the section "data analysis: use spark sql to interact with the metastore programmatically in your application" is to create a SQLContext/HiveContext and then query tables that are already stored in the Hive metastore. Any idea if this is correct?
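To make the question concrete, here is a minimal PySpark sketch of what I have in mind. It assumes a Hive-enabled Spark 1.6 build, where the `sqlContext` predefined in the pyspark shell is normally a HiveContext (otherwise one can be built with `HiveContext(sc)` from `pyspark.sql`). The function name `preview_metastore_table` and the table name "orders" are hypothetical, only for illustration.

```python
def preview_metastore_table(hive_context, table_name, limit=5):
    """Return the first rows of a table that already exists in the metastore.

    hive_context is assumed to be a HiveContext (or the shell's sqlContext).
    """
    # tableNames() lists the tables visible in the current database.
    if table_name not in hive_context.tableNames():
        raise ValueError("table not found in metastore: %s" % table_name)
    # The name was checked against the catalog above, so it is safe to format in.
    query = "SELECT * FROM {0} LIMIT {1}".format(table_name, int(limit))
    return hive_context.sql(query).collect()
```

In the pyspark shell this would be called as `preview_metastore_table(sqlContext, "orders")`, assuming a table with that name exists.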
03-05-2018 01:59 AM
It would be worth going through the Spark 1.6 documentation in full to understand what is available and what is not. For your questions, see below.
1. sqlContext.createDataFrame(rdd, schema) or rdd.toDF(schema)
2. df.registerTempTable(table_name)
3. sqlContext.sql(your_query)
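Put together in PySpark, the three steps might look like the sketch below. It assumes the `sqlContext` that the Spark 1.6 pyspark shell predefines; the function name `adult_names`, the sample rows and the table name "people" are made up for illustration.

```python
def adult_names(sqlContext):
    """Create a DataFrame, register it as a temp table, and query it with SQL."""
    # Step 1: build a DataFrame from local data plus a list of column names.
    rows = [("Ann", 31), ("Bob", 17), ("Cid", 45)]
    df = sqlContext.createDataFrame(rows, ["name", "age"])

    # Step 2: expose the DataFrame to SQL under a table name (Spark 1.x API).
    df.registerTempTable("people")

    # Step 3: run an ordinary SQL query against the temp table.
    result = sqlContext.sql(
        "SELECT name FROM people WHERE age >= 18 ORDER BY name"
    )
    # Each collected Row is tuple-like, so the name is the first field.
    return [row[0] for row in result.collect()]
```

In the shell, call it as `adult_names(sqlContext)`. On Spark 2.x and later, registerTempTable is deprecated in favour of createOrReplaceTempView.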