
CCA 175: Spark version 1.6 on the exam

Contributor

Dear community,

I noticed that the CCA175 exam will use Spark version 1.6. One of the main topics of the exam is data analysis using Spark SQL. As far as I can tell, the functionality to register a DataFrame so that it can be queried with SQL (e.g. registerTempTable or createOrReplaceTempView) only exists in Spark versions newer than 1.6.


Any thoughts on this? I am surprised that such an outdated version of Spark is used for the exam.

 

Best to all!

 

1 ACCEPTED SOLUTION


 

It would be beneficial to go through the Spark 1.6 documentation fully, in order to understand what is and is not available. For your questions, see below.

 

1. sqlContext.createDataFrame(your_rdd) or rdd.toDF(schema)

2. df.registerTempTable(table_name)

3. sqlContext.sql(your_query) or sqlContext.read.table(your_hive_table)
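The three steps above can be sketched end to end in PySpark. This is a minimal illustration of the Spark 1.6-era API only, assuming a PySpark 1.6.x installation; the data, table name, and column names are made up for the example, and registerTempTable was later replaced by createOrReplaceTempView in Spark 2.0 (and removed in 3.0), so this will not run on current Spark releases.

```python
# Spark 1.6-style SQL workflow (hypothetical data and names).
# Assumes PySpark 1.6.x; will not run on Spark 3.x, where
# registerTempTable has been removed.
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext("local[*]", "cca175-sketch")
sqlContext = SQLContext(sc)

# 1. Build a DataFrame from an RDD (createDataFrame or rdd.toDF both work).
rdd = sc.parallelize([Row(name="alice", age=34), Row(name="bob", age=29)])
df = sqlContext.createDataFrame(rdd)

# 2. Register it as a temporary table so SQL can reference it by name.
df.registerTempTable("people")

# 3. Query it with plain SQL through the SQLContext.
adults = sqlContext.sql("SELECT name FROM people WHERE age > 30")
print(adults.collect())

sc.stop()
```

Note that no Hive installation is needed for this path; temporary tables live only in the SQLContext for the lifetime of the application.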

 


3 REPLIES


You may have an incorrect understanding when you say

 

I notice that the functionality to register a DataFrame so that it can be queried with SQL only exists in Spark versions newer than 1.6.

 

Can you expand on it with an example?

Contributor

I was referring to the following workflow, which is not available in Spark 1.6:

1) Create a DataFrame.

2) Create a temporary view to run SQL queries against: df.createGlobalTempView("people")

3) Query that view: spark.sql("SELECT * FROM global_temp.people")

 

But I think what is required for the section "Data analysis: use Spark SQL to interact with the metastore programmatically in your application" is to create a SQLContext/HiveContext and then query tables that are already stored in the Hive metastore. Any idea if this is correct?
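For reference, the metastore approach described here can be sketched as follows in Spark 1.6. This assumes a Hive-enabled Spark build whose hive-site.xml points at an existing metastore; the table name `customers` is hypothetical. HiveContext was deprecated in Spark 2.0 in favour of SparkSession, so this is the 1.6-era form only.

```python
# Spark 1.6: query tables already stored in the Hive metastore
# (hypothetical table name; requires a Hive-enabled Spark 1.6 build).
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext("local[*]", "metastore-sketch")
hiveContext = HiveContext(sc)  # reads hive-site.xml to locate the metastore

# Tables registered in the metastore can be queried directly;
# no registerTempTable step is needed for them.
df = hiveContext.sql("SELECT * FROM customers LIMIT 10")
df.show()

sc.stop()
```

The key distinction from the temporary-table path is persistence: metastore tables survive across applications, while temporary tables exist only within the context that registered them.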

 

 
