Created on 03-02-2018 01:03 AM - edited 09-16-2022 05:55 AM
Dear community,
I noticed that the CCA175 exam uses Spark version 1.6, and one of its main topics is data analysis using Spark SQL. As far as I can tell, the functions for registering a DataFrame so that it can be queried with SQL (e.g. registerTempTable or createOrReplaceTempView) only exist in Spark versions newer than 1.6.
Any thoughts on this? I am surprised that such an outdated version of Spark is used for the exam.
Best to all!
Created 03-05-2018 01:59 AM
It would be worth going through the Spark 1.6 documentation in full so you understand what is and is not available. For your questions, see below.
1. sqlContext.createDataFrame(your_data) or rdd.toDF(schema)
2. df.registerTempTable(table_name)
3. sqlContext.sql(your_query)
To read an existing Hive table as a DataFrame: sqlContext.read.table(your_hive_table)
Created 03-03-2018 11:22 AM
You may have a misunderstanding when you say:
I notice that the functionalities to load a dataframe into a format that can be used to perform sql queries, only exist since spark version >1.6
Can you expand on it with an example?
Created 03-05-2018 12:06 AM
I was referring to the following, which is not available in Spark 1.6:
1) Create a DataFrame.
2) Register it as a table so SQL queries can be run directly against it: df.createGlobalTempView("people")
3) Query the table: spark.sql("SELECT * FROM global_temp.people")
But I think what is required for the section "data analysis: use Spark SQL to interact with the metastore programmatically in your application" is to create a SQLContext/HiveContext and then query tables that are already stored in the Hive metastore. Any idea if this is correct?