Spark SQL?

Do we need to know Spark SQL for the CCA Spark and Hadoop certification?

1 ACCEPTED SOLUTION

Super Collaborator

Support for Spark SQL is being added in CDH 5.5. As of today, the exam runs on CDH 5.3.2.

So the answer is "not yet", but that will almost certainly change in the near future.

Watch the Cloudera website: http://www.cloudera.com/training/certification/cca-spark.html

The list of required skills there tells you which technologies you need to know.

3 REPLIES

Explorer

CDH 5.3.0 ships with Spark 1.2.0, which in turn includes Spark SQL. So I would expect every CDH release from 5.3.0 onward to support Spark SQL, unless CDH explicitly ships without it.

See http://spark.apache.org/docs/1.2.1/sql-programming-guide.html

Master Collaborator

That’s not correct. Please see the release notes: http://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_rn_spark_ki.html

SparkSQL just exited alpha and is far from stable. As such, SparkSQL is currently considered a “preview” in CDH. We love it, and we’re dedicating a lot of engineering resources to bringing it up to our standards. But as I’m sure you’re aware, it’s mainly Scala (pyspark lags), it’s very buggy, and it causes all kinds of havoc (especially with Hive). The list goes on.

Once we get it running at scale, we’ll support it fully in our distribution and we’ll test on it. But today, it’s just not ready.