We have recently started a project on a big data platform. We are using CDH 5.4.1 as the environment to run it.
My understanding is that Spark SQL is not supported yet, so I would like to understand why it is not supported. Is there any risk in using it,
considering it is not supported?
We do not want to reinvent the wheel. If we know of specific use cases that can't work, we can look for alternatives.
We intend to use Spark/Hive/Parquet/Avro and HiveContext from Spark SQL. As Spark SQL is not supported, we need to understand the risks around it, as I
believe support is planned from the end of 2015 or early December.
Spark SQL is not supported because it is still in flux: a lot changes from release to release, and it is not stable enough for us to consider it supportable.
We cannot say when it will be supported, as that would depend on the progress being made in the Spark project. I think it has been discussed before on this mailing list, but here it is again: Spark uses an older version of Hive than CDH, which has an impact on the Spark SQL side. See the documentation for more information on that.
Hive on Spark is in a beta release at the moment and will become part of the supported products soon.
As Spark users, we need additional SQL features on top of our data sitting on HDFS and the Spark infrastructure.
What is the recommendation for that? Spark SQL fits well with existing technology stacks by exposing Spark DataFrames as SQL.
When could we see Cloudera supporting Spark SQL as well?
CDH has always included Spark SQL with Spark, and it works as well as Spark SQL can be made to work with the rest of the distribution, but there are no announced plans to support Spark SQL that I know of.
Does that mean the Hive Thrift service running Spark SQL is also supported?
And when can we see Hive on Spark with full features (authorisation etc.)?
No, we do not support the Thrift server, as per the documentation: CDH 5.5 Spark release note
Hive on Spark is also still in beta and we are finishing features. As per the Hive CDH 5.5 release note, it is thus experimental and things might not work. We cannot provide guidance on the roadmap for features that are not yet complete.