SparkSQL just exited alpha and is far from stable. As such, SparkSQL is currently considered a “preview” in CDH. We love it, and we’re dedicating a lot of engineering resources to bringing it up to our standards, but as I’m sure you’re aware, it’s mainly Scala (PySpark lags), it’s very buggy, and it causes all kinds of havoc (esp. with Hive)... the list goes on.
Once we get it running at scale, we’ll test it and support it fully in our distribution. But today, it’s just not ready.