DISTRIBUTE BY and CLUSTER BY Not Supported in Spark SQL 1.6 (CDH 5.7.0)

I am using Spark 1.6 and am trying to optimize my joins by following these blogs, https://docs.cloud.databricks.com/docs/latest/databricks_guide/04%20SQL,%20DataFrames%20&%20Datasets... and https://blog.deepsense.ai/optimize-spark-with-distribute-by-and-cluster-by/, which use DISTRIBUTE BY and CLUSTER BY. Unfortunately, these clauses are not supported in my setup.

My Spark SQL query is:

sqlContext.sql(
      """select b.*, count(*) AS CNT  from tableb b
         GROUP BY b.Key,b.KeyVal
         CLUSTER BY b.Key,b.KeyVal
      """)

The error is:

Exception in thread "main" java.lang.RuntimeException: [5.7] failure: ``union'' expected but identifier CLUSTER found

      CLUSTER BY b.Key
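For context on what the two blogs rely on: in HiveQL, `DISTRIBUTE BY k` sends all rows with the same value of `k` to the same partition, and `CLUSTER BY k` is shorthand for `DISTRIBUTE BY k SORT BY k`, meaning it additionally sorts the rows inside each partition (not globally). The plain-Scala sketch below models that behaviour on an in-memory collection without Spark. The `Record` type, the partition count, and the use of Scala's `hashCode` as the partitioning hash are illustrative assumptions, not Spark's actual implementation.

```scala
// Plain-Scala model of DISTRIBUTE BY / CLUSTER BY semantics (no Spark needed).
object ClusterBySketch {
  // Hypothetical stand-in for a row of tableb; the real schema is not shown.
  final case class Record(key: String, keyVal: String, value: Long)

  // DISTRIBUTE BY key, keyVal: every row with the same (key, keyVal) pair
  // lands in the same partition. Assumption: Scala's hashCode replaces
  // whatever hash function Spark uses internally.
  def distributeBy(rows: Seq[Record], numPartitions: Int): Vector[Vector[Record]] = {
    require(numPartitions > 0, "numPartitions must be positive")
    val buckets = rows.groupBy { r =>
      val h = (r.key, r.keyVal).hashCode % numPartitions
      if (h < 0) h + numPartitions else h // keep the partition index non-negative
    }
    Vector.tabulate(numPartitions)(i => buckets.getOrElse(i, Seq.empty).toVector)
  }

  // CLUSTER BY key, keyVal = DISTRIBUTE BY key, keyVal + SORT BY key, keyVal:
  // same partitioning, then each partition is sorted independently
  // (there is no global ordering across partitions).
  def clusterBy(rows: Seq[Record], numPartitions: Int): Vector[Vector[Record]] =
    distributeBy(rows, numPartitions).map(_.sortBy(r => (r.key, r.keyVal)))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data.
    val rows = Seq(
      Record("b", "y", 1L), Record("a", "x", 2L), Record("c", "z", 3L),
      Record("a", "x", 4L), Record("b", "y", 5L)
    )
    clusterBy(rows, numPartitions = 2).zipWithIndex.foreach { case (part, i) =>
      println(s"partition $i: " + part.map(r => s"${r.key}/${r.keyVal}").mkString(", "))
    }
  }
}
```

Both `("a", "x")` rows end up in the same partition, and each partition prints in key order. Spark 1.6's DataFrame API exposes comparable operations, `repartition(cols: Column*)` followed by `sortWithinPartitions(...)`, which do not go through the SQL parser and may serve as a workaround; this has not been verified on CDH 5.7.0.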