
DISTRIBUTE BY and CLUSTER BY not supported in Spark SQL 1.6 (CDH 5.7.0)

I am using Spark 1.6 and trying to optimize my joins with DISTRIBUTE BY and CLUSTER BY, following these blogs: https://docs.cloud.databricks.com/docs/latest/databricks_guide/04%20SQL,%20DataFrames%20&%20Datasets... and https://blog.deepsense.ai/optimize-spark-with-distribute-by-and-cluster-by/. Unfortunately, the query fails to parse, so these clauses do not appear to be supported.

My Spark SQL query is:

 

sqlContext.sql(
      """select b.*, count(*) AS CNT  from tableb b
         GROUP BY b.Key,b.KeyVal
         CLUSTER BY b.Key,b.KeyVal
      """)

The error is:

Exception in thread "main" java.lang.RuntimeException: [5.7] failure: ``union'' expected but identifier CLUSTER found

      CLUSTER BY b.Key 
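
For reference, here is a minimal sketch of what the query is trying to achieve, written with the DataFrame API instead of SQL. It assumes `sqlContext` is in scope and `tableb` is a registered table with `Key` and `KeyVal` columns, and it treats CLUSTER BY as DISTRIBUTE BY plus SORT BY on the same columns. As far as I understand, the plain `SQLContext` parser in 1.6 does not recognise these clauses while `HiveContext` does, so running the same SQL through a `HiveContext` may be another option; I have not verified that on CDH 5.7.0.

```scala
import org.apache.spark.sql.functions.col

// Assumption: `sqlContext` exists and `tableb` is a registered table.
// Count rows per (Key, KeyVal); GroupedData.count() names the column "count".
val counts = sqlContext.table("tableb")
  .groupBy("Key", "KeyVal")
  .count()
  .withColumnRenamed("count", "CNT")

// DataFrame counterpart of CLUSTER BY Key, KeyVal:
// repartition by the keys (DISTRIBUTE BY), then sort inside each partition (SORT BY).
val clustered = counts
  .repartition(col("Key"), col("KeyVal"))
  .sortWithinPartitions(col("Key"), col("KeyVal"))
```

Both `repartition` with column expressions and `sortWithinPartitions` were added to the DataFrame API in Spark 1.6, so this should not need a newer version.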