distinct/group by fails in spark app for sparksql catalog

In HDP 3.1, when I run this method in a Spark app via spark-submit:

       public static Dataset<Row> sql(SparkSession spark, String sql) {
              return spark.sql("select distinct col1, col2 from tab group by col1, col2");
       }

I get the following exception:

Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: 1:72 SELECT DISTINCT and GROUP BY can not be in the same query. Error encountered near token 'col2'
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:172)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:186)     
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:186)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:1215
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:1225
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:36
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:28    
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:66)       
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1869)       
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1816)       
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1811)       
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)       
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197)       
... 36 more

However, the same query in spark-shell

scala> sql("select distinct col1, col2 from tab group by col1, col2").show()

works just fine. This is for tables created with createOrReplaceTempView().

Is there a configuration option that enables or suppresses the distinct/group-by error?
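
A possible workaround, independent of any configuration: the frames in the stack trace (Hive's SemanticAnalyzer and SQLOperation) suggest the query is being compiled by Hive rather than by Spark's own parser in the spark-submit case, though that is only an inference from the trace. Since GROUP BY col1, col2 already yields one row per (col1, col2) pair, the DISTINCT is redundant here, and "select col1, col2 from tab group by col1, col2" (or "select distinct col1, col2 from tab") should return the same rows. The sketch below illustrates that equivalence with plain Java collections and no Spark dependency; the class name, the method names, and the sample rows are hypothetical and only stand in for the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only: each row is a (col1, col2) pair held in a List<String>.
class DistinctGroupByDemo {

    // Models: select col1, col2 from tab group by col1, col2
    // Grouping on the whole row leaves exactly one key per (col1, col2) pair.
    static List<List<String>> groupByOnly(List<List<String>> tab) {
        Map<List<String>, Long> groups = tab.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));
        return new ArrayList<>(groups.keySet());
    }

    // Models: select distinct col1, col2 from tab group by col1, col2
    // The extra distinct() has nothing left to remove after the grouping.
    static List<List<String>> distinctAndGroupBy(List<List<String>> tab) {
        return groupByOnly(tab).stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample rows, made up for this sketch.
        List<List<String>> tab = Arrays.asList(
                Arrays.asList("a", "x"),
                Arrays.asList("a", "x"),
                Arrays.asList("b", "y"));
        System.out.println(groupByOnly(tab));
        System.out.println(distinctAndGroupBy(tab));
        System.out.println(groupByOnly(tab).equals(distinctAndGroupBy(tab)));
    }
}
```

If the two forms are interchangeable for the real data, dropping either the DISTINCT or the GROUP BY from the string passed to spark.sql() avoids the construct that the SemanticException complains about.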