Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Issue with WITH clause with Cloudera JDBC Driver for Impala - Returning column name instead of Data

Highlighted

Issue with WITH clause with Cloudera JDBC Driver for Impala - Returning column name instead of Data

New Contributor

I am using Cloudera JDBC Driver for Impala v 2.5.38 with Spark 1.6.0 to create DataFrame. It is working fine for all queries except WITH clause, but WITH is extensively used in my organization. Below is my code snippet.

def jdbcHDFS(url:String,sql: String):DataFrame = {
  var rddDF: DataFrame = null
  val jdbcURL = s"jdbc:impala://$url"
  val connectionProperties = new java.util.Properties
  connectionProperties.setProperty("driver","com.cloudera.impala.jdbc41.Driver")
  rddDF = sqlContext.read.jdbc(jdbcURL, s"($sql) AS ST", connectionProperties)
  rddDF
}

Given below example for working and non-working SQL

val workingSQL = "select empname from (select * from employee) as tmp"
val nonWorkingSQL = "WITH tmp as (select * from employee) select empname from tmp"

Below is the output of rddDF.first for above SQLs.

For workingSQL

scala> rddDF.first
res8: org.apache.spark.sql.Row = [Kushal]

For nonWorkingSQL

scala> rddDF.first
res8: org.apache.spark.sql.Row = [empname] //Here we are expecting actual data ie. 'Kushal' instead of column name like the output of previous query.

Please note: Both the queries are working fine in IMPALA-SHELL as well as in HIVE through HUE.

I have used UseNativeQuery=1 with no luck

 

It would be really helpful if anyone can suggest any solution for it.

4 REPLIES 4

Re: Issue with WITH clause with Cloudera JDBC Driver for Impala - Returning column name instead of D

Cloudera Employee

I don't think there's an issue with the JDBC driver or Impala here. I wasn't able to reproduce the problem. It may be an issue with the Spark code.

 

This code produced the correct result:

 

package com.cloudera.impala.sample;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ClouderaImpalaJdbcExample {

  private static final String IMPALAD_HOST = "localhost";
  private static final String IMPALAD_JDBC_PORT = "21050";
  private static final String CONNECTION_URL = "jdbc:impala://" + IMPALAD_HOST + ':' + IMPALAD_JDBC_PORT + "/;auth=noSasl";
  private static final String JDBC_DRIVER_NAME = "com.cloudera.impala.jdbc41.Driver";

  public static void main(String[] args) {
    System.out.println("Using Connection URL: " + CONNECTION_URL);
    Connection con = null;
    try {
      Class.forName(JDBC_DRIVER_NAME);
      con = DriverManager.getConnection(CONNECTION_URL);
      final Statement stmt = con.createStatement();
      stmt.execute("with tmp as (select * from functional.alltypestiny) select string_col from tmp;");
      ResultSet rs = stmt.getResultSet();
      while (rs.next()) System.out.println(rs.getString(1));
      stmt.close();
    } catch (SQLException e) {
      e.printStackTrace();
    } catch (Exception e) {
      e.printStackTrace();
    } finally {
      try {
        con.close();
      } catch (Exception e) {
        // swallow
      }
    }
  }
}

Re: Issue with WITH clause with Cloudera JDBC Driver for Impala - Returning column name instead of D

New Contributor

Hi Matt,
Thanks for your response, I have post my spark code in the post itself and now i am confused if there is some issue with spark code then how other queries would work as expected.
I request you to try to re-produce it using spark and meanwhile i will try your code in my environment and post the result.
Looking forward to hearing from you soon.

Re: Issue with WITH clause with Cloudera JDBC Driver for Impala - Returning column name instead of D

New Contributor

Hi Matt,

I have tried to setup plain JDBC connection and execute the nonWorkingSQL in my environment and it woked!

Then i thought the issue is due to Spark wraps a "SELECT * FROM ( )" around the query, hence i tried the below SQL to find the root cause but still it worked and displayed expected result.

SELECT * FROM (WITH tmp as (select * from employee) select empname from tmp) AS ST

Is the analysis helpful which let you suggest anything?

Re: Issue with WITH clause with Cloudera JDBC Driver for Impala - Returning column name instead of D

New Contributor
I am having the same problem. When I am runnin spark jdbc with analytical function such as lead, lag in impala. It's returning column name as data. Could you please let us know how to resolve this one
Don't have an account?
Coming from Hortonworks? Activate your account here