
How to set the execution engine to spark when accessing Cloudera Hive via JDBC

I cannot set the execution engine for Hive in a script executed via JDBC. When the same script is executed via the Hue web front end, it sets the execution engine as expected, but not when run via JDBC.

 

I am trying to run Hive scripts against Hive with the Spark engine from a Java application:

 

Example of the script

 

set hive.execution.engine=spark;

SELECT * from ......

I have tried executing an actual script from the classpath, and I have also tried sending a string representing the SQL script via JDBC as noted above.
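
For reference, here is a minimal plain-JDBC sketch of what I am effectively trying to do, issuing the SET as its own statement before the SELECT (untested; the host, credentials and table name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSparkExample {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 host/port/database - adjust for the actual cluster.
        String url = "jdbc:hive2://hiveserver2-host:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {

            // Issue the SET as a separate statement rather than bundling it
            // into the same string as the query.
            stmt.execute("set hive.execution.engine=spark");

            try (ResultSet rs = stmt.executeQuery("SELECT * FROM my_table LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}

The intention is that the engine setting becomes part of the session before the query itself runs.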

I have also tried to include the following in the datasource connectionProperties, using a factory class that creates the HiveTemplate:

 

public static HiveTemplate createHiveTemplate(HiveExecutionEngine engine) {

    Properties props=new Properties();

    switch (engine) {
        case MAP_REDUCE:
            props.setProperty("hive.execution.engine", "mr");
            props.setProperty("mapreduce.map.memory.mb", "16000");
            props.setProperty("mapreduce.map.java.opts", "Xmx7200m");
            props.setProperty("mapreduce.reduce.memory.mb", "16000");
            props.setProperty("mapreduce.reduce.java.opts", "Xmx7200m");
            break;
        case SPARK:
            props.setProperty("hive.execution.engine", "spark");
            break;
        default:
            throw new NotImplementedException();
    }

    datasource.setConnectionProperties(props);
    return new HiveTemplate(() -> {
        return new HiveClient(datasource);
    });
}

The script is then submitted through the HiveTemplate:

List<String> result = hiveTemplate.query(script);
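
As a variant of the connectionProperties approach, my understanding of the HiveServer2 JDBC URL format is that Hive configuration can also be appended after a ? in the URL itself. A sketch, assuming that format applies here (host, port and table name are placeholders, and I have not verified this against our cluster):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcUrlConfExample {
    public static void main(String[] args) throws Exception {
        // Hive configuration appended after '?' should apply to the session.
        String url = "jdbc:hive2://hiveserver2-host:10000/default"
                + "?hive.execution.engine=spark";

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM my_table LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}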


The following link shows the documentation for setting the execution engine: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

 

set hive.execution.engine=spark;

I would expect the script to be executed via the Spark engine on YARN, but it is running with MapReduce instead. I can confirm that the wrong engine is being applied by looking at the error message and by viewing the job history in Cloudera Manager.
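
In case it is useful, this is roughly how I would read the setting back from the session to see which engine is actually in effect (untested sketch, assuming SET without a value returns the current setting as a row):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveEngineCheck {
    // Untested helper: read back the engine the session is actually using.
    public static String currentEngine(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("set hive.execution.engine")) {
            // Expected to return a single row such as "hive.execution.engine=spark".
            return rs.next() ? rs.getString(1) : null;
        }
    }
}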

Has anybody successfully managed to execute a HiveQL script via JDBC using the Spark engine?