New Contributor
Posts: 1
Registered: ‎05-09-2019

How to set the execution engine to spark when accessing Cloudera Hive via JDBC

I cannot set the execution engine for Hive in a script executed via JDBC. When the same script is executed via the Hue web front end, the execution engine is set as expected, but not when the script is run via JDBC.

I am trying to run Hive scripts against Hive with the Spark engine from a Java application:

Example of the script:

set hive.execution.engine=spark;

SELECT * from ......

I have tried executing an actual script file from the classpath, and I have also tried sending a string containing the SQL script via JDBC, as shown above.
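For reference, here is a minimal sketch of running the script one statement at a time over plain JDBC, so that the set command and the query share one session. It assumes the driver executes a single statement per call, which I have not verified for every driver version. HiveScriptRunner, splitStatements and runAll are hypothetical names for illustration, not part of Hive or of my HiveTemplate.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class HiveScriptRunner {

    // Naive split on ';'. This does not handle semicolons inside
    // string literals or comments, so it only suits simple scripts.
    static List<String> splitStatements(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    // Executes every statement on the same connection, in order, so a
    // leading "set ..." is issued in the same session as the query.
    static void runAll(Connection connection, String script) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            for (String sql : splitStatements(script)) {
                statement.execute(sql);
            }
        }
    }
}
```

With the example script above, splitStatements would yield the set command and the SELECT as two separate statements.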

I have also tried including the following in the data source's connectionProperties, using a factory method that creates the HiveTemplate:

public static HiveTemplate createHiveTemplate(HiveExecutionEngine engine) {

    // Settings passed to the data source as JDBC connection properties
    Properties props = new Properties();

    switch (engine) {
        case MAP_REDUCE:
            props.setProperty("hive.execution.engine", "mr");
            props.setProperty("mapreduce.map.memory.mb", "16000");
            // JVM options need the leading dash: -Xmx, not Xmx
            props.setProperty("mapreduce.map.java.opts", "-Xmx7200m");
            props.setProperty("mapreduce.reduce.memory.mb", "16000");
            props.setProperty("mapreduce.reduce.java.opts", "-Xmx7200m");
            break;
        case SPARK:
            props.setProperty("hive.execution.engine", "spark");
            break;
        default:
            throw new NotImplementedException();
    }

    datasource.setConnectionProperties(props);
    return new HiveTemplate(() -> new HiveClient(datasource));
}
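One variation I am considering is putting the setting into the JDBC URL itself instead of the connection properties. As far as I understand the HiveServer2 URL format, Hive configuration entries can follow a '?' as a semicolon-separated key=value list. The sketch below only builds such a URL. HiveJdbcUrls and withHiveConf are hypothetical names, the host is a placeholder, and the code assumes the base URL has no '?' or '#' part yet and that no value needs escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only.
class HiveJdbcUrls {

    // Appends Hive configuration entries to a base URL in the form
    // jdbc:hive2://host:port/db?key1=value1;key2=value2
    static String withHiveConf(String baseUrl, Map<String, String> hiveConf) {
        if (hiveConf.isEmpty()) {
            return baseUrl;
        }
        StringJoiner confList = new StringJoiner(";");
        for (Map.Entry<String, String> entry : hiveConf.entrySet()) {
            confList.add(entry.getKey() + "=" + entry.getValue());
        }
        return baseUrl + "?" + confList;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the entries in insertion order
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hive.execution.engine", "spark");
        // Placeholder host and port, not my real cluster
        System.out.println(withHiveConf("jdbc:hive2://hive-host.example.com:10000/default", conf));
    }
}
```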

The script is then run through the template:

List<String> result = hiveTemplate.query(script);


The following link shows the documentation for setting the execution engine: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

set hive.execution.engine=spark;

I would expect the script to be executed via the Spark engine on YARN rather than with MapReduce, which is what actually happens. I can confirm that the wrong engine is being used by looking at the error message and by viewing the job history in Cloudera Manager.
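A more direct check than the job history might be to ask the session itself which engine it thinks is active. The sketch below assumes that issuing the set command with only a key returns a single row of the form key=value, as it appears to in Beeline. EngineCheck, valueOf and currentEngine are hypothetical names for illustration.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for illustration only.
class EngineCheck {

    // Extracts the value from a "key=value" row, or null if there is no '='
    static String valueOf(String row) {
        int eq = row.indexOf('=');
        return eq < 0 ? null : row.substring(eq + 1).trim();
    }

    // Asks the current session for hive.execution.engine.
    // Returns null if the driver produces no result set or no row.
    static String currentEngine(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            if (!statement.execute("set hive.execution.engine")) {
                return null;
            }
            try (ResultSet rows = statement.getResultSet()) {
                return rows.next() ? valueOf(rows.getString(1)) : null;
            }
        }
    }
}
```

Calling currentEngine on the same connection that runs the script, right before the SELECT, would show whether the set command or the connection property reached the session at all.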

Has anybody successfully managed to execute a HiveQL script via JDBC using the Spark engine?