Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

How to connect and run Hive query from Apache Spark in JAVA


I am running a Spark application in Spring. Now I want to connect to Hive and run a Hive query from within Spring Tool Suite itself.

How can I do this?

I learned that HiveContext can be used, but I have no idea how to use it.

1 ACCEPTED SOLUTION

A simple Spark 1.x Java application that lists the tables in the Hive metastore looks like this:

import org.apache.spark.SparkContext;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SparkHiveExample {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("SparkHive Example");
    SparkContext sc = new SparkContext(conf);

    // HiveContext picks up hive-site.xml from the classpath to locate the metastore.
    HiveContext hiveContext = new HiveContext(sc);

    // Any HiveQL statement can be passed to sql(); the result comes back as a DataFrame.
    DataFrame df = hiveContext.sql("SHOW TABLES");
    df.show();

    sc.stop();
  }
}
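
Once the HiveContext exists, any other HiveQL statement can be submitted the same way. Continuing from the example above, here is a hedged sketch that queries a hypothetical employees table (the table name and columns are assumptions, for illustration only):

// "employees", "name", and "salary" are hypothetical, for illustration only.
DataFrame result = hiveContext.sql(
    "SELECT name, salary FROM employees WHERE salary > 50000");
result.show();  // prints the first 20 matching rows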

Note that Spark pulls metadata from the Hive metastore and uses HiveQL to parse the queries, but the queries themselves are executed by Spark's own execution engine, not by Hive.
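
For reference, on Spark 2.x and later HiveContext is deprecated; the equivalent is a SparkSession with Hive support enabled. A minimal sketch, assuming hive-site.xml is on the classpath:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkHiveExample2 {
  public static void main(String[] args) {
    // enableHiveSupport() wires the session to the Hive metastore.
    SparkSession spark = SparkSession.builder()
        .appName("SparkHive Example (Spark 2+)")
        .enableHiveSupport()
        .getOrCreate();

    // sql() returns a Dataset<Row> instead of the old DataFrame type.
    Dataset<Row> tables = spark.sql("SHOW TABLES");
    tables.show();

    spark.stop();
  }
}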

