Support Questions
Find answers, ask questions, and share your expertise

How to read hive table1 from spark, using dataframe load the hive table1 data into table 2? Java code reference would be great

Highlighted

How to read hive table1 from spark, using dataframe load the hive table1 data into table 2? Java code reference would be great

Hi All,

I have table 1 in hive say emp1, which has columns empid int, name string, dept string, salary double. In spark, using data frame i would like to read the data from hive emp 1 table, and i need to load them into another table called emp2(assume emp2 is empty and has same DDL as that of emp1). It would be great if i get java reference code. No scala or python code needed.

Thanks in advance!

3 REPLIES 3

Re: How to read hive table1 from spark, using dataframe load the hive table1 data into table 2? Java code reference would be great

Super Guru
@Prabhu Muthaiyan Here is how you would do it. first in your spark-env.sh set HADOOP_CONF_DIR to where your hdfs-site.xml, core-site.xml and hive-site.xml exist, such that your program when it runs is able to pick up these files and know how to connect to Hive. Then you basically code similar to below
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession
  .builder()
  .appName("Java Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .enableHiveSupport()
  .getOrCreate();

DataSet<Row> emp1 = spark.sql("SELECT col1, col2, col3 from emp1 where <condition goes here>");

emp1.write().saveAsTable("emp2") ; 
//or use this emp1.write().mode("append").saveAsTable("emp2") ; 

you can have write modes which are following:

SaveMode.Overwrite: overwrite the existing data. - SaveMode.Append: append the data. - SaveMode.Ignore: ignore the operation (i.e. no-op). - SaveMode.ErrorIfExists: default option, throw an exception at runtime

Highlighted

Re: How to read hive table1 from spark, using dataframe load the hive table1 data into table 2? Java code reference would be great

Super Guru
@Prabhu Muthaiyan Here is how you would do it. first in your spark-env.sh set HADOOP_CONF_DIR to where your hdfs-site.xml, core-site.xml and hive-site.xml exist, such that your program when it runs is able to pick up these files and know how to connect to Hive. Then you basically code similar to below
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession
  .builder()
  .appName("Java Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .enableHiveSupport()
  .getOrCreate();

DataSet<Row> emp1 = spark.sql("SELECT col1, col2, col3 from emp1 where <condition goes here>");

emp1.write().saveAsTable("emp2") ; 
//or use this emp1.write().mode("append").saveAsTable("emp2") ; 

you can have write modes which are following:

SaveMode.Overwrite: overwrite the existing data. - SaveMode.Append: append the data. - SaveMode.Ignore: ignore the operation (i.e. no-op). - SaveMode.ErrorIfExists: default option, throw an exception at runtime

Highlighted

Re: How to read hive table1 from spark, using dataframe load the hive table1 data into table 2? Java code reference would be great

@mqureshi Thank you for the prompt response. I am new to this space. Could you please elaborate little bit on setting up spark- env.sh. I understand this is to hand shake with hive, trying to get exact values. My current set up is: HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/2.6.2.0-205/hadoop/conf} How to add hdfs,hive,core site xmls.

In the java code you put above, i don't see the hive connection parameters. Do i need to replace the values on this?

.config("spark.some.config.option","some-value")

Please advise.