How to read Hive table1 from Spark and, using a DataFrame, load the table1 data into table2? A Java code reference would be great


Hi All,

I have table 1 in Hive, say emp1, which has the columns empid int, name string, dept string, salary double. In Spark, using a DataFrame, I would like to read the data from the Hive emp1 table and load it into another table called emp2 (assume emp2 is empty and has the same DDL as emp1). It would be great if I could get Java reference code. No Scala or Python code needed.

Thanks in advance!

3 REPLIES

Super Guru
@Prabhu Muthaiyan Here is how you would do it. First, in your spark-env.sh, set HADOOP_CONF_DIR to the directory that contains your hdfs-site.xml, core-site.xml, and hive-site.xml, so that your program can pick these files up when it runs and knows how to connect to Hive.
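For example, a minimal spark-env.sh entry might look like the line below (the path is only an assumption; point it at whatever directory actually holds those three files on your cluster):

# in spark-env.sh -- example path, adjust to your installation
export HADOOP_CONF_DIR=/etc/hadoop/conf

With that in place, you would write code similar to the following: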
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// enableHiveSupport() tells Spark to read table metadata from the Hive metastore
SparkSession spark = SparkSession
  .builder()
  .appName("Java Spark SQL basic example")
  .config("spark.some.config.option", "some-value") // placeholder; add real options as needed
  .enableHiveSupport()
  .getOrCreate();

// read the Hive table emp1 into a DataFrame (Dataset<Row> in Java)
Dataset<Row> emp1 = spark.sql("SELECT col1, col2, col3 FROM emp1 WHERE <condition goes here>");

// write the rows into emp2
emp1.write().saveAsTable("emp2");
// or, to append to the existing table:
// emp1.write().mode("append").saveAsTable("emp2");

The available write modes are the following:

- SaveMode.Overwrite: overwrite the existing data.
- SaveMode.Append: append the data.
- SaveMode.Ignore: ignore the operation (i.e. no-op).
- SaveMode.ErrorIfExists: the default; throw an exception at runtime if the table already exists.
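As a quick sketch of selecting a mode explicitly (reusing the emp1 Dataset from the example above; the SaveMode enum lives in org.apache.spark.sql):

import org.apache.spark.sql.SaveMode;

// append emp1's rows to emp2, creating the table if it does not exist yet
emp1.write().mode(SaveMode.Append).saveAsTable("emp2");

// or replace the contents of emp2 entirely
// emp1.write().mode(SaveMode.Overwrite).saveAsTable("emp2");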



@mqureshi Thank you for the prompt response. I am new to this space. Could you please elaborate a little on setting up spark-env.sh? I understand this is to handshake with Hive; I am trying to get the exact values. My current setup is: HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/2.6.2.0-205/hadoop/conf}. How do I add the hdfs, hive, and core site XMLs?

In the Java code you posted above, I don't see the Hive connection parameters. Do I need to replace the values in this line?

.config("spark.some.config.option","some-value")

Please advise.