How do I read Hive table1 from Spark using a DataFrame and load its data into table2? A Java code reference would be great.
- Labels: Apache Hive, Apache Spark
Created 09-30-2017 12:46 AM
Hi All,
I have a table in Hive, say emp1, with columns empid int, name string, dept string, salary double. In Spark, using a DataFrame, I would like to read the data from the Hive emp1 table and load it into another table called emp2 (assume emp2 is empty and has the same DDL as emp1). It would be great if I could get Java reference code. No Scala or Python code needed.
Thanks in advance!
Created 09-30-2017 06:06 AM
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Build a Hive-enabled session so spark.sql() can see Hive tables.
SparkSession spark = SparkSession
    .builder()
    .appName("Java Spark SQL basic example")
    .config("spark.some.config.option", "some-value")
    .enableHiveSupport()
    .getOrCreate();

// Read from emp1; note the type is Dataset<Row>, not DataSet<Row>.
Dataset<Row> emp1 = spark.sql("SELECT col1, col2, col3 FROM emp1 WHERE <condition goes here>");

// Write into emp2.
emp1.write().saveAsTable("emp2");
// or, to add to existing data:
emp1.write().mode("append").saveAsTable("emp2");
The write modes you can use are the following (see the sketch after this list):
- SaveMode.Overwrite: overwrite the existing data.
- SaveMode.Append: append the data.
- SaveMode.Ignore: ignore the operation (i.e. no-op).
- SaveMode.ErrorIfExists: the default option; throws an exception at runtime if the table already exists.
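For reference, here is a minimal sketch of passing the SaveMode enum instead of a mode string; the table names emp1 and emp2 are carried over from the example above:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession
    .builder()
    .appName("SaveMode example")
    .enableHiveSupport()
    .getOrCreate();

Dataset<Row> emp1 = spark.sql("SELECT * FROM emp1");

// mode() accepts the SaveMode enum as well as a string.
emp1.write().mode(SaveMode.Overwrite).saveAsTable("emp2");   // replace existing data
// emp1.write().mode(SaveMode.Append).saveAsTable("emp2");   // add to existing data
// emp1.write().mode(SaveMode.Ignore).saveAsTable("emp2");   // no-op if emp2 exists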
Created 10-02-2017 02:59 PM
@mqureshi Thank you for the prompt response. I am new to this space. Could you please elaborate a little on setting up spark-env.sh? I understand this is for the handshake with Hive, and I'm trying to get the exact values. My current setup is: HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/2.6.2.0-205/hadoop/conf}. How do I add the hdfs-site, hive-site, and core-site XMLs?
In the Java code you posted above, I don't see the Hive connection parameters. Do I need to replace the values on this line?
.config("spark.some.config.option", "some-value")
Please advise.
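For context, the .config("spark.some.config.option", "some-value") line is just a placeholder from the Spark documentation, not a Hive connection parameter. Below is a minimal sketch assuming hive-site.xml (along with core-site.xml and hdfs-site.xml) has been made visible to Spark, e.g. copied into $SPARK_HOME/conf or the directory HADOOP_CONF_DIR points to:

import org.apache.spark.sql.SparkSession;

// With enableHiveSupport(), Spark picks up the Hive metastore
// location from hive-site.xml on the classpath; no explicit
// connection parameters are needed in the code itself.
SparkSession spark = SparkSession
    .builder()
    .appName("Hive-enabled session")
    .enableHiveSupport()
    .getOrCreate();

// Quick sanity check: list the Hive tables this session can see.
spark.sql("SHOW TABLES").show();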
