Created 09-30-2017 12:46 AM
Hi All,
I have a table in Hive, say emp1, with columns empid int, name string, dept string, salary double. In Spark, using a DataFrame, I would like to read the data from the Hive emp1 table and load it into another table called emp2 (assume emp2 is empty and has the same DDL as emp1). It would be great if I could get Java reference code. No Scala or Python code needed.
Thanks in advance!
Created 09-30-2017 06:06 AM
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Build a SparkSession with Hive support so Spark can read and write Hive tables.
SparkSession spark = SparkSession
    .builder()
    .appName("Java Spark SQL basic example")
    .config("spark.some.config.option", "some-value")
    .enableHiveSupport()
    .getOrCreate();

// Read from the source Hive table (note: the class is Dataset<Row>, not DataSet<Row>).
Dataset<Row> emp1 = spark.sql("SELECT col1, col2, col3 FROM emp1 WHERE <condition goes here>");

// Write into the target table:
emp1.write().saveAsTable("emp2");
// or use this to append to existing data:
emp1.write().mode("append").saveAsTable("emp2");
You can use the following write modes:
- SaveMode.Overwrite: overwrite the existing data.
- SaveMode.Append: append the data.
- SaveMode.Ignore: ignore the operation (i.e. no-op).
- SaveMode.ErrorIfExists: the default option; throws an exception at runtime if the table already exists.
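For example, the SaveMode enum can be passed instead of the string form, and since emp2 already exists with the same DDL as emp1, insertInto() is another option (a minimal sketch; insertInto matches columns by position rather than by name, so the select order must match emp2's DDL):

import org.apache.spark.sql.SaveMode;

// Equivalent to .mode("append"), using the enum form.
emp1.write().mode(SaveMode.Append).saveAsTable("emp2");

// Alternatively, insert into the pre-existing table emp2;
// columns are resolved by position, not by name.
emp1.write().mode(SaveMode.Append).insertInto("emp2");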
Created 10-02-2017 02:59 PM
@mqureshi Thank you for the prompt response. I am new to this space. Could you please elaborate a little on setting up spark-env.sh? I understand this is to handshake with Hive, and I am trying to get the exact values. My current setup is:

HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/2.6.2.0-205/hadoop/conf}

How do I add the hdfs-site, hive-site, and core-site XMLs?

In the Java code you posted above, I don't see the Hive connection parameters. Do I need to replace the values in this line?

.config("spark.some.config.option", "some-value")

Please advise.
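For context, spark.some.config.option / some-value in that snippet is only a placeholder, not a real Hive setting. With enableHiveSupport(), Spark normally picks up the connection details from hive-site.xml (along with core-site.xml and hdfs-site.xml) found on its classpath, e.g. under $SPARK_HOME/conf, rather than from code. If they must be set programmatically, a minimal sketch follows; the metastore host and warehouse path below are placeholders to replace with your own values (9083 is the default Hive metastore thrift port, and /apps/hive/warehouse is the usual HDP default):

import org.apache.spark.sql.SparkSession;

// Placeholder values: replace <metastore-host> with your Hive metastore host.
SparkSession spark = SparkSession
    .builder()
    .appName("Hive connection example")
    .config("hive.metastore.uris", "thrift://<metastore-host>:9083")
    .config("spark.sql.warehouse.dir", "/apps/hive/warehouse")
    .enableHiveSupport()
    .getOrCreate();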