Created 03-21-2017 05:10 AM
Need help with an error I received.
I am trying to insert into a Hive table from Spark using the following syntax:
tranl1.write.mode("overwrite").partitionBy("t_date").insertInto("tran_spark_part")
Note: tranl1 is a DataFrame I created for loading data from Oracle.
val tranl1 = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:oracle:thin:userid/pwd@localhost:portid",
  "dbtable" -> "(select a.*,to_char(to_date(trunc(txn_date,'DD'),'dd-MM-yy')) as t_date from table a WHERE TXN_DATE >= TRUNC(sysdate,'DD'))"))
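(As an aside, the same load can be written with the read API, which replaced the deprecated load(source, options) form in Spark 1.4 - a minimal sketch, assuming the same connection settings as above:)

// Equivalent load through the DataFrameReader API.
val tranl1 = sqlContext.read
  .format("jdbc")
  .options(Map(
    "url" -> "jdbc:oracle:thin:userid/pwd@localhost:portid",
    "dbtable" -> "(select a.*,to_char(to_date(trunc(txn_date,'DD'),'dd-MM-yy')) as t_date from table a WHERE TXN_DATE >= TRUNC(sysdate,'DD'))"))
  .load()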
My table in Hive:
create table tran_spark_part(
  id string, amount string, creditaccount string, creditbankname string,
  creditvpa string, customerid string, debitaccount string, debitbankname string,
  debitvpa string, irc string, refid string, remarks string, rrn string,
  status string, txn_date string, txnid string, type string, expiry_date string,
  approvalnum string, msgid string, seqno string, upirc string, reversal string,
  trantype string)
partitioned by (date1 string);
However, when I run
tranl1.write.mode("overwrite").partitionBy("t_date").insertInto("tran_spark_part")
it gives this error:
java.util.NoSuchElementException: key not found: date1
Please help me figure out what I am missing or doing wrong.
Thanks, Rishit Shah.
Created 03-21-2017 11:13 AM
partitionBy uses column names.
The Hive table has "date1", while in Spark "t_date" is used.
Have you tried renaming the DataFrame column to date1 so that it matches the Hive schema?
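Something like this (a minimal sketch, assuming the tranl1 DataFrame and the table from your post):

// Rename the DataFrame column to the Hive partition column name,
// then partition by the matching name.
val renamed = tranl1.withColumnRenamed("t_date", "date1")
renamed.write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")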
Created 03-21-2017 12:13 PM
Yes, I tried with the same column name as well, but got the same error.
thanks,
Rishit Shah
Created 03-21-2017 01:03 PM
From the code you posted, it seems that you referred to date1 in the 'partitioned by' clause of your 'create table tran_spark_part' statement, but there is no date1 column in the DataFrame you are writing into tran_spark_part.
Created 03-22-2017 04:30 AM
Hi @pbarna,
Thanks for the response.
date1 is a dynamic partition column, hence I have not declared it in the table's column list, and I am trying to use the t_date column in the DF as the date1 value.
Correct me if I am doing this wrong?
thanks,
Rishit Shah
Created 03-22-2017 04:33 PM
Hive:
create table tran_spark_part ( id String, amount BigInt ) partitioned by (date1 string);
This works in Spark (tested on 1.6.2):
> case class Person(id: String, amount: Integer, date1: String)
> val df = Seq(Person("1", 32, "2017")).toDF()
> sqlContext.setConf("hive.exec.dynamic.partition", "true")
> sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
> df.write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")
> sqlContext.sql("select * from tran_spark_part").show()
+---+------+-----+
| id|amount|date1|
+---+------+-----+
|  1|    32| 2017|
+---+------+-----+
This doesn't:
> case class Person2(id: String, amount: Integer, t_date: String)
> val df2 = Seq(Person("2", 42, "2017")).toDF()
> df2.write.mode("overwrite").partitionBy("t_date").insertInto("tran_spark_part")
org.apache.spark.sql.AnalysisException: Partition column t_date not found in existing columns (id, amount, date1);
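(If the DataFrame really does carry the partition value in a t_date column, aliasing it to the Hive name should line things up - a sketch, assuming a DataFrame with columns (id, amount, t_date):)

import org.apache.spark.sql.functions.col

// Alias t_date to date1 so the partition column exists in the DataFrame
// under the name the Hive table declares.
val aligned = df2.select(col("id"), col("amount"), col("t_date").as("date1"))
aligned.write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")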
Created 03-24-2017 06:50 AM
Thanks for the reply. As I mentioned in the other post above, I even tried with date1 but got the same 'key not found' error.
Is it possible that loading from Oracle directly is creating any problem?
thanks,
Rishit Shah
Created 03-24-2017 08:11 AM
Hard to say from the info you gave.
Since you load the data into a DataFrame, the Oracle part should be abstracted away, as long as the schema fits - and I guess you checked the schema of "tranl1".
You could try selecting from tranl1 into another DataFrame, tran2, to control all the columns, check the schema, and then try to write tran2.
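Something along these lines (a sketch, assuming the t_date column from your Oracle query is meant to feed the date1 partition):

// Build a second DataFrame to take explicit control of the columns,
// rename the partition column to match Hive, and inspect the schema first.
val tran2 = tranl1.withColumnRenamed("t_date", "date1")
tran2.printSchema()  // compare names, order and types against tran_spark_part
tran2.write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")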