
Spark: "key not found" error

Expert Contributor

I need help with an error I received.

I am trying to insert into a Hive table from Spark using the following syntax:

tranl1.write.mode("overwrite").partitionBy("t_date").insertInto("tran_spark_part")

Note: tranl1 is the DataFrame I created to load data from Oracle.

val tranl1 = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:oracle:thin:userid/pwd@localhost:portid",
  "dbtable" -> "(select a.*, to_char(to_date(trunc(txn_date,'DD'),'dd-MM-yy')) as t_date from table a WHERE TXN_DATE >= TRUNC(sysdate,'DD'))"))

My table in Hive:

create table tran_spark_part (
  id string, amount string, creditaccount string, creditbankname string,
  creditvpa string, customerid string, debitaccount string, debitbankname string,
  debitvpa string, irc string, refid string, remarks string, rrn string,
  status string, txn_date string, txnid string, type string, expiry_date string,
  approvalnum string, msgid string, seqno string, upirc string, reversal string,
  trantype string
)
partitioned by (date1 string);

However, when I run

tranl1.write.mode("overwrite").partitionBy("t_date").insertInto("tran_spark_part")

it gives this error:

java.util.NoSuchElementException: key not found: date1

Please help me understand what I am missing or doing wrong.

Thanks, Rishit Shah.

7 REPLIES


partitionBy uses column names. The Hive table's partition column is "date1", but in Spark you used "t_date".

Have you tried renaming the DataFrame column to date1 so that it matches the Hive schema?
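Something like this should align the names (a minimal, untested sketch in Spark 1.6 syntax, reusing your tranl1 DataFrame; "renamed" is just an illustrative name):

// rename the DataFrame column so it matches the Hive partition column
val renamed = tranl1.withColumnRenamed("t_date", "date1")
renamed.write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")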

Expert Contributor

Yes, I tried with the same column name as well, but I get the same error.

thanks,

Rishit Shah

Rising Star

From the code you posted, it seems that you refer to date1 in the 'partitioned by' clause of your 'create table tran_spark_part' statement, but the DataFrame you are writing does not have a date1 column.

Expert Contributor

Hi @pbarna,

Thanks for the response.

date1 is a dynamic partition column, hence I have not declared it among the regular columns of the table; I am trying to use the t_date column of the DataFrame as the date1 value.

Correct me if I am doing something wrong?

thanks,

Rishit Shah


Hive:

create table tran_spark_part (
  id String,
  amount BigInt
)
partitioned by (date1 string);

This works in Spark (tested on 1.6.2):

> case class Person(id: String, amount: Integer, date1: String)
> val df = Seq(Person("1", 32, "2017")).toDF()

> sqlContext.setConf("hive.exec.dynamic.partition", "true") 
> sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

> df.write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")

> sqlContext.sql("select * from tran_spark_part").show()

+---+------+-----+
| id|amount|date1|
+---+------+-----+
|  1|    32| 2017|
+---+------+-----+

This doesn't work:

> case class Person2(id: String, amount: Integer, t_date: String)
> val df2 = Seq(Person2("2", 42, "2017")).toDF()
> df2.write.mode("overwrite").partitionBy("t_date").insertInto("tran_spark_part")

org.apache.spark.sql.AnalysisException: Partition column t_date not found in existing columns (id, amount, date1);
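One hedged way to make the failing case work (same shell session and table as above) is to alias t_date to date1 before writing, so the partition column name matches the Hive schema:

> df2.select($"id", $"amount", $"t_date".as("date1")).write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")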

Expert Contributor

Hi @Bernhard Walter,

Thanks for the reply. As I mentioned in the other post above, I even tried with date1 but got the same "key not found" error.

Is it possible that loading from Oracle directly is causing a problem?

thanks,

Rishit Shah


Hard to say from the information you gave.

Since you load the data into a DataFrame, the Oracle part should be abstracted away, as long as the schema fits - and I guess you have checked the schema of "tranl1".

You could select from tranl1 into another DataFrame tran2 so that you control all the columns, check its schema, and then try to write tran2, as sketched below.
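A minimal sketch of that idea (untested; it assumes Spark 1.6 with a HiveContext as sqlContext, and that tranl1 differs from the table only by the t_date / date1 name):

import org.apache.spark.sql.functions.col

// pull the target table's column order (the partition column date1 comes last)
val cols = sqlContext.table("tran_spark_part").columns

// rename t_date to date1, then reorder the columns to match the table
val tran2 = tranl1.withColumnRenamed("t_date", "date1")
  .select(cols.map(col): _*)

tran2.printSchema()   // verify the schema before writing
tran2.write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")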