Spark-Key not found error

Rising Star

I need help with an error I received.

I am trying to insert into a Hive table from Spark using the following syntax:

tranl1.write.mode("overwrite").partitionBy("t_date").insertInto("tran_spark_part")

Note: tranl1 is the DataFrame I created for loading data from Oracle:

val tranl1 = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:oracle:thin:userid/pwd@localhost:portid",
  "dbtable" -> "(select a.*, to_char(to_date(trunc(txn_date,'DD'),'dd-MM-yy')) as t_date from table a WHERE TXN_DATE >= TRUNC(sysdate,'DD'))"
))
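For reference, the columns the load produces can be checked with something like this (just a sketch; the schema should list t_date along with the columns selected from Oracle):

tranl1.printSchema()                 // the partition value should appear here as t_date
tranl1.select("t_date").show(5)      // spot-check a few partition values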

My table in Hive:

create table tran_spark_part (id string, amount string, creditaccount string, creditbankname string, creditvpa string, customerid string, debitaccount string, debitbankname string, debitvpa string, irc string, refid string, remarks string, rrn string, status string, txn_date string, txnid string, type string, expiry_date string, approvalnum string, msgid string, seqno string, upirc string, reversal string, trantype string)
partitioned by (date1 string);

However, when I run

tranl1.write.mode("overwrite").partitionBy("t_date").insertInto("tran_spark_part")

it gives the error:

java.util.NoSuchElementException: key not found: date1

Please help me understand what I am missing or doing wrong.

Thanks,

Rishit Shah


Re: Spark-Key not found error

partitionBy uses column names.

The Hive table's partition column is "date1", but in Spark you are partitioning by "t_date".

Have you tried renaming the DataFrame column to date1 so that it matches the Hive schema? See the sketch below.
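A minimal sketch of that, assuming the rest of tranl1's columns already line up with the table definition:

// rename the DataFrame column so it matches the Hive partition column
val renamed = tranl1.withColumnRenamed("t_date", "date1")
renamed.write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")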

Re: Spark-Key not found error

Rising Star

Yes, I tried with the same column name as well, but I get the same error.

thanks,

Rishit Shah

Re: Spark-Key not found error

Contributor

From the code you posted, it seems that you refer to date1 only in the 'partitioned by' clause of your 'create table tran_spark_part' statement; there is no date1 column among the regular columns of tran_spark_part.
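One quick way to compare what Spark will write against what the table expects (a sketch, assuming sqlContext is a HiveContext):

// columns the DataFrame would write
println(tranl1.columns.mkString(", "))
// columns, including partition columns, that the Hive table exposes
sqlContext.sql("describe tran_spark_part").show(100, false)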

Re: Spark-Key not found error

Rising Star

Hi @pbarna,

Thanks for the response.

date1 is a dynamic partition column, so I have not declared it among the regular columns of the table; I am trying to use the t_date column of the DataFrame as the value for date1.

Please correct me if I am doing this wrong.

thanks,

Rishit Shah

Re: Spark-Key not found error

Hive:

create table tran_spark_part (
  id String,
  amount BigInt
)
partitioned by (date1 string);

This works in Spark (tested on 1.6.2):

> case class Person(id: String, amount: Integer, date1: String)
> val df = Seq(Person("1", 32, "2017")).toDF()

> sqlContext.setConf("hive.exec.dynamic.partition", "true") 
> sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

> df.write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")

> sqlContext.sql("select * from tran_spark_part").show()

+---+------+-----+
| id|amount|date1|
+---+------+-----+
|  1|    32| 2017|
+---+------+-----+

This doesn't:

> case class Person2(id: String, amount: Integer, t_date: String)
> val df2 = Seq(Person2("2", 42, "2017")).toDF()
> df2.write.mode("overwrite").partitionBy("t_date").insertInto("tran_spark_part")

org.apache.spark.sql.AnalysisException: Partition column t_date not found in existing columns (id, amount, date1);
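So, following the working example above, renaming the column before the insert should make the second case succeed as well (a sketch, same session):

> df2.withColumnRenamed("t_date", "date1").write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")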

Re: Spark-Key not found error

Rising Star

Hi @Bernhard Walter

Thanks for the reply. As I mentioned in the other post above, I even tried with date1 but got the same "key not found" error.

Is it possible that loading from Oracle directly is causing a problem?

thanks,

Rishit Shah

Re: Spark-Key not found error

Hard to say from the info you gave.

Since you load the data into a DataFrame, the Oracle part should be abstracted away as long as the schema fits, and I guess you have checked the schema of "tranl1".

You could try selecting from tranl1 into another DataFrame tran2 so that you control all the columns, check its schema, and then try to write tran2, as sketched below.
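Something along these lines (a sketch; the select list is abbreviated and would need to cover every column of tran_spark_part, in order, with the partition value exposed as date1):

// select the columns explicitly, in the order of the Hive table,
// and expose the partition value under the name the table expects
val tran2 = tranl1.select(
  tranl1("id"),
  tranl1("amount"),
  // ... remaining columns of tran_spark_part, in table order ...
  tranl1("t_date").as("date1")
)
tran2.printSchema()   // verify the column names and order before writing
tran2.write.mode("overwrite").partitionBy("date1").insertInto("tran_spark_part")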