Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive Dynamic partition issue

avatar
Explorer

I have 2 tables as below.

CREATE EXTERNAL TABLE IF NOT EXISTS TEMP_tab(id int,mytime STRING,age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://xxx';

CREATE TABLE IF NOT EXISTS main_TAB(id int,age int)
PARTITIONED BY (mytime STRING)
STORED AS ORC
tblproperties ("orc.compress"="ZLIB");

FROM TEMP_TAB INSERT OVERWRITE TABLE main_TAB 
PARTITION (mytime) 
SELECT *,substr(mytime,0,10) as mytime;

but the strange thing is Insert does not work. It has following error message

Error: org.apache.spark.sql.AnalysisException: Cannot insert into table m16.main_TAB because the number of columns are different: need 2 columns, but query has 3 columns.; (state=,code=0)

I have already set these 2 as well

SET hive.exec.dynamic.partition = true 

SET hive.exec.dynamic.partition.mode = nonstrict

2 REPLIES 2

avatar

@Team Spark

Seems the target has 2 columns where as your insert query has 3 columns. Check the no of columns in the select clause. Then it should work fine. Its not the problem with dynamic partition.

avatar
Master Guru
@Team Spark


Your TEMP_tab table having 3 columns and your insert query having 4 columns(* means 3 columns from temp_tab and substr(mytime,0,10) means extra 1 column)

use the below query will work for your case

FROM TEMP_TAB INSERT OVERWRITE TABLE main_TAB 
PARTITION (mytime) 
SELECT id,age,substr(mytime,0,10) as mytime;

* *in addition in the above insert statement you are going to miss mytime column value as you are doing sub string that means source data is going to miss from temp_tab table to main_tab table.

Ex:-

temp_tab having 2017-10-12 12:20:23 but main_tab will have 2017-10-12, here we are going to miss 12:20:23 time from temp_tab to main _tab.

In case if you dont want to miss the data then

create main tab table with 4 columns in with dt as partition column

CREATE TABLE IF NOT EXISTS main_TAB(id int,mytime STRING,age int)
PARTITIONED BY (dt string)
STORED AS ORC
tblproperties ("orc.compress"="ZLIB");

then do insert statement as below

FROM TEMP_TAB INSERT OVERWRITE TABLE main_TAB PARTITION (mytime) SELECT *,substr(mytime,0,10) as mytime;

in this case partition column would be dt and you are not missing temp_tab data at all.