Created 11-06-2017 07:03 AM
I have 2 tables as below.
CREATE EXTERNAL TABLE IF NOT EXISTS TEMP_tab(id int,mytime STRING,age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://xxx';
CREATE TABLE IF NOT EXISTS main_TAB(id int,age int)
PARTITIONED BY (mytime STRING)
STORED AS ORC
tblproperties ("orc.compress"="ZLIB");
FROM TEMP_TAB INSERT OVERWRITE TABLE main_TAB
PARTITION (mytime)
SELECT *,substr(mytime,0,10) as mytime;
but the strange thing is Insert does not work. It has following error message
Error: org.apache.spark.sql.AnalysisException: Cannot insert into table
m16
.main_TAB
because the number of columns are different: need 2 columns, but query has 3 columns.; (state=,code=0)
I have already set these 2 as well
SET hive.exec.dynamic.partition = true
SET hive.exec.dynamic.partition.mode = nonstrict
Created 11-06-2017 07:57 AM
Seems the target has 2 columns where as your insert query has 3 columns. Check the no of columns in the select clause. Then it should work fine. Its not the problem with dynamic partition.
Created 11-06-2017 01:35 PM
Your TEMP_tab table having 3 columns and your insert query having 4 columns(* means 3 columns from temp_tab and substr(mytime,0,10) means extra 1 column)
use the below query will work for your case
FROM TEMP_TAB INSERT OVERWRITE TABLE main_TAB PARTITION (mytime) SELECT id,age,substr(mytime,0,10) as mytime;
* *in addition in the above insert statement you are going to miss mytime column value as you are doing sub string that means source data is going to miss from temp_tab table to main_tab table.
Ex:-
temp_tab having 2017-10-12 12:20:23 but main_tab will have 2017-10-12, here we are going to miss 12:20:23 time from temp_tab to main _tab.
In case if you dont want to miss the data then
create main tab table with 4 columns in with dt as partition column
CREATE TABLE IF NOT EXISTS main_TAB(id int,mytime STRING,age int) PARTITIONED BY (dt string) STORED AS ORC tblproperties ("orc.compress"="ZLIB");
then do insert statement as below
FROM TEMP_TAB INSERT OVERWRITE TABLE main_TAB PARTITION (mytime) SELECT *,substr(mytime,0,10) as mytime;
in this case partition column would be dt and you are not missing temp_tab data at all.