Created 01-28-2018 08:34 PM
Hello Community,
Can some let me know why I'm getting the error following error:
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'HiveSampleOut': Table insclause-0 has 1 columns, but query has 2 columns. 18/01/28 20:21:39 [main]: ERROR ql.Driver: FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'HiveSampleOut': Table insclause-0 has 1 columns, but query has 2 columns.
I have the following Hive script
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'; INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category
Any help will be greatly appreciated.
Cheers
Carlton
Created 02-01-2018 06:15 AM
There is an extra semicolon before TBLPROPERTIES , removing it will solve your problem. Use below script:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/' TBLPROPERTIES ("skip.header.line.count" = "1");
DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int, acorn_categorycount int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';
INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category;
Created 01-30-2018 06:53 AM
HiveSampleOut has only 1 field acorn_category and you are trying to insert 2 fields into this table acorn_category and acorn_categorycount. Hence, the error.
Define the table with 2 fields .
Created 01-30-2018 11:06 AM
Hi rtrivedi
Thanks for getting in touch. Im very new to Hive, tbh, I took the script from the following link
https://bigdata.ukdataservice.ac.uk/media/604332/hiveql-queries.pdf#page5
In the link, the author suggests using the following line of code to fix the problem
TBLPROPERTIES ("skip.header.line.count" = "1");
But I can't get the code to work with the Hive script.
Therefore, if you could show me how implement the code or let me know how to define a table with 2 fields as you suggested that would be very much appreciated.
Cheers
Carlton
Created 01-31-2018 08:09 AM
Create your table as :
CREATE EXTERNAL TABLE HiveSampleOut (
acorn_category int,
acorn_categorycount int )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';
Or directly create table on the fly while inserting data like:
Create table HiveSampleOut As Select acorn_category, count(*)as acorn_categorycount FROM HiveSampleInGroupby acorn_category ;
Created 01-31-2018 11:23 AM
rtrivedi
That worked perfectly thanks so much..
Created 01-31-2018 11:33 AM
rtrivedi
I have modified the script to add the following line of code:
TBLPROPERTIES ("skip.header.line.count" = "1");
The script now looks like the following:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; TBLPROPERTIES ("skip.header.line.count" = "1"); DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int, acorn_categorycount int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'; INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category;
However, I keep on getting the following error:
FAILED: ParseException line 2:0 cannot recognize input near 'TBLPROPERTIES' '(' '"skip.header.line.count"' 18/01/31 11:28:20 [main]: ERROR ql.Driver: FAILED: ParseException line 2:0 cannot recognize input near 'TBLPROPERTIES' '(' '"skip.header.line.count"' org.apache.hadoop.hive.ql.parse.ParseException: line 2:0 cannot recognize input near 'TBLPROPERTIES' '(' '"skip.header.line.count"'
Your help with this last error will complete all that I need to know to fix this issue.
Much appreciated
Created 02-01-2018 06:15 AM
There is an extra semicolon before TBLPROPERTIES , removing it will solve your problem. Use below script:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/' TBLPROPERTIES ("skip.header.line.count" = "1");
DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int, acorn_categorycount int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';
INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category;
Created 02-01-2018 10:59 AM
Hi rtrivedi,
I'm about to test this out.
I will let you know how I get on.
Cheers
Created 02-01-2018 03:03 PM
Hi rtrivedi,
The fix you provided worked perfectly. Would it be possible to include the column headings in the output? At the moment, the output looks like the following:
\N,1 0,9 1,5159 2,694 3,4540 4,1975 5,2207 6,36
Created 02-01-2018 04:25 PM
Hi rtrivedi,
I added the additional code as suggested, but I get the following error:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'set' 'hive' '.' in statement