- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Hive Script Unable to Insert int target
- Labels:
-
Apache Hive
Created 01-28-2018 08:34 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello Community,
Can some let me know why I'm getting the error following error:
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'HiveSampleOut': Table insclause-0 has 1 columns, but query has 2 columns. 18/01/28 20:21:39 [main]: ERROR ql.Driver: FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'HiveSampleOut': Table insclause-0 has 1 columns, but query has 2 columns.
I have the following Hive script
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'; INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category
Any help will be greatly appreciated.
Cheers
Carlton
Created 02-01-2018 06:15 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There is an extra semicolon before TBLPROPERTIES , removing it will solve your problem. Use below script:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/' TBLPROPERTIES ("skip.header.line.count" = "1");
DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int, acorn_categorycount int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';
INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category;
Created 01-30-2018 06:53 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
HiveSampleOut has only 1 field acorn_category and you are trying to insert 2 fields into this table acorn_category and acorn_categorycount. Hence, the error.
Define the table with 2 fields .
Created 01-30-2018 11:06 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi rtrivedi
Thanks for getting in touch. Im very new to Hive, tbh, I took the script from the following link
https://bigdata.ukdataservice.ac.uk/media/604332/hiveql-queries.pdf#page5
In the link, the author suggests using the following line of code to fix the problem
TBLPROPERTIES ("skip.header.line.count" = "1");
But I can't get the code to work with the Hive script.
Therefore, if you could show me how implement the code or let me know how to define a table with 2 fields as you suggested that would be very much appreciated.
Cheers
Carlton
Created 01-31-2018 08:09 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Create your table as :
CREATE EXTERNAL TABLE HiveSampleOut (
acorn_category int,
acorn_categorycount int )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';
Or directly create table on the fly while inserting data like:
Create table HiveSampleOut As Select acorn_category, count(*)as acorn_categorycount FROM HiveSampleInGroupby acorn_category ;
Created 01-31-2018 11:23 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
rtrivedi
That worked perfectly thanks so much..
Created 01-31-2018 11:33 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
rtrivedi
I have modified the script to add the following line of code:
TBLPROPERTIES ("skip.header.line.count" = "1");
The script now looks like the following:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; TBLPROPERTIES ("skip.header.line.count" = "1"); DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int, acorn_categorycount int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'; INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category;
However, I keep on getting the following error:
FAILED: ParseException line 2:0 cannot recognize input near 'TBLPROPERTIES' '(' '"skip.header.line.count"' 18/01/31 11:28:20 [main]: ERROR ql.Driver: FAILED: ParseException line 2:0 cannot recognize input near 'TBLPROPERTIES' '(' '"skip.header.line.count"' org.apache.hadoop.hive.ql.parse.ParseException: line 2:0 cannot recognize input near 'TBLPROPERTIES' '(' '"skip.header.line.count"'
Your help with this last error will complete all that I need to know to fix this issue.
Much appreciated
Created 02-01-2018 06:15 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There is an extra semicolon before TBLPROPERTIES , removing it will solve your problem. Use below script:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/' TBLPROPERTIES ("skip.header.line.count" = "1");
DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int, acorn_categorycount int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';
INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category;
Created 02-01-2018 10:59 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi rtrivedi,
I'm about to test this out.
I will let you know how I get on.
Cheers
Created 02-01-2018 03:03 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi rtrivedi,
The fix you provided worked perfectly. Would it be possible to include the column headings in the output? At the moment, the output looks like the following:
\N,1 0,9 1,5159 2,694 3,4540 4,1975 5,2207 6,36
Created 02-01-2018 04:25 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi rtrivedi,
I added the additional code as suggested, but I get the following error:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'set' 'hive' '.' in statement
