Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive Script Unable to Insert int target

avatar
Explorer

Hello Community,

Can some let me know why I'm getting the error following error:

FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'HiveSampleOut': Table insclause-0 has 1 columns, but query has 2 columns.
18/01/28 20:21:39 [main]: ERROR ql.Driver: FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'HiveSampleOut': Table insclause-0 has 1 columns, but query has 2 columns.

I have the following Hive script

DROP TABLE IF EXISTS HiveSampleIn; 
CREATE EXTERNAL TABLE HiveSampleIn 
(
 anonid int,
 eprofileclass int,
 fueltypes STRING,
 acorn_category int,
 acorn_group STRING,
 acorn_type int,
 nuts4 STRING,
 lacode STRING,
 nuts1 STRING,
 gspgroup STRING,
 ldz STRING,
 gas_elec STRING,
 gas_tout STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; 
DROP TABLE IF EXISTS HiveSampleOut; 
CREATE EXTERNAL TABLE HiveSampleOut 
(   
    acorn_category int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';
INSERT OVERWRITE TABLE HiveSampleOut
Select 
   acorn_category,
   count(*) as acorn_categorycount 
FROM HiveSampleIn Group by acorn_category

Any help will be greatly appreciated.

Cheers

Carlton

1 ACCEPTED SOLUTION

avatar
Expert Contributor

@Carlton Patterson

There is an extra semicolon before TBLPROPERTIES , removing it will solve your problem. Use below script:

DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/' TBLPROPERTIES ("skip.header.line.count" = "1");

DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int, acorn_categorycount int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';

INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category;

View solution in original post

14 REPLIES 14

avatar
Expert Contributor

@Carlton Patterson

HiveSampleOut has only 1 field acorn_category and you are trying to insert 2 fields into this table acorn_category and acorn_categorycount. Hence, the error.

Define the table with 2 fields .

avatar
Explorer

Hi rtrivedi

Thanks for getting in touch. Im very new to Hive, tbh, I took the script from the following link

https://bigdata.ukdataservice.ac.uk/media/604332/hiveql-queries.pdf#page5

In the link, the author suggests using the following line of code to fix the problem

TBLPROPERTIES ("skip.header.line.count" = "1");

But I can't get the code to work with the Hive script.

Therefore, if you could show me how implement the code or let me know how to define a table with 2 fields as you suggested that would be very much appreciated.

Cheers

Carlton

avatar
Expert Contributor

Create your table as :

CREATE EXTERNAL TABLE HiveSampleOut (

acorn_category int,

acorn_categorycount int )

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';

Or directly create table on the fly while inserting data like:

Create table HiveSampleOut As Select acorn_category, count(*)as acorn_categorycount FROM HiveSampleInGroupby acorn_category ;

avatar
Explorer

rtrivedi

That worked perfectly thanks so much..

avatar
Explorer

rtrivedi

I have modified the script to add the following line of code:

TBLPROPERTIES ("skip.header.line.count" = "1");

The script now looks like the following:

DROP TABLE IF EXISTS HiveSampleIn; 
CREATE EXTERNAL TABLE HiveSampleIn 
(
 anonid int,
 eprofileclass int,
 fueltypes STRING,
 acorn_category int,
 acorn_group STRING,
 acorn_type int,
 nuts4 STRING,
 lacode STRING,
 nuts1 STRING,
 gspgroup STRING,
 ldz STRING,
 gas_elec STRING,
 gas_tout STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; 
TBLPROPERTIES ("skip.header.line.count" = "1");
DROP TABLE IF EXISTS HiveSampleOut; 
CREATE EXTERNAL TABLE HiveSampleOut 
(
acorn_category int,
acorn_categorycount int )
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';
INSERT OVERWRITE TABLE HiveSampleOut
Select 
   acorn_category,
   count(*) as acorn_categorycount 
FROM HiveSampleIn Group by acorn_category;

However, I keep on getting the following error:

FAILED: ParseException line 2:0 cannot recognize input near 'TBLPROPERTIES' '(' '"skip.header.line.count"'
18/01/31 11:28:20 [main]: ERROR ql.Driver: FAILED: ParseException line 2:0 cannot recognize input near 'TBLPROPERTIES' '(' '"skip.header.line.count"'
org.apache.hadoop.hive.ql.parse.ParseException: line 2:0 cannot recognize input near 'TBLPROPERTIES' '(' '"skip.header.line.count"'

Your help with this last error will complete all that I need to know to fix this issue.

Much appreciated

avatar
Expert Contributor

@Carlton Patterson

There is an extra semicolon before TBLPROPERTIES , removing it will solve your problem. Use below script:

DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/' TBLPROPERTIES ("skip.header.line.count" = "1");

DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int, acorn_categorycount int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';

INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category;

avatar
Explorer

Hi rtrivedi,

I'm about to test this out.

I will let you know how I get on.

Cheers

avatar
Explorer

Hi rtrivedi,

The fix you provided worked perfectly. Would it be possible to include the column headings in the output? At the moment, the output looks like the following:

\N,1
0,9
1,5159
2,694
3,4540
4,1975
5,2207
6,36

avatar
Explorer

Hi rtrivedi,

I added the additional code as suggested, but I get the following error:

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'set' 'hive' '.' in statement