Created 01-29-2018 10:37 AM
Hello Community,
The Hive script I have created keeps throwing the following error:
Time taken: 2.634 seconds FAILED: ParseException line 17:2 missing EOF at 'COLUMN' near ')' 18/01/29 10:29:53 [main]: ERROR ql.Driver: FAILED: ParseException line 17:2 missing EOF at 'COLUMN' near ')' org.apache.hadoop.hive.ql.parse.ParseException: line 17:2 missing EOF at 'COLUMN' near ')'
Can someone please take a look at the Hive script and let me know where I might be going wrong?
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) COLUMN FORMAT DELIMITED FIELDS TERMINATED BY (',') LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int ) COLUMN FORMAT DELIMITED FIELDS TERMINATED BY (',') LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'; INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category
Cheers
Carlton
Created 01-29-2018 11:16 AM
Can you try setting it to ROW FORMAT DELIMITED instead of COLUMN FORMAT DELIMITED and try running the query.
Thanks,
Aditya
Created 01-29-2018 11:41 AM
Aditya,
Thanks for getting in touch,
I change script to the following:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION ' wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; TBLPROPERTIES ("skip.header.line.count" = "1"); DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION ' wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'; TBLPROPERTIES ("skip.header.line.count" = "1"); INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category
But I'm now getting the following errors:
Time taken: 2.684 seconds FAILED: ParseException line 18:46 character '<EOF>' not supported here 18/01/29 11:37:24 [main]: ERROR ql.Driver: FAILED: ParseException line 18:46 character '<EOF>' not supported here org.apache.hadoop.hive.ql.parse.ParseException: line 18:46 character '<EOF>' not supported here
Created 01-29-2018 12:21 PM
Hi Aditya,
I tried your suggestion, but now I'm getting the following error:
FAILED: IllegalArgumentException java.net.URISyntaxException: Illegal character in scheme name at index 0: wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein 18/01/29 12:18:20 [main]: ERROR ql.Driver: FAILED: IllegalArgumentException java.net.URISyntaxException: Illegal character in scheme name at index 0: wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 0: wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein
The new code looks like the following:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION ' wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/' TBLPROPERTIES ("skip.header.line.count" = "1"); DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION ' wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/' TBLPROPERTIES ("skip.header.line.count" = "1"); INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category, count(*) as acorn_categorycount FROM HiveSampleIn Group by acorn_category
Created 01-29-2018 03:38 PM
Hello Community,
I have updated the script as follows:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid int, eprofileclass int, fueltypes STRING, acorn_category int, acorn_group STRING, acorn_type int, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/' TBLPROPERTIES ("skip.header.line.count" = "1"); DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/' INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category FROM HiveSampleIn Group by acorn_category TBLPROPERTIES ("skip.header.line.count" = "1");
However, I'm getting the following error message:
Time taken: 0.499 seconds FAILED: ParseException line 7:0 missing EOF at 'INSERT' near ''wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'' 18/01/29 15:34:09 [main]: ERROR ql.Driver: FAILED: ParseException line 7:0 missing EOF at 'INSERT' near ''wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'' org.apache.hadoop.hive.ql.parse.ParseException: line 7:0 missing EOF at 'INSERT' near ''wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/''
Any help will be greatly appreciated.
Cheers
Created 01-29-2018 10:10 PM
Hello Community,
While waiting for some assistance, I have created the the following HQL script, but I'm still have the problem:
DROP TABLE IF EXISTS HiveSampleIn; CREATE EXTERNAL TABLE HiveSampleIn ( anonid INT, eprofileclass INT, fueltypes STRING, acorn_category INT, acorn_group STRING, acorn_type INT, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; tblproperties ("skip.header.line.count"="2"); DROP TABLE IF EXISTS HiveSampleOut; CREATE EXTERNAL TABLE HiveSampleOut ( acorn_category int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'; INSERT OVERWRITE TABLE HiveSampleOut Select acorn_category FROM HiveSampleIn Group by acorn_category;
I'm getting the following error:
Time taken: 2.663 seconds FAILED: ParseException line 21:44 missing EOF at ',' near ')' 18/01/29 21:37:15 [main]: ERROR ql.Driver: FAILED: ParseException line 21:44 missing EOF at ',' near ')' org.apache.hadoop.hive.ql.parse.ParseException: line 21:44 missing EOF at ',' near ')'
Any help with this will be greatly appreciated.
Cheers
Carlton
Created 01-30-2018 09:41 AM
CREATE EXTERNAL TABLE HiveSampleIn
...
LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; -- Getting terminated at here
tblproperties ("skip.header.line.count"="2"); -- This might be throwing ParseException
Created 01-30-2018 11:17 AM
Hi Naresh,
Thanks for getting in touch.
Can you think of any reason why tblproperties ("skip.header.line.count"="2"); is throwing the error? And any suggestions on how to fix it?
Much appreciated.
Carlton
Created 01-30-2018 11:24 AM
pls try to use below command for creating HiveSampleIn Table
CREATE EXTERNAL TABLE HiveSampleIn ( anonid INT, eprofileclass INT, fueltypes STRING, acorn_category INT, acorn_group STRING, acorn_type INT, nuts4 STRING, lacode STRING, nuts1 STRING, gspgroup STRING, ldz STRING, gas_elec STRING, gas_tout STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/' tblproperties ("skip.header.line.count"="2");
Created 06-14-2018 07:26 AM
This could be because you are parsing actual data in the place of header,supposing your first row has header and second row onwards has data.
Hence it can't parse data(int, string) as header(string).
So try changing it to ("skip.header.line.count"="1");
Hope this helps.