Created on 08-20-2013 06:59 AM - edited 09-16-2022 01:47 AM
Hi,
I'm trying to implement a simple data analytic use case.
I put files with a random number of rows into the cluster, each row similar to the following (fields may be populated randomly):
--;1374487936469;'';;;;;;;;;;;0.24516626;-0.85808194;7.4775715;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Then I create a Hive external table w/ the following command:
CREATE EXTERNAL TABLE myTable (f1 STRING, f2 STRING, f3 STRING, f4 STRING, <otherFields> )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\U003B'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/test/2013-07_08';
In '/user/test/2013-07_08' there is at least one file with rows similar to the one above.
When I try to select a few rows with: select * from myTable limit 10;
I get a result set of 10 rows where each row contains a complete line of the file, all in column f1.
" FIELDS TERMINATED BY '\U003B' " doesn't work, but even w/ '\u037E, ';' (this raises a stringLiteral error), '\059'.
Seems a simple problem but I can't realize which, and... I can't change the separator.
Have you any advice?
Thanks
Rob
Created 08-20-2013 07:51 AM
Use Hive SerDe for CSV
https://github.com/ogrodnek/csv-serde
Also, refer
https://issues.apache.org/jira/browse/HIVE-136
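For example, something like the following might work (a sketch only: the jar path and version are placeholders, and the SerDe class name and `separatorChar` property are the ones documented by that project):

```sql
-- Register the csv-serde jar for the session (path/version are hypothetical),
-- then declare the table using the SerDe with a semicolon separator.
ADD JAR /path/to/csv-serde.jar;

CREATE EXTERNAL TABLE myTable (f1 STRING, f2 STRING, f3 STRING, f4 STRING, <otherFields> )
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = ";")
STORED AS TEXTFILE
LOCATION '/user/test/2013-07_08';
```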
Created 08-22-2013 12:28 AM
Thanks for the answer,
I tried it but ran into another problem, probably related to the placement of the jar.
I put it in /usr/lib/hive/lib on all the datanodes, and even on the namenode and secondary namenode (just to cover all the bases).
I even tried loading it through Hue (query editor), but nothing worked; maybe it's a problem with my cluster (a prototyping one).
In the end I solved it with:
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
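For reference, the full corrected DDL would look something like this (a sketch assuming the same columns and location as in the original question):

```sql
-- Same table as before; the semicolon is written as the octal escape '\073'
-- (073 octal = 59 decimal = 0x3B = ';'), which Hive's DDL accepts.
CREATE EXTERNAL TABLE myTable (f1 STRING, f2 STRING, f3 STRING, f4 STRING, <otherFields> )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/test/2013-07_08';
```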
I had forgotten about the octal representation of escapes.
Anyway, thank you for your time.
Rob
Created 08-20-2013 10:00 AM
The good thing is that you created an external table, so you can just drop the table and recreate it. The underlying data in HDFS (/user/test/...) won't be deleted.
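In other words, something like the following is safe to run (a sketch; with an EXTERNAL table, DROP removes only the metastore entry):

```sql
-- Drops only the table definition; the files under
-- /user/test/2013-07_08 remain untouched in HDFS,
-- so the table can be recreated with a corrected delimiter.
DROP TABLE myTable;
```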