Support Questions

csv to hive, data not correctly imported to column

Rising Star

Hi all,

 

I have created a table with the required columns in Hive, stored as textfile.

 

I use LOAD DATA LOCAL INPATH to load the CSV file into the Hive table created above, but all the data gets imported into one column and the rest of the columns are empty. When I open the CSV file in Notepad and Excel everything seems to be in order. I have also used --hive-drop-import-delims and --fields-terminated-by '\t', but the result is still the same.

 

Has anyone tried this before?

 

 

8 REPLIES

Re: csv to hive, data not correctly imported to column

Master Guru
What is your CREATE TABLE statement? Are you specifying the right field delimiter character (\t if you are using that)? The default delimiter otherwise is ^A, which your data likely does not carry.

Re: csv to hive, data not correctly imported to column

Rising Star

Hi Harsh J,

 

I use the following commands:

 

>create table if not exists test(a string, b string, c string, d string, e string) row format delimited fields terminated by '\t' escaped by '\\';

>load data local inpath '/path/file.csv' into table testdraft --hive-drop-import-delims;

 

I have tried '\r', '\n', and '\001' in the ROW FORMAT DELIMITED FIELDS TERMINATED BY clause, but all of them give me the same result.

 

When I open the CSV file in Notepad the columns are separated by tabs (gedit shows the tabs as spaces), and when I run SELECT * FROM test in Hive I see a lot of NULLs between the columns.

 

I also tried the following, replacing the FIELDS TERMINATED BY value with '\N':

create table if not exists test(a string, b string, c string, d string, e string) row format delimited fields terminated by '\N';

 

then the columns end up separated at the words that have an 'n' in them.

 

I am pretty new to CSV, Hive, and databases, so please shed some light.

 

Thanks!

Re: csv to hive, data not correctly imported to column

Master Guru
The first CREATE TABLE specification looks correct to me, for your described file.

Can you also double-check your /path/file.csv with the "head -n1 /path/file.csv | od -c" command to make sure it really does have the actual \t character between each field (rather than relying on a visual editor)?
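
If od output is awkward to read, a quick Python check does the same kind of inspection; this is only a sketch, and '/path/file.csv' is the placeholder path used in this thread:

# Print the raw bytes of the first line so the actual field separator
# (e.g. a literal tab, b'\t') is visible instead of guessed from an editor.
with open('/path/file.csv', 'rb') as f:
    print(repr(f.readline()))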

Re: csv to hive, data not correctly imported to column

Rising Star

Hi Harsh J,

 

When I use the command

#head -n1 file.csv | od -c

 

It shows the following:


0000000 377 376 1 \0 \0 \0 \0 \0 \0 \0
0000020 \0 \0 \0 \0 \0 1 \0 6 \0 1 \0
0000040 8 \0 9 \0 7 \0 \0 \0 \0 \0 \0
0000060 \0 I \0 L \0 L \0 E \0 G \0 A \0 L \0
0000100 \0 D \0 U \0 M \0 P \0 \0 S \0 I \0
0000120 T \0 E \0 \0 ( \0 M \0 P \0 S \0 P \0
0000140 \0 W \0 A \0 T \0 C \0 H \0 \0 N \0
0000160 O \0 . \0 R \0 U \0 J \0 U \0 K \0 A \0
0000200 N \0 \0 1 \0 4 \0 / \0 0 \0 5 \0 / \0
0000220 1 \0 3 \0 3 \0 9 \0 ) \0 \0 A \0 d \0
0000240 a \0 l \0 a \0 h \0 \0 d \0 i \0 m \0
0000260 a \0 k \0 l \0 u \0 m \0 k \0 a \0 n \0
0000300 \0 b \0 a \0 h \0 a \0 w \0 a \0 \0
0000320 s \0 i \0 a \0 s \0 a \0 t \0 a \0 n \0
0000340 \0 M \0 P \0 S \0 P \0 \0 m \0 e \0
0000360 n \0 d \0 a \0 p \0 a \0 t \0 i \0 \0
0000400 k \0 a \0 w \0 a \0 s \0 a \0 n \0 \0
0000420 t \0 e \0 r \0 s \0 e \0 b \0 u \0 t \0
0000440 \0 m \0 e \0 r \0 u \0 p \0 a \0 k \0
0000460 a \0 n \0 \0 t \0 a \0 n \0 a \0 h \0
0000500 \0 h \0 a \0 k \0 \0 m \0 i \0 l \0
0000520 i \0 k \0 \0 p \0 e \0 r \0 s \0 e \0
0000540 n \0 d \0 i \0 r \0 i \0 a \0 n \0 . \0
0000560 \0 W \0 a \0 l \0 a \0 u \0 \0 b \0
0000600 a \0 g \0 a \0 i \0 m \0 a \0 n \0 a \0
0000620 p \0 u \0 n \0 , \0 \0 t \0 i \0 n \0
0000640 d \0 a \0 k \0 a \0 n \0 \0 t \0 e \0
0000660 l \0 a \0 h \0 \0 d \0 i \0 a \0 m \0
0000700 b \0 i \0 l \0 \0 d \0 e \0 n \0 g \0
0000720 a \0 n \0 \0 m \0 e \0 n \0 g \0 e \0
0000740 l \0 u \0 a \0 r \0 k \0 a \0 n \0 \0
0000760 s \0 a \0 t \0 u \0 \0 ( \0 1 \0 ) \0
0001000 \0 N \0 o \0 t \0 i \0 s \0 \0 d \0
0001020 i \0 \0 b \0 a \0 w \0 a \0 h \0 \0
0001040 U \0 n \0 d \0 a \0 n \0 g \0 - \0 U \0
0001060 n \0 \0 C \0 \0 \0 \0 \0 \0
0001100 \0 \0 \0 \0 \0 \0 H \0 A \0
0001120 F \0 I \0 Z \0 A \0 \0 \0 \0 \0
0001140 \0 \0 \0 \0 \0 \0 \0 \0
*
0001420 \0 \0 \0 2 \0 0 \0 1 \0 4 \0 - \0
0001440 0 \0 6 \0 - \0 0 \0 3 \0 \0 1 \0 6 \0
0001460 : \0 2 \0 5 \0 : \0 0 \0 0 \0 . \0 0 \0
0001500 0 \0 0 \0 \0 H \0 A \0 F \0 I \0 Z \0
0001520 A \0 \0 \0 \0 \0 \0 \0 \0
0001540 \0 \0 \0 \0 \0 \0 \0 \0
*
0002020 2 \0 0 \0 1 \0 4 \0 - \0 0 \0 6 \0 - \0
0002040 0 \0 5 \0 \0 1 \0 4 \0 : \0 2 \0 0 \0
0002060 : \0 4 \0 4 \0 . \0 0 \0 0 \0 0 \0 \r \0
0002100 \n
0002101

 

Thanks!

Re: csv to hive, data not correctly imported to column

Master Guru
I don't see the tab character. There are lots of interleaved null characters in the file though, and the closest I can guess your delimiter to be is a double null byte sequence: \0\0. You may want to format the file right before use with Hive, if this is so.

Something like the below in Python can do the trick for cleanup, for example, assuming your delimiter is indeed a double sequence of null bytes:

data = data.replace('\0\0', '\t').replace('\0', '')
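
A slightly fuller, runnable version of that cleanup, as a sketch only: it assumes the whole file fits in memory, and '/path/file.csv' and '/path/file_clean.csv' are placeholder paths.

# Rewrite the file before loading it into Hive, assuming the field delimiter
# really is the double null byte sequence guessed above.
with open('/path/file.csv', 'rb') as f:
    data = f.read()

# Turn the suspected double-null separators into tabs, then drop the
# remaining single null bytes between the visible characters.
data = data.replace(b'\x00\x00', b'\t').replace(b'\x00', b'')

with open('/path/file_clean.csv', 'wb') as f:
    f.write(data)

If that works, the cleaned file should load into the existing table (FIELDS TERMINATED BY '\t') with LOAD DATA LOCAL INPATH as before.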

Re: csv to hive, data not correctly imported to column

Rising Star

Thanks Harsh J,

 

Will try that out!

Re: csv to hive, data not correctly imported to column

New Contributor

I have written a tool to auto-generate Hive scripts from CSV. It's written in Java and it's open source; you can check it out at

https://sourceforge.net/projects/csvtohive/?source=directory

Re: csv to hive, data not correctly imported to column

Rising Star
Thanks bodivijay! I have used another way to import into Hive; I will try your tool when I have time to revisit the same issue!