
NULL columns when importing CSV data into a table

Explorer

Hi everyone,

 

I'm trying to import a CSV file into a table. After I create the table and load the data, some columns (those with data types other than STRING) come back as NULL. Here is the CREATE TABLE statement I used:

 

 

CREATE TABLE deneme6 (framenumber int, frametime TIMESTAMP, ipsrc STRING, ipdst STRING, protocol STRING, flag int, windowsize int, info STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

 

And then I'm loading my data file:

 

load data inpath 'hdfs:///user/hive/warehouse/deneme4/deneme4.csv' into table deneme6;

But the columns framenumber, frametime, flag, and windowsize return NULL. These are the columns whose data type is not STRING. What can I do about this? Here is an example of the CSV file:

 

frame.number,frame.time_relative,ip.src,ip.dst,_ws.col.Protocol,tcp.flags,tcp.window_size_value,_ws.col.Info 

1,"0.000000000","147.32.84.165","91.212.135.158","TCP","0x00000002","64240","1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"

2,"0.000009000","147.32.84.165","91.212.135.158","TCP","0x00000002","64240","[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"

3,"0.062970000","91.212.135.158","147.32.84.165","TCP","0x00000012","65535","5678 → 1040 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 SACK_PERM=1"

 

 

And this is the result for the table:

 

NULL NULL "147.32.84.165" "91.212.135.158" "TCP" NULL NULL "1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
NULL NULL "147.32.84.165" "91.212.135.158" "TCP" NULL NULL "[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"

NULL NULL "91.212.135.158" "147.32.84.165" "TCP" NULL NULL "5678 → 1040 [SYN

 


Super Guru
Hi,

Can you please share the version of Hive or CDH you are using, so that I can try to reproduce the issue?

Thanks

Explorer

Cloudera QuickStart 5.13 on a VM.


Super Guru

Hi,

 

I tested it, and it works for me: at least the first column is returned correctly, compared with yours. My result is below:

 

+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
| deneme6.framenumber  | deneme6.frametime  |   deneme6.ipsrc   |   deneme6.ipdst   | deneme6.protocol  | deneme6.flag  | deneme6.windowsize  |                    deneme6.info                    |
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
| 1                    | NULL               | "147.32.84.165"   | "91.212.135.158"  | "TCP"             | NULL          | NULL                | "1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1" |
| 2                    | NULL               | "147.32.84.165"   | "91.212.135.158"  | "TCP"             | NULL          | NULL                | "[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1" |
| 3                    | NULL               | "91.212.135.158"  | "147.32.84.165"   | "TCP"             | NULL          | NULL                | "5678 → 1040 [SYN                                |
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+

The frametime, flag, and windowsize columns are NULL because you defined them as non-string types (TIMESTAMP and INT), but the values in the file are wrapped in double quotes. With ROW FORMAT DELIMITED, Hive treats the file as plain delimited text rather than CSV, so it does not strip the quotes and cannot convert a value such as "64240" to INT. I suggest removing all the quotes from the file and trying again, so that Hive can convert those values correctly.
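
A minimal sketch of that cleanup step in Python, in case it helps. It rests on several assumptions that are not in the thread: the function names clean_rows and to_delimited are hypothetical, the output is written tab-delimited (so the table would need FIELDS TERMINATED BY '\t' instead of ','), and the hexadecimal tcp.flags value is converted to decimal so that an INT column can hold it. The tab delimiter is chosen because the info field can itself contain a comma ("[SYN, ACK]"), which is why the third row's info is cut off in the output above. The sketch also assumes no field contains a tab.

```python
import csv


def clean_rows(lines, skip_header=True):
    """Yield rows with CSV quoting removed and hex flags converted to decimal."""
    # csv.reader understands quoted fields, so a comma inside quotes
    # does not start a new field (unlike a plain split on ',').
    reader = csv.reader(lines)
    for index, row in enumerate(reader):
        if skip_header and index == 0:
            continue  # drop the header line (frame.number,...)
        if not row:
            continue  # ignore blank lines
        row = [field.strip() for field in row]
        # Assumption: column 5 (tcp.flags) is hex text such as 0x00000002.
        if len(row) > 5 and row[5].lower().startswith("0x"):
            row[5] = str(int(row[5], 16))
        yield row


def to_delimited(lines, delimiter="\t"):
    """Return the cleaned rows as unquoted text, one row per line."""
    return "".join(delimiter.join(row) + "\n" for row in clean_rows(lines))


if __name__ == "__main__":
    sample = [
        "frame.number,frame.time_relative,ip.src,ip.dst,_ws.col.Protocol,"
        "tcp.flags,tcp.window_size_value,_ws.col.Info",
        '3,"0.062970000","91.212.135.158","147.32.84.165","TCP","0x00000012",'
        '"65535","5678 -> 1040 [SYN, ACK] Seq=0 Ack=1"',
    ]
    print(to_delimited(sample), end="")
```

Note that the frametime values here are relative seconds (for example 0.062970000), which do not match the yyyy-MM-dd HH:mm:ss text format Hive expects for TIMESTAMP, so that column would likely stay NULL even without quotes unless it is declared as a numeric type such as DOUBLE.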

Explorer
Thanks a lot.