uur
New Contributor
Posts: 4
Registered: ‎06-23-2018
Accepted Solution

NULL columns importing csv data into table

Hi everyone,

 

I'm trying to import a CSV file into a table. After creating the table and loading the data, every column whose data type is not STRING comes back NULL. Here is the CREATE TABLE statement I used:

 

 

CREATE TABLE deneme6 (framenumber INT, frametime TIMESTAMP, ipsrc STRING, ipdst STRING, protocol STRING, flag INT, windowsize INT, info STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

 

And then I'm loading my data file:

 

load data inpath 'hdfs:///user/hive/warehouse/deneme4/deneme4.csv' into table deneme6;

But the columns framenumber, frametime, flag and windowsize all return NULL. These are the columns whose data type is not STRING. How can I fix this? Here is a sample of the CSV file:

 

frame.number,frame.time_relative,ip.src,ip.dst,_ws.col.Protocol,tcp.flags,tcp.window_size_value,_ws.col.Info 

1,"0.000000000","147.32.84.165","91.212.135.158","TCP","0x00000002","64240","1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"

2,"0.000009000","147.32.84.165","91.212.135.158","TCP","0x00000002","64240","[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"

3,"0.062970000","91.212.135.158","147.32.84.165","TCP","0x00000012","65535","5678 → 1040 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 SACK_PERM=1"

 

 

And this is the result for the table:

 

NULL NULL "147.32.84.165" "91.212.135.158" "TCP" NULL NULL "1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
NULL NULL "147.32.84.165" "91.212.135.158" "TCP" NULL NULL "[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"

NULL NULL "91.212.135.158" "147.32.84.165" "TCP" NULL NULL "5678 → 1040 [SYN

 

Cloudera Employee
Posts: 426
Registered: ‎03-23-2015

Re: NULL columns importing csv data into table

Hi,

Could you please share the version of Hive or CDH you are using, so that I can try to reproduce the issue?

Thanks
uur
New Contributor
Posts: 4
Registered: ‎06-23-2018

Re: NULL columns importing csv data into table

Cloudera QuickStart 5.13 on a VM.

Cloudera Employee
Posts: 426
Registered: ‎03-23-2015

Re: NULL columns importing csv data into table

Hi,

 

I tested this and, unlike in your output, the first column (framenumber) is returned correctly for me. My result is below:

 

+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
| deneme6.framenumber  | deneme6.frametime  |   deneme6.ipsrc   |   deneme6.ipdst   | deneme6.protocol  | deneme6.flag  | deneme6.windowsize  |                    deneme6.info                    |
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
| 1                    | NULL               | "147.32.84.165"   | "91.212.135.158"  | "TCP"             | NULL          | NULL                | "1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1" |
| 2                    | NULL               | "147.32.84.165"   | "91.212.135.158"  | "TCP"             | NULL          | NULL                | "[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1" |
| 3                    | NULL               | "91.212.135.158"  | "147.32.84.165"   | "TCP"             | NULL          | NULL                | "5678 → 1040 [SYN                                |
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+

The frametime, flag and windowsize columns are NULL because you declared them with non-STRING types (TIMESTAMP and INT), but the values in the file are wrapped in double quotes. Hive does not interpret the quotes: with ROW FORMAT DELIMITED it reads the file as plain delimited text, not as CSV, so a value such as "64240" still contains the quote characters and cannot be converted to INT. My suggestion is to remove all quotes from the file and try again, so that Hive can convert those numbers correctly.
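
If editing the file is not convenient, one alternative is to let Hive parse the quotes itself with the OpenCSVSerde that ships with Hive. The following is only a sketch under some assumptions: the table and view names (deneme6_csv, deneme6_typed) are made up for illustration, the file is assumed to start with the header line shown in the sample, and the load path is copied from the original post and may need adjusting. OpenCSVSerde exposes every column as STRING, so the numeric columns are cast in a view afterwards. Parsing the file as real CSV should also keep the comma inside "[SYN, ACK]" from cutting the info column short, as happened in the third row above. Note that frame.time_relative looks like an offset in seconds rather than a timestamp, and tcp.flags is a hex string, so the sketch casts the first to DOUBLE and leaves the second as STRING instead of INT.

```sql
-- Sketch only: deneme6_csv and deneme6_typed are hypothetical names.
-- OpenCSVSerde strips the quote characters and returns every column as STRING.
CREATE TABLE deneme6_csv (
  framenumber STRING,
  frametime   STRING,
  ipsrc       STRING,
  ipdst       STRING,
  protocol    STRING,
  flag        STRING,
  windowsize  STRING,
  info        STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
-- Assumption: the first line of the file is the header row from the sample.
TBLPROPERTIES ("skip.header.line.count" = "1");

-- Path copied from the original post; point it at wherever the file currently lives.
LOAD DATA INPATH 'hdfs:///user/hive/warehouse/deneme4/deneme4.csv' INTO TABLE deneme6_csv;

-- Cast the unquoted strings to the intended types.
CREATE VIEW deneme6_typed AS
SELECT
  CAST(framenumber AS INT)  AS framenumber,
  CAST(frametime AS DOUBLE) AS frametime,   -- relative time in seconds (assumption)
  ipsrc,
  ipdst,
  protocol,
  flag,                                     -- hex string such as 0x00000002, left as STRING
  CAST(windowsize AS INT)   AS windowsize,
  info
FROM deneme6_csv;
```

Querying deneme6_typed instead of the raw table then gives typed columns without changing the source file.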

uur
New Contributor
Posts: 4
Registered: ‎06-23-2018

Re: NULL columns importing csv data into table

Thanks a lot.