NULL columns importing csv data into table

New Contributor

Hi everyone,

 

I'm trying to import a CSV file into a table. But after I create the table and load the data into it, some columns (those with data types other than STRING) come back as NULL. Here is the CREATE TABLE statement I used:

 

 

CREATE TABLE deneme6 (framenumber int, frametime TIMESTAMP, ipsrc STRING, ipdst STRING, protocol STRING, flag int, windowsize int, info STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

 

And then I'm loading my data file:

 

load data inpath 'hdfs:///user/hive/warehouse/deneme4/deneme4.csv' into table deneme6;

But the columns framenumber, frametime, flag and windowsize return NULL. These are the columns whose data type is not STRING. What can I do about this? Here is an example of the CSV file:

 

frame.number,frame.time_relative,ip.src,ip.dst,_ws.col.Protocol,tcp.flags,tcp.window_size_value,_ws.col.Info 

1,"0.000000000","147.32.84.165","91.212.135.158","TCP","0x00000002","64240","1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"

2,"0.000009000","147.32.84.165","91.212.135.158","TCP","0x00000002","64240","[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"

3,"0.062970000","91.212.135.158","147.32.84.165","TCP","0x00000012","65535","5678 → 1040 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 SACK_PERM=1"

 

 

And this is the result for the table:

 

NULL NULL "147.32.84.165" "91.212.135.158" "TCP" NULL NULL "1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
NULL NULL "147.32.84.165" "91.212.135.158" "TCP" NULL NULL "[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"

NULL NULL "91.212.135.158" "147.32.84.165" "TCP" NULL NULL "5678 → 1040 [SYN

 

Accepted Solutions

Re: NULL columns importing csv data into table

Guru

Hi,

 

I tested this, and it partly works for me: unlike in your result, the first column is returned correctly. My result is below:

 

+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
| deneme6.framenumber  | deneme6.frametime  |   deneme6.ipsrc   |   deneme6.ipdst   | deneme6.protocol  | deneme6.flag  | deneme6.windowsize  |                    deneme6.info                    |
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
| 1                    | NULL               | "147.32.84.165"   | "91.212.135.158"  | "TCP"             | NULL          | NULL                | "1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1" |
| 2                    | NULL               | "147.32.84.165"   | "91.212.135.158"  | "TCP"             | NULL          | NULL                | "[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1" |
| 3                    | NULL               | "91.212.135.158"  | "147.32.84.165"   | "TCP"             | NULL          | NULL                | "5678 → 1040 [SYN                                |
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+

The frametime, flag and windowsize columns are NULL because you defined them with non-string types (TIMESTAMP and INT), but the values in the file are wrapped in double quotes. Hive does not interpret the quotes: with ROW FORMAT DELIMITED it treats the file as plain delimited text, not as CSV, so a value such as "64240" (quotes included) cannot be converted to INT. My suggestion is to remove all the quotes from the file and try again, so that Hive can convert those values correctly.
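
As a possible alternative to editing the file, here is a minimal sketch that reads the quoted file with Hive's OpenCSVSerde and casts the columns in a second step. The table names deneme6_raw and deneme6_typed are hypothetical, the header-skipping property assumes the file contains the header line shown in the question, and storing frame.time_relative as DOUBLE is an assumption (the sample values look like relative seconds rather than a TIMESTAMP). Note that the flag values are hexadecimal strings such as 0x00000002, so they would likely stay NULL as INT even without quotes; the sketch converts them with conv. The SerDe should also keep commas inside quoted fields, which is probably why the info column was cut off at "[SYN" in the results above.

```sql
-- Sketch only: deneme6_raw and deneme6_typed are hypothetical table names.
-- OpenCSVSerde exposes every column as STRING. Its default separator is a
-- comma and its default quote character is a double quote, so no
-- SERDEPROPERTIES are set here.
CREATE TABLE deneme6_raw (
  framenumber STRING,
  frametime   STRING,
  ipsrc       STRING,
  ipdst       STRING,
  protocol    STRING,
  flag        STRING,
  windowsize  STRING,
  info        STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count" = "1");  -- assumes a header row

LOAD DATA INPATH 'hdfs:///user/hive/warehouse/deneme4/deneme4.csv'
INTO TABLE deneme6_raw;

-- Cast the unquoted strings to the intended types.
CREATE TABLE deneme6_typed AS
SELECT
  CAST(framenumber AS INT)                   AS framenumber,
  CAST(frametime AS DOUBLE)                  AS frametime,   -- assumption: relative seconds
  ipsrc,
  ipdst,
  protocol,
  CAST(conv(substr(flag, 3), 16, 10) AS INT) AS flag,        -- drop the "0x" prefix, hex to decimal
  CAST(windowsize AS INT)                    AS windowsize,
  info
FROM deneme6_raw;
```

Keeping the raw table as all STRING and casting afterwards means a value that fails to convert shows up as NULL only in the typed table, so it can still be inspected in deneme6_raw.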

Re: NULL columns importing csv data into table

New Contributor
Thanks a lot.
Replies

Re: NULL columns importing csv data into table

Guru
Hi,

Can you please share the version of Hive or CDH you are using, so that I can try to reproduce the issue?

Thanks

Re: NULL columns importing csv data into table

New Contributor

Cloudera QuickStart 5.13 on a VM.
