Created on 06-23-2018 04:59 AM - edited 09-16-2022 06:22 AM
Hi everyone,
I'm trying to import a CSV file into a table. But after I create the table and load the data into it, some columns (those with data types other than STRING) come back as NULL. Here is the CREATE TABLE statement I used:
CREATE TABLE deneme6 (framenumber int,frametime TIMESTAMP, ipsrc STRING, ipdst STRING, protocol STRING, flag int, windowsize int, info STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE ;
And then I'm loading my data file:
load data inpath 'hdfs:///user/hive/warehouse/deneme4/deneme4.csv' into table deneme6;
But for the columns framenumber, frametime, flag and windowsize, the data comes back as NULL. These are the columns whose data type is not STRING. What can I do about this? Here is a sample of the CSV file:
frame.number,frame.time_relative,ip.src,ip.dst,_ws.col.Protocol,tcp.flags,tcp.window_size_value,_ws.col.Info
1,"0.000000000","147.32.84.165","91.212.135.158","TCP","0x00000002","64240","1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
2,"0.000009000","147.32.84.165","91.212.135.158","TCP","0x00000002","64240","[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
3,"0.062970000","91.212.135.158","147.32.84.165","TCP","0x00000012","65535","5678 → 1040 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 SACK_PERM=1"
And this is the result for the table:
NULL NULL "147.32.84.165" "91.212.135.158" "TCP" NULL NULL "1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
NULL NULL "147.32.84.165" "91.212.135.158" "TCP" NULL NULL "[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
NULL NULL "91.212.135.158" "147.32.84.165" "TCP" NULL NULL "5678 → 1040 [SYN
Created 07-03-2018 03:45 PM
Hi,
I tested this and it partly works for me: at least the first column is returned correctly, unlike in your result. My result is below:
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
| deneme6.framenumber | deneme6.frametime | deneme6.ipsrc | deneme6.ipdst | deneme6.protocol | deneme6.flag | deneme6.windowsize | deneme6.info |
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
| 1 | NULL | "147.32.84.165" | "91.212.135.158" | "TCP" | NULL | NULL | "1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1" |
| 2 | NULL | "147.32.84.165" | "91.212.135.158" | "TCP" | NULL | NULL | "[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1" |
| 3 | NULL | "91.212.135.158" | "147.32.84.165" | "TCP" | NULL | NULL | "5678 → 1040 [SYN |
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
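The frametime, flag and windowsize columns are NULL because you declared them with non-STRING types (TIMESTAMP and INT), but the values in the file are wrapped in double quotes. Hive does not interpret the quotes: with ROW FORMAT DELIMITED it sees a plain delimited text file, not a CSV file, so a value such as "64240" (quotes included) cannot be converted to INT. For the same reason, the comma inside [SYN, ACK] is treated as a field separator, which is why the info value in the third row is cut off. My suggestion is to remove all the quotes from the file and try again, so that Hive can convert those numbers correctly. Note that even without quotes, 0x00000002 is hexadecimal text and 0.000000000 is not in Hive's timestamp format, so flag and frametime will most likely still need a different column type or a conversion.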
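As an alternative to stripping the quotes from the file by hand, here is a minimal sketch of letting Hive parse the quoted CSV itself with OpenCSVSerde and then casting into typed columns. The table names deneme6_raw and deneme6_typed are hypothetical, storing frametime as DOUBLE is my assumption (frame.time_relative looks like seconds since the first frame, not a date), and I have not run this on the Quickstart VM, so treat it as a starting point rather than a verified fix.

```sql
-- Staging table (hypothetical name). OpenCSVSerde strips the double quotes
-- and keeps commas inside quoted fields such as "[SYN, ACK]" in one column.
-- This SerDe exposes every column as STRING, so all columns are declared STRING.
CREATE TABLE deneme6_raw (
  framenumber STRING,
  frametime   STRING,
  ipsrc       STRING,
  ipdst       STRING,
  protocol    STRING,
  flag        STRING,
  windowsize  STRING,
  info        STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
-- Skip the frame.number,frame.time_relative,... header line.
TBLPROPERTIES ("skip.header.line.count" = "1");

LOAD DATA INPATH 'hdfs:///user/hive/warehouse/deneme4/deneme4.csv' INTO TABLE deneme6_raw;

-- Typed copy (hypothetical name). Casts are done explicitly in the SELECT.
CREATE TABLE deneme6_typed AS
SELECT
  CAST(framenumber AS INT)                   AS framenumber,
  CAST(frametime AS DOUBLE)                  AS frametime,   -- assumption: relative seconds
  ipsrc,
  ipdst,
  protocol,
  CAST(conv(substr(flag, 3), 16, 10) AS INT) AS flag,        -- drop the "0x" prefix, hex to decimal
  CAST(windowsize AS INT)                    AS windowsize,
  info
FROM deneme6_raw;
```

The staging step keeps the raw strings available, so if a cast still yields NULL you can query deneme6_raw to see exactly what Hive read for that field.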
Created 07-04-2018 01:19 AM
Created 06-30-2018 12:51 PM
Created 07-03-2018 06:17 AM
Cloudera Quickstart 5.13 on VM.
Created on 07-03-2018 06:34 AM - edited 07-03-2018 06:43 AM
Cloudera Quickstart 5.13 on VM.