NULL columns importing csv data into table
Labels: Apache Hive
Created on 06-23-2018 04:59 AM - edited 09-16-2022 06:22 AM
Hi everyone,
I'm trying to import a CSV file into a table. After I create the table and load the data, every column whose data type is not STRING comes back as NULL. Here is the CREATE TABLE statement I used:
CREATE TABLE deneme6 (framenumber int,frametime TIMESTAMP, ipsrc STRING, ipdst STRING, protocol STRING, flag int, windowsize int, info STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE ;
And then I'm loading my data file:
load data inpath 'hdfs:///user/hive/warehouse/deneme4/deneme4.csv' into table deneme6;
But the columns framenumber, frametime, flag and windowsize all return NULL. These are exactly the columns whose data type is not STRING. How can I fix this? Here is a sample of the CSV file:
frame.number,frame.time_relative,ip.src,ip.dst,_ws.col.Protocol,tcp.flags,tcp.window_size_value,_ws.col.Info
1,"0.000000000","147.32.84.165","91.212.135.158","TCP","0x00000002","64240","1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
2,"0.000009000","147.32.84.165","91.212.135.158","TCP","0x00000002","64240","[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
3,"0.062970000","91.212.135.158","147.32.84.165","TCP","0x00000012","65535","5678 → 1040 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 SACK_PERM=1"
And this is the result for the table:
NULL NULL "147.32.84.165" "91.212.135.158" "TCP" NULL NULL "1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
NULL NULL "147.32.84.165" "91.212.135.158" "TCP" NULL NULL "[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1"
NULL NULL "91.212.135.158" "147.32.84.165" "TCP" NULL NULL "5678 → 1040 [SYN
Created 07-03-2018 03:45 PM
Hi,
I tested this and it partly works for me: at least the first column is returned correctly, unlike in your output. My result is below:
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
| deneme6.framenumber | deneme6.frametime | deneme6.ipsrc | deneme6.ipdst | deneme6.protocol | deneme6.flag | deneme6.windowsize | deneme6.info |
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
| 1 | NULL | "147.32.84.165" | "91.212.135.158" | "TCP" | NULL | NULL | "1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1" |
| 2 | NULL | "147.32.84.165" | "91.212.135.158" | "TCP" | NULL | NULL | "[TCP Out-Of-Order] 1040 → 5678 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1" |
| 3 | NULL | "91.212.135.158" | "147.32.84.165" | "TCP" | NULL | NULL | "5678 → 1040 [SYN |
+----------------------+--------------------+-------------------+-------------------+-------------------+---------------+---------------------+----------------------------------------------------+--+
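The frametime, flag and windowsize columns are NULL because you defined them with non-STRING types (TIMESTAMP and INT), but the values in the file are wrapped in double quotes. With ROW FORMAT DELIMITED, Hive does not interpret quotes: it treats the file as plain delimited text, not as CSV, so a value like "64240" (quotes included) cannot be converted to INT. For the same reason the comma inside [SYN, ACK] splits the last field of row 3. My suggestion is to remove the quotes from the file and try again, so that Hive can convert the numbers correctly. Note that even without quotes, 0x00000002 is not a decimal integer and 0.000000000 is not in Hive's timestamp text format (yyyy-MM-dd HH:mm:ss), so flag and frametime will most likely still need to be loaded as another type and converted.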
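If editing the file is not convenient, one alternative is to let Hive parse the quotes itself with OpenCSVSerde, which ships with Hive 0.14 and later. The following is only a sketch under some assumptions: the table names deneme6_raw and deneme6_typed are made up for illustration, frametime is kept as a DOUBLE because frame.time_relative looks like seconds rather than a timestamp, and flag is converted from hexadecimal with conv(). OpenCSVSerde exposes every column as STRING, so the casts happen in a second step.

```sql
-- Staging table (hypothetical name): OpenCSVSerde honours the double quotes,
-- so commas inside quoted fields such as "[SYN, ACK]" no longer split the row.
-- Every column is read as STRING by this SerDe.
CREATE TABLE deneme6_raw (
  framenumber STRING,
  frametime   STRING,
  ipsrc       STRING,
  ipdst       STRING,
  protocol    STRING,
  flag        STRING,
  windowsize  STRING,
  info        STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
-- Skip the header line (frame.number,frame.time_relative,...).
TBLPROPERTIES ("skip.header.line.count" = "1");

LOAD DATA INPATH 'hdfs:///user/hive/warehouse/deneme4/deneme4.csv'
INTO TABLE deneme6_raw;

-- Typed copy (hypothetical name): cast the unquoted strings explicitly.
CREATE TABLE deneme6_typed AS
SELECT
  CAST(framenumber AS INT)                   AS framenumber,
  CAST(frametime AS DOUBLE)                  AS frametime_relative, -- assumption: relative seconds
  ipsrc,
  ipdst,
  protocol,
  CAST(conv(substr(flag, 3), 16, 10) AS INT) AS flag,               -- drop the "0x" prefix, hex to decimal
  CAST(windowsize AS INT)                    AS windowsize,
  info
FROM deneme6_raw;
```

Keeping the raw table as STRING and casting in a separate step also makes it easy to find rows that fail conversion, for example by selecting rows where the cast result is NULL but the raw value is not.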
Created 07-04-2018 01:19 AM
Created 06-30-2018 12:51 PM
Can you please share the version of Hive or CDH you are using, so that I can try to reproduce the issue?
Thanks
Created 07-03-2018 06:17 AM
Cloudera QuickStart 5.13 on a VM.
Created on 07-03-2018 06:34 AM - edited 07-03-2018 06:43 AM
Cloudera QuickStart 5.13 on a VM.
