Created 10-12-2017 11:46 AM
Hello,
I am trying to load a CSV file into a Hive table.
I used the statement below to create the table:
CREATE TABLE cp (
  ENRL_KEY STRING,
  FMLY_KEY STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\'
)
STORED AS TEXTFILE
LOCATION '/data/abc'
TBLPROPERTIES ('skip.header.line.count' = '1');
The output is below: every record is followed by a NULL record, so I get exactly double the count.
What am I missing here? I tried '\t' as the separator, and even a line terminator of '\n', etc., with no luck.
enrl_key  | fmly_key
          | NULL
2051012   | 2374248
          | NULL
29051020  | 2374248
          | NULL
29051035  | 2374248
Created 10-21-2017 02:35 PM
CDH 5.10.1 and Hive 0.14.
I suspect encoding is the culprit.
The CSV file I am using is UTF-16 Little Endian, and when I convert it to UTF-8 it works fine. But that is not the solution I want, since these files come from a client and cannot be changed. The only option is to do something in the "CREATE EXTERNAL TABLE" statement, and I have tried all sorts of things:
- LazySimpleSerDe
- OpenCSVSerde
- serialization.encoding = 'WINDOWS-1252', 'UTF-16', 'UTF-16LE', etc.
Nothing works.
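(For reference, one quick way to confirm a file's encoding is to check its byte-order mark; a minimal Python sketch, with the filename as a placeholder:)

# Check the byte-order mark to confirm the encoding (sketch; placeholder filename)
with open('utf-16_file.csv', 'rb') as f:
    bom = f.read(2)
if bom == b'\xff\xfe':
    print('UTF-16 LE')
elif bom == b'\xfe\xff':
    print('UTF-16 BE')
else:
    print('no UTF-16 BOM; likely UTF-8 or a single-byte encoding')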
Has anyone come across this kind of situation? Please let me know.
Thanks
A
Created on 04-03-2019 07:31 PM - edited 04-03-2019 08:37 PM
I had faced this issue. Hive does not support UTF-16 directly; you have to handle it with a SerDe or convert the file to UTF-8.
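As to why the file breaks: UTF-16 encodes every ASCII character with an extra NUL byte, so a byte-oriented line reader no longer sees plain '\n' terminators. A minimal sketch of the effect, assuming (as an illustration) that the client file uses Windows '\r\n' line endings:

# One record with a Windows line ending (the '\r\n' is an assumption)
text = "2051012,2374248\r\n"
print(text.encode("utf-16-le"))
# Every character is followed by a NUL byte, and the terminator becomes
# b'\r\x00\n\x00'. A reader that splits on \r and \n then sees a stray
# one-byte b'\x00' "line" between records, which can show up as the
# alternating NULL rows (and doubled count) reported above.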
You can use Python code like the following to convert to UTF-8:
# Re-encode the UTF-16 CSV as UTF-8 (filenames as in the original snippet)
with open('utf-16_file.csv', encoding='utf-16') as src, \
        open('utf-8_file.csv', 'w', encoding='utf-8') as dst:
    for line in src:
        dst.write(line)
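After converting, a quick check that the file now parses cleanly (a sketch using the standard csv module and the placeholder filename above):

import csv

with open('utf-8_file.csv', encoding='utf-8', newline='') as f:
    for row in csv.reader(f):
        print(row)  # header row first (e.g. ['ENRL_KEY', 'FMLY_KEY']), then data rows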