I am trying to load a CSV file into a Hive table.
I created the table with the statement below:
CREATE TABLE cp
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ","
)
STORED AS TEXTFILE;
The output: every record comes back as NULL, so I get exactly double the row count.
What am I missing here? I tried '\t' as the separator and even '\n' as the line terminator, with no luck.
CDH 5.10.1 and Hive 0.14.
I suspect the encoding. The CSV file I am using is UTF-16 Little Endian, and when I convert it to UTF-8 everything works fine. But that is not the solution I want: the files come from a client and cannot be changed, so my only option is to handle this in the CREATE EXTERNAL TABLE statement. I have tried all sorts of things, e.g. serialization.encoding = 'WINDOWS-1252', 'UTF-16', 'UTF-16LE', etc., and nothing works.
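For reference, one quick way to confirm what a file's encoding actually is, before creating the table, is to inspect its byte-order mark. A minimal sketch (the path and function name are my own, not from the original post):

```python
def sniff_bom(path):
    # Read the first few raw bytes and match them against known BOMs.
    # UTF-32-LE must be checked before UTF-16-LE because its BOM
    # (FF FE 00 00) starts with the UTF-16-LE BOM (FF FE).
    with open(path, 'rb') as f:
        head = f.read(4)
    if head.startswith(b'\xff\xfe\x00\x00'):
        return 'utf-32-le'
    if head.startswith(b'\xef\xbb\xbf'):
        return 'utf-8-sig'
    if head.startswith(b'\xff\xfe'):
        return 'utf-16-le'
    if head.startswith(b'\xfe\xff'):
        return 'utf-16-be'
    return 'unknown (no BOM)'
```

A file reported as 'utf-16-le' here is exactly the case described above.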
Has anyone come across this kind of situation? Please let me know.
I have faced this issue too. Hive does not support UTF-16 directly; you either need a SerDe that handles it, or you can convert the file to UTF-8.
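A quick way to see why the rows come back NULL: the same CSV text encoded as UTF-16LE is interspersed with NUL bytes, which Hive's plain-text input path cannot parse. A minimal sketch (the sample text is my own):

```python
# Encode a tiny CSV both ways and compare the raw bytes.
text = "a,b\n1,2\n"
print(text.encode('utf-8'))     # b'a,b\n1,2\n'
print(text.encode('utf-16-le')) # b'a\x00,\x00b\x00\n\x001\x00,\x002\x00\n\x00'
```

Every ASCII character becomes two bytes, one of them NUL, so the text SerDe never sees clean delimiters or values.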
You can use Python to convert the file to UTF-8, as below:

# Read the UTF-16 file and write it back out as UTF-8, line by line.
with open('utf-16_file.csv', encoding='utf-16') as f, \
        open('utf-8_file.csv', 'w', encoding='utf-8') as out:
    for line in f:
        out.write(line)