
Loading csv into Hive adds extra Null

New Contributor

Hello,

 

I am trying to load a CSV file into a table in Hive.

I created the table as below:

 

CREATE TABLE cp
(
ENRL_KEY String
,FMLY_KEY String
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar'=',',
'quoteChar'='"',
'escapeChar'='\\'
)
STORED AS TEXTFILE
LOCATION '/data/abc'
TBLPROPERTIES('skip.header.line.count'='1');

 

The output is that each record is accompanied by a NULL record, so I get exactly double the row count.

 

What am I missing here? I also tried '\t' as the separator, a line terminator of '\n', etc... no luck.

 

entrl_key    fmly_key
NULL
2051012      2374248
NULL
29051020     2374248
NULL
29051035     2374248

3 REPLIES

Guru
What version of CDH or Hive are you using?

New Contributor

CDH 5.10.1 and Hive 0.14.

 

I suspect the encoding.

The CSV file I am using is UTF-16 Little Endian, and when I convert it to UTF-8 it works fine. But that is not the solution I want: these files come from a client and cannot be changed. My only option is to handle it during "CREATE EXTERNAL TABLE". I have tried all sorts of things:

 

LazySimpleSerDe

OpenCSVSerde

'serialization.encoding' = 'WINDOWS-1252', 'UTF-16', 'UTF-16LE', etc... nothing works.
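The doubled count is consistent with the UTF-16LE line endings: '\r' and '\n' are each followed by a 00 byte, so a byte-oriented line reader (such as Hadoop's, which ends a record at \r, \n, or \r\n, like Python's bytes.splitlines below) sees two boundaries per line break. A minimal sketch of the mechanism, using made-up sample values:

```python
# Two CSV rows with Windows line endings, as a client file might look
text = "2051012,2374248\r\n29051020,2374248\r\n"

# UTF-16LE encodes '\r\n' as the four bytes 0D 00 0A 00
data = text.encode("utf-16-le")

# A byte-oriented reader splits at the \r byte AND the \n byte, because
# they are separated by a 00 byte -- so spurious records consisting of
# null bytes appear between the real rows. These come out as NULL rows.
records = data.splitlines()

print(len(records))  # 5 records for only 2 CSV rows
print(records[1])    # b'\x00' -- a junk record that Hive renders as NULL
```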

 

Has anyone come across this kind of situation? Please let me know.

 

Thanks

A

New Contributor

I have faced this issue. Hive does not support UTF-16 directly; you have to use a SerDe that handles the encoding, or convert the file to UTF-8.
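Before converting, it can help to confirm that a client file really is UTF-16 by checking its byte-order mark. A minimal sketch (the function name and the temp-file demo are illustrative, and only the common BOMs are checked):

```python
import codecs
import tempfile

def sniff_encoding(path):
    """Guess a file's encoding from its byte-order mark.
    A minimal sketch: checks only UTF-16 LE/BE and UTF-8 BOMs."""
    with open(path, "rb") as f:
        head = f.read(3)
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return "unknown"

# Demo: Python's "utf-16" codec writes a BOM, like typical client exports
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as f:
    f.write("2051012,2374248\r\n".encode("utf-16"))
    sample = f.name

print(sniff_encoding(sample))  # utf-16-le on little-endian platforms
```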

 

Use the Python code below to convert to UTF-8:

 

file1 = open("utf-8_file.csv", "w", encoding="utf-8")
with open("utf-16_file.csv", encoding="utf-16") as f:
    for line in f:
        file1.write(line)
file1.close()
