
Loading csv into Hive adds extra Null


New Contributor

Hello,

 

I am trying to load a CSV file into a table in Hive. I used the statement below to create the table.

 

CREATE TABLE cp
(
ENRL_KEY String
,FMLY_KEY String
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar'=',',
'quoteChar'='"',
'escapeChar'='\\'
)
STORED AS TEXTFILE
LOCATION '/data/abc'
TBLPROPERTIES('skip.header.line.count'='1');

 

The output shows an extra NULL row for each record, so I get exactly double the count.

 

What am I missing here? I tried \t as the separator, and even setting the line terminator to \n, etc. No luck.

 

entrl_key    fmly_key

NULL
2051012      2374248

NULL
29051020     2374248

NULL
29051035     2374248


Re: Loading csv into Hive adds extra Null

Guru
What version of CDH or Hive are you using?

Re: Loading csv into Hive adds extra Null

New Contributor

CDH 5.10.1 and Hive 0.14.

 

I suspect the issue is encoding.

The CSV file I am using is UTF-16 Little Endian, and when I convert it to UTF-8 it works fine. But that is not the solution I want, as these files come from a client and cannot be changed. The only option is to handle it in the CREATE EXTERNAL TABLE statement. I have tried all sorts of things:

 

LazySimpleSerDe

OpenCSVSerde

serialization.encoding = 'WINDOWS-1252', 'UTF-16', 'UTF-16LE', etc. Nothing works.
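For what it's worth, the doubled/NULL rows are consistent with the input format reading UTF-16 bytes as if they were single-byte text. A minimal sketch (the file name is hypothetical) of what the raw bytes look like to a byte-oriented reader:

```python
# Write a row the way a UTF-16LE export would, then inspect the raw bytes
# that a byte-oriented reader (like Hive's default text input format) sees.
row = "2051012,2374248\n"
with open("sample.csv", "w", encoding="utf-16-le") as f:
    f.write(row)

with open("sample.csv", "rb") as f:
    raw = f.read()

# Every ASCII character is followed by a NUL byte, so the bytes on disk are
# not the plain "2051012,2374248" a single-byte SerDe expects.
print(b"\x00" in raw)
print(raw == row.encode("utf-8"))
```

Running this shows interleaved NUL bytes in the UTF-16LE file, which is why a SerDe expecting single-byte text cannot parse the rows cleanly.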

 

Has anyone come across this kind of situation? Please let me know.

 

Thanks

A

Re: Loading csv into Hive adds extra Null

New Contributor

I have faced this issue. Hive does not support UTF-16 directly; you have to use a SerDe that handles it, or convert the file to UTF-8.

 

Use Python code like the following to convert the file to UTF-8:

 

# Read the UTF-16 file and rewrite it as UTF-8.
with open("utf-16_file.csv", encoding="utf-16") as src, \
     open("utf-8_file.csv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line)
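If you would rather not use Python, the same conversion can be done with iconv (file names here are just examples; the first line only creates a small UTF-16LE sample to stand in for the client file):

```shell
# Create a UTF-16LE sample, then convert it to UTF-8 for Hive.
printf '2051012,2374248\n' | iconv -f UTF-8 -t UTF-16LE > utf-16_file.csv
iconv -f UTF-16LE -t UTF-8 utf-16_file.csv > utf-8_file.csv
cat utf-8_file.csv
```

This can be wrapped in a pre-ingest step (e.g. before the file lands in /data/abc) so the client files themselves are never modified.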