Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Why hive is not able to store special characters like 'á' ?

Why hive is not able to store special characters like 'á' ?

I have created a table as following but after inserting data in the table it shows junk characters upon retrial when it has special characters:

Example:

create database TestDB_C;

use TestDB_C;

CREATE TABLE IF NOT EXISTS customerA ( custId int, custName String, city String)
COMMENT 'Customer details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

insert into customerA values (1000, "CustáOne", "BLR");

select * from customerA;

Notice the data "CustáOne" contains special character. I also tried using the following approach with no success:

 ALTER TABLE customera SET serdeproperties ('serialization.encoding'='UTF-8');
6 REPLIES 6

Re: Why hive is not able to store special characters like 'á' ?

@jk

Try specifying the full Serde definition (although what you tried should work):

ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe’ WITH SERDEPROPERTIES(“serialization.encoding”=’UTF-8′);

Re: Why hive is not able to store special characters like 'á' ?

Guru

@Michael Young

I am not sure if specifying "ROW FORMAT SERDE" helps as once we create the table, the formatted description for this table shows the following:

SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

It is already using the LazySimpleSerDe format.

@jk, Can you query the same data via a different tool like Excel or something else and check if you still see the same junk character ?

Re: Why hive is not able to store special characters like 'á' ?

@Michael Young the "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe" approach did not work.

@srai I tried beeline, hive shell, hive view as well but no change in the result.

What i want to know is a simple step to insert a simple data with special character like "CustáOne" ... Can this data be inserted to a very simple hive table? What all step do i need to follow to do that.

Re: Why hive is not able to store special characters like 'á' ?

Can we store the simple data with special character "CustáOne" in hive database? And retrieve the data in the same format (without junk characters)?

Re: Why hive is not able to store special characters like 'á' ?

Yes, you can store UTF-8 characters in Hive tables and retrieve them. I have tested this with both Left-to-Right and Right-to-Left languages without any problems before. You typically don't have to do anything special to get it to work.

Try this:

1. Create a delimited text file with a couple of rows of data (including UTF-8 characters).  You can use \t as the delimiter.
2. Make sure you save the file as an UTF-8 text file and push it to HDFS.
3. Create an external table in Hive that points to the directory where you placed that file.
4. Run the same query as before to see if the data is displayed correctly.

I'm wondering if there is something happening in the environment when you do the insert. Did you do the insert from the command line or did you use the Hive view to do it. From the command line, try setting your environment using before running hive or beeline.

export LANG=en_US.UTF-8

Re: Why hive is not able to store special characters like 'á' ?

For command line input, as Michael said, check your LANG settings, it should end in "UTF-8" but it doesn't have to be "en_US", for example for Japanese it's "ja_JP.UTF-8". As for the Hive view, CVJK languages don't work and I'm afraid Europian umlauts don't work either. The fix is coming in Ambari-2.4.

Don't have an account?
Coming from Hortonworks? Activate your account here