Hive View does not display UTF-8 characters correctly

Explorer

Hi All,

I have some UTF-8 string data in Hive tables. Querying it with HiveQL from the terminal, or with Spark SQL from spark-shell, displays the text correctly, with no encoding errors:

>> select address from customerinfo limit 2; 
address 
---------------- 
Brahegatan 50 
Envägen 12 

However, when I query the data using Hive View in Ambari, the results are not displayed correctly:

select address from customerinfo limit 2; 
address 
---------------- 
Brahegatan 50 
Env?gen 12 

A similar issue occurs when querying with Spark SQL in Zeppelin.

Is this a bug in Hive View or Ambari? Does anyone know the cause of the problem and how to fix it?

BR,

/Nhan

8 REPLIES

Master Mentor

@Nhan Nguyen

Which version of Ambari are you using?

Some older versions of Ambari (prior to Ambari 2.4) have a problem with multibyte characters that will cause this kind of issue:

https://issues.apache.org/jira/browse/AMBARI-16713

Explorer

I am using HDP 2.6 with Ambari 2.5.

We have this problem in both Zeppelin and Hive View.

Master Mentor

@Nhan Nguyen I can reproduce the issue on HDP 2.6 with Ambari 2.5.

It looks like the issue is on the Hive side: https://issues.apache.org/jira/browse/HIVE-15927

Example:

[root@sandbox hive-next-view]# su - hive
[hive@sandbox ~]$ beeline
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
beeline> !connect jdbc:hive2://sandbox.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Connecting to jdbc:hive2://sandbox.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Enter username for jdbc:hive2://sandbox.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2: hive
Enter password for jdbc:hive2://sandbox.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2: ****
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://sandbox.hortonworks.com:2181/> SELECT * FROM customerinfo;
+-----------------------+--+
| customerinfo.address  |
+-----------------------+--+
| Env�gen               |
+-----------------------+--+
1 row selected (0.255 seconds)

Expert Contributor

This is not a Hive issue but rather a file system or file encoding issue. SELECT * in Hive does nothing except read the file from the file system, so if you run hadoop fs -cat on the underlying file you should see the same behavior.

You can check the file encoding in bash with $ file -i filename

You can change the encoding with iconv and convert the file to UTF-8, which is a printable encoding:

iconv -f current_encoding -t new_encoding input.file -o out.file
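
Note that file only guesses the encoding from a sample of the content. A stricter check is to ask iconv to convert the file from UTF-8 to UTF-8, which fails on the first invalid byte sequence. A minimal sketch, assuming a standard iconv is installed; is_valid_utf8 and the sample file names are hypothetical, made up for this illustration:

```shell
# Hypothetical helper: exit status 0 only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Build two sample files in a scratch directory:
# "Envägen" encoded as UTF-8 (ä = c3 a4) and as ISO-8859-1 (ä = e4).
tmpdir=$(mktemp -d)
printf 'Env\303\244gen\n' > "$tmpdir/sample_utf8.txt"
printf 'Env\344gen\n'     > "$tmpdir/sample_latin1.txt"

is_valid_utf8 "$tmpdir/sample_utf8.txt"   && echo "sample_utf8.txt: valid UTF-8"
is_valid_utf8 "$tmpdir/sample_latin1.txt" || echo "sample_latin1.txt: not valid UTF-8"
```

The same helper can be pointed at a file fetched from HDFS with hdfs dfs -get.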

Explorer

We have checked, and the file encoding is UTF-8. Please note that when we run SELECT * from the Hive command-line interface, the text displays correctly in the terminal. In Hive View and Zeppelin, however, the text displays incorrectly.

So it might not be an issue with Hive itself but with Hive View/Zeppelin.

Guru

@Nhan Nguyen

This looks more like a problem with the encoding of the source/input file. As @Jay SenSharma mentioned:

0: jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.> select * from abc_orc;
+---------------+--+
| abc_orc.col1  |
+---------------+--+
| Env�gen       |
+---------------+--+

We can check the encoding of the underlying file:

-bash-4.1$ hdfs dfs -get /apps/hive/warehouse/abc/000000_0 .
-bash-4.1$ file 000000_0 
000000_0: ISO-8859 text

and, as @Umair Khan stated, if we convert the encoding, the text displays correctly:

-bash-4.1$ iconv -f ISO-8859-1 -t UTF-8//TRANSLIT 000000_0 -o 000000_1
-bash-4.1$ file 000000_1
000000_1: UTF-8 Unicode text
-bash-4.1$ 
-bash-4.1$ 
-bash-4.1$ hdfs dfs -put 000000_1 /apps/hive/warehouse/abc/
-bash-4.1$ beeline -u "jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.c:2181,xlnode-1.h.c:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n hive -p ''
Connecting to jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.c:2181,xlnode-1.h.c:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
0: jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.> select * from abc;
+-----------+--+
| abc.col1  |
+-----------+--+
| Env�gen   |
| Envägen   |
+-----------+--+
2 rows selected (0.26 seconds)
0: jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.> 
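
To see why the two rows differ, it can help to look at the raw bytes rather than trusting how a terminal renders them: ä is the two bytes c3 a4 in UTF-8 but the single byte e4 in ISO-8859-1. A small sketch using od; hex_bytes is a hypothetical helper name, and the sample strings are built with printf rather than read from the real table file:

```shell
# Hypothetical helper: print stdin as one continuous lowercase hex string.
hex_bytes() {
  od -An -tx1 | tr -d ' \n'
}

# "Envägen" encoded as UTF-8, then as ISO-8859-1.
printf 'Env\303\244gen' | hex_bytes; echo
printf 'Env\344gen'     | hex_bytes; echo
```

For a real table file, something like hdfs dfs -cat on the file piped through head -c 64 and then hex_bytes should show which of the two byte patterns is stored.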

Can you try a different browser? If you are using Chrome, try enabling support for all encodings and see if that works.

Explorer

We have checked, and the file encoding is UTF-8. Please note that when we run SELECT * from the Hive command-line interface, the text displays correctly in the terminal. In Hive View and Zeppelin, however, the text displays incorrectly.

I have also tested with different browsers and different encodings, but Hive View simply cannot render the page when the browser encoding is anything other than Unicode (UTF-8).

Explorer

I ran a test. I created a text file with content in Unicode UTF-8:

> cat test.csv
björn,alvägen
> file test.csv
test.csv: UTF-8 Unicode text

Then I created a table reading from that CSV file:

hive> CREATE EXTERNAL TABLE test (
    >   column1 String,
    >   column2 String
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > LOCATION '/user/myuser/test/';
hive> select * from test;
OK
björn   alvägen
Time taken: 0.058 seconds, Fetched: 1 row(s)

But in Hive View:

test.column1  	test.column2

bj?rn		alv?gen

A similar issue occurs in Zeppelin.

So it is not a problem with the encoding of the source file, @Jay SenSharma, @Umair Khan, @Shyam Sunder Rai: my source file is UTF-8, and Hive can query the data and display it correctly as UTF-8. It must be an issue with Hive View and Zeppelin.
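
One thing that may be worth checking, as an assumption rather than anything confirmed in this thread: a plain ? (as opposed to �) is what Java typically produces when it encodes a string into a charset that cannot represent a character, and a JVM's default charset usually follows the locale of the environment it was started in. The sketch below only inspects a locale name; is_utf8_locale is a hypothetical helper, and it would need to be run in the environment of the Ambari Server or Zeppelin process, not just in an interactive login shell:

```shell
# Hypothetical helper: succeed if a locale name such as "en_US.UTF-8"
# declares a UTF-8 charset.
is_utf8_locale() {
  case "$1" in
    *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
    *) return 1 ;;
  esac
}

# Effective character-type locale, using the usual precedence:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
effective="${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
if is_utf8_locale "$effective"; then
  echo "locale '$effective' declares UTF-8"
else
  echo "locale '$effective' does not declare UTF-8"
fi
```

If the service turns out to run under C or POSIX, that would be consistent with the ? substitution, but it would not by itself prove that this is the cause.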