Support Questions

nhan_nguyen · ‎05-05-2017

Hi All,

I have some UTF-8 string text in Hive tables. Querying with HiveQL from the terminal, or with Spark SQL from spark-shell, displays the text correctly, with no encoding errors:

>> select address from customerinfo limit 2; 
address 
---------------- 
Brahegatan 50 
Envägen 12

However, when I query the data using Hive View in Ambari, the results are not displayed correctly:

select address from customerinfo limit 2; 
address 
---------------- 
Brahegatan 50 
Env?gen 12

A similar issue occurs when querying with Zeppelin using Spark SQL.

Is this a bug in Hive View or Ambari? Do you know the cause of the problem and how to solve it?

BR,

/Nhan

jsensharma · ‎05-05-2017

@Nhan Nguyen

Which version of Ambari are you using?

Some older versions of Ambari (prior to Ambari 2.4) have an issue with multibyte characters that will cause such problems:

https://issues.apache.org/jira/browse/AMBARI-16713

nhan_nguyen · ‎05-05-2017

I am using HDP 2.6 with Ambari 2.5.

We have this problem in both Zeppelin and Hive View.

jsensharma · ‎05-05-2017

@Nhan Nguyen On HDP 2.6 with Ambari 2.5, I can reproduce the issue.

It looks like the issue is on the Hive side: https://issues.apache.org/jira/browse/HIVE-15927

Example:

[root@sandbox hive-next-view]# su - hive
[hive@sandbox ~]$ beeline
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
beeline> !connect jdbc:hive2://sandbox.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Connecting to jdbc:hive2://sandbox.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Enter username for jdbc:hive2://sandbox.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2: hive
Enter password for jdbc:hive2://sandbox.hortonworks.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2: ****
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://sandbox.hortonworks.com:2181/> SELECT * FROM customerinfo;
+-----------------------+--+
| customerinfo.address  |
+-----------------------+--+
| Env�gen               |
+-----------------------+--+
1 row selected (0.255 seconds)

umair_khan · ‎05-05-2017

This is not a Hive issue but rather a file system or file encoding issue. SELECT * in Hive does nothing except read the file from the file system, so if you run hadoop fs -cat on the underlying file, you should see the same behavior.

You can check the file encoding in bash with: $ file -i filename

You can change the encoding with iconv, converting the file to UTF-8 so that it displays correctly:

iconv -f current_encoding -t new_encoding input.file -o out.file
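The check-and-convert steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: it assumes an iconv that rejects invalid input (GNU iconv does), the file names sample.txt and sample_utf8.txt are hypothetical, and ISO-8859-1 is an assumed source encoding chosen for the demo. Converting UTF-8 to UTF-8 is used here as a quick validity test alongside file -i, because iconv exits non-zero when the bytes are not valid in the declared source encoding.

```shell
# Hypothetical sample: "Envägen" stored as ISO-8859-1, where ä is the single byte 0xE4 (octal 344).
printf 'Env\344gen\n' > sample.txt

# Validity check: a UTF-8 to UTF-8 conversion fails if the input is not valid UTF-8.
if iconv -f UTF-8 -t UTF-8 sample.txt > /dev/null 2>&1; then
  echo "sample.txt is valid UTF-8"
else
  echo "sample.txt is NOT valid UTF-8"
fi

# Convert from the assumed source encoding to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 sample.txt > sample_utf8.txt

# Re-run the validity check on the converted file.
if iconv -f UTF-8 -t UTF-8 sample_utf8.txt > /dev/null 2>&1; then
  echo "sample_utf8.txt is valid UTF-8"
fi
```

If the validity check already passes on the original file, as it would for a file that file reports as UTF-8, then conversion is not the fix and the problem is more likely in whatever renders the text.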

nhan_nguyen · ‎05-08-2017

We have checked, and the encoding of the file is UTF-8. Please note that if we use the Hive command line interface to SELECT *, the text displays correctly in the terminal. But in Hive View and Zeppelin, the text displays incorrectly.

So it might not be an issue with Hive itself but with Hive View/Zeppelin.

srai1 · ‎05-06-2017

@Nhan Nguyen

This seems more like an issue with the encoding of the source/input file. As @Jay SenSharma mentioned:

0: jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.> select * from abc_orc;
+---------------+--+
| abc_orc.col1  |
+---------------+--+
| Env�gen       |
+---------------+--+

We can check the encoding of this file:

-bash-4.1$ hdfs dfs -get /apps/hive/warehouse/abc/000000_0 .
-bash-4.1$ file 000000_0 
000000_0: ISO-8859 text

And as @Umair Khan stated, if we convert the encoding, the text displays correctly:

-bash-4.1$ iconv -f ISO-8859-1 -t UTF-8//TRANSLIT 000000_0 -o 000000_1
-bash-4.1$ file 000000_1
000000_1: UTF-8 Unicode text
-bash-4.1$ 
-bash-4.1$ 
-bash-4.1$ hdfs dfs -put 000000_1 /apps/hive/warehouse/abc/
-bash-4.1$ beeline -u "jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.c:2181,xlnode-1.h.c:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n hive -p ''
Connecting to jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.c:2181,xlnode-1.h.c:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
0: jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.> select * from abc;
+-----------+--+
| abc.col1  |
+-----------+--+
| Env�gen   |
| Envägen   |
+-----------+--+
2 rows selected (0.26 seconds)
0: jdbc:hive2://xlnode-2.h.c:2181,xlnode-3.h.>
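The two rows above differ only in how ä is stored on disk: the original file is still in the table directory next to the converted one. A minimal sketch of the byte-level difference, assuming od is available; the strings are typed in as octal escapes for illustration rather than read from the actual warehouse files:

```shell
# "Envägen" in ISO-8859-1: ä is the single byte 0xE4 (octal 344).
# A UTF-8 decoder treats a lone 0xE4 as invalid, which is where the replacement character comes from.
printf 'Env\344gen' | od -An -tx1

# "Envägen" in UTF-8: ä is the two-byte sequence 0xC3 0xA4 (octal 303 244).
printf 'Env\303\244gen' | od -An -tx1

# The same inspection could be applied to a file in HDFS (requires a cluster), e.g.:
# hdfs dfs -cat /apps/hive/warehouse/abc/000000_0 | od -An -tx1
```

Seeing c3 a4 in the dump of the stored file suggests the data itself is UTF-8 and that any remaining garbling happens later, in the client or the display layer.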

Can you try a different browser? Or, if you are using Chrome, enable support for all encodings and see if that works.

nhan_nguyen · ‎05-08-2017

We have checked, and the encoding of the file is UTF-8. Please note that if we use the Hive command line interface to SELECT *, the text displays correctly in the terminal. But in Hive View and Zeppelin, the text displays incorrectly.

I have also tested with different browsers and different encodings, but Hive View does not render properly with anything other than Unicode (UTF-8).

nhan_nguyen · ‎05-08-2017

I ran a test. I created a text file with content in Unicode UTF-8:

> cat test.csv
björn,alvägen
> file test.csv
test.csv: UTF-8 Unicode text

Then I created a table reading from that CSV file:

hive> CREATE EXTERNAL TABLE test (
    >   column1 String,
    >   column2 String
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > LOCATION '/user/myuser/test/';
hive> select * from test;
OK
björn   alvägen
Time taken: 0.058 seconds, Fetched: 1 row(s)

But in Hive View:

test.column1  	test.column2

bj?rn		alv?gen

A similar issue occurs in Zeppelin.

So it is not a problem with the encoding of the source file, @Jay SenSharma, @Umair Khan, @Shyam Sunder Rai: my source file is in UTF-8, and Hive can query the data and display it correctly as UTF-8. It must be an issue with Hive View and Zeppelin.

Cloudera Community

Hive-View does not display correctly UTF-8 characters