Member since
03-23-2017
03-24-2017
10:45 AM
Hello! Thanks for your reply. I tested with an external file on HDFS and it works fine. But could somebody explain why INSERT fails with Cyrillic symbols? I wrote a small Python script that reads file.csv and runs INSERT statements against a Hive table. I also uploaded the same file.csv to HDFS and created an external table from it. The results differ: the external table shows the Cyrillic symbols correctly, but the values inserted by the Python script from the same file are garbled. So is an external table the only way to use Cyrillic? Should I just write files to HDFS and use external tables? Why doesn't INSERT work? Thanks!
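For the file-based route that worked here, a minimal sketch of writing the data file with an explicit UTF-8 encoding might look like the following. The helper names and the temporary path are hypothetical, and the upload step (for example with `hdfs dfs -put`) is left out. The sketch only shows that the bytes on disk are UTF-8 regardless of the locale of the machine running the script.

```python
import csv
import os
import tempfile
from pathlib import Path


def write_utf8_csv(path, rows):
    """Write rows as CSV, forcing UTF-8 instead of the platform default."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


def read_utf8_csv(path):
    """Read the CSV back, again decoding explicitly as UTF-8."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


# Hypothetical location; in practice this would be the file later put on HDFS.
demo_path = os.path.join(tempfile.mkdtemp(), "file.csv")
write_utf8_csv(demo_path, [["привет"]])

# Each Cyrillic letter takes 2 bytes in UTF-8, plus 1 byte for the newline.
print(len(Path(demo_path).read_bytes()))
print(read_utf8_csv(demo_path))
```

Passing `encoding="utf-8"` explicitly avoids depending on whatever default the Python interpreter picks up from the environment.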
03-23-2017
06:15 PM
I have a problem with Cyrillic symbols in Hive tables.
Installed versions:
ambari-server 2.4.2.0-136
hive-2-5-3-0-37 1.2.1000.2.5.3.0-37
Ubuntu 14.04
What the problem is:
The locale is set to ru_RU.UTF-8:
spark@hadoop:~$ locale
LANG=ru_RU.UTF-8
LANGUAGE=ru_RU:ru
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=ru_RU.UTF-8
Connect to Hive and create a test table:
spark@hadoop:~$ beeline -n spark -u jdbc:hive2://spark@hadoop.domain.com:10000/
Connecting to jdbc:hive2://spark@hadoop.domain.com:10000/
Connected to: Apache Hive (version 1.2.1000.2.5.3.0-37)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
0: jdbc:hive2://spark@hadoop.domain.com> CREATE TABLE `test`(`name` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'serialization.encoding'='UTF-8');
No rows affected (0,127 seconds)
Insert Cyrillic symbols:
0: jdbc:hive2://spark@hadoop.domain.com> insert into test values('привет');
INFO : Tez session hasn't been created yet. Opening session
INFO : Dag name: insert into test values('привет')(Stage-1)
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1490211406894_2481)
INFO : Map 1: -/-
INFO : Map 1: 0/1
INFO : Map 1: 0(+1)/1
INFO : Map 1: 1/1
INFO : Loading data to table default.test from hdfs://hadoop.domain.com:8020/apps/hive/warehouse/test/.hive-staging_hive_2017-03-23_13-41-46_215_3133047104896717605-116/-ext-10000
INFO : Table default.test stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
No rows affected (6,652 seconds)
Select from the table:
0: jdbc:hive2://spark@hadoop.domain.com> select * from test;
+------------+--+
| test.name  |
+------------+--+
| ?@825B     |
+------------+--+
1 row selected (0,162 seconds)
I've read through a lot of Apache Hive bug reports and tested Unicode, UTF-8, UTF-16 and several ISO encodings, with no luck. Can somebody help me with this? Thanks!
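One observation that may help narrow this down: the garbled value `?@825B` is exactly what you get by keeping only the low 8 bits of each character of 'привет', and the reported totalSize=7 matches six single bytes plus a newline rather than the 13 bytes a UTF-8 row would need. This is consistent with the string being written one byte per character somewhere on the INSERT ... VALUES path (Java's `DataOutputStream.writeBytes` behaves this way), but that is an assumption, not something confirmed by the logs above. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def low_bytes(text: str) -> str:
    """Keep only the low 8 bits of each character's code point."""
    return "".join(chr(ord(ch) & 0xFF) for ch in text)


original = "привет"
garbled = low_bytes(original)

print(garbled)                                # ?@825B
print(len(garbled.encode("latin-1")) + 1)     # 7  (6 bytes + newline)
print(len(original.encode("utf-8")) + 1)      # 13 (12 bytes + newline)
```

If this is what happens, the high byte of every character is lost at write time, so no `serialization.encoding` setting on the table could recover it when reading. That would also fit the external table working, since the file is written to HDFS as UTF-8 without going through that path.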
Labels: Apache Ambari