Hello,
I'm facing an issue with the display and storage of special characters in Hive.
I'm using Spark to do a writeStream into Hive like this:
// Write the streaming result into Hive via the Hive Streaming data source
val query = trimmedDF.writeStream
  //.format("console")  // debug: write to the console instead of Hive
  .format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource")
  .outputMode("append")
  .option("metastoreUri", metastoreUri)
  .option("database", "dwh_prod")
  .option("table", "res_idos_0")
  .option("checkpointLocation", "/tmp/idos_LVD_060420_0")
  .queryName("test_final")
  .option("truncate", "false")
  .option("encoding", "UTF-8")
  .start()
query.awaitTermination()
but when there is a special character, Hive doesn't display it correctly. I have already set UTF-8 encoding on the Hive table:
select distinct(analyte) from res_idos_0;
+--------------------------------------------+
| analyte |
+--------------------------------------------+
| D02 |
| E |
| E - Hauteur Int��rieure jupe - 6,75mm |
| Hauteur totale |
| Long tube apparent (embout 408 assembl��) |
| Side streaming - poids apr��s |
| Tenue tube plongeur |
| 1 dose - poids avant |
| Diam��tre 1er joint de sertissage |
| HDS - Saillie Point Mort Bas |
| P - Epaisseur tourette P5 - 0,51mm |
+--------------------------------------------+
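For reference, a minimal sketch of how the table encoding can be set, assuming a HiveWarehouseSession handle named hive (the handle name is an assumption here; the ALTER TABLE statement itself is standard HiveQL):

// Sketch, not from the original post: set UTF-8 on an existing table's serde,
// assuming a HiveWarehouseSession handle named `hive`.
hive.executeUpdate(
  "ALTER TABLE dwh_prod.res_idos_0 " +
  "SET SERDEPROPERTIES ('serialization.encoding' = 'UTF-8')")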
But if I display the data in the console with writeStream, the special characters are displayed correctly, and likewise if I use the write function to write into Hive like this:
// Batch write through the Hive Warehouse Connector
final_DF.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .mode("overwrite")
  .option("table", "dwh_prod.result_idos_lims3")
  .save()
then the characters are displayed correctly in Hive:
+-------------------------------------------+
| analyte |
+-------------------------------------------+
| 1 dose |
| 1 dose (moyenne) - Kinf |
| 1 dose (écart type) |
| 1 dose - poids avant |
| 1 dose individuelle (maxi) |
| 1,00mm |
| 1,3,5-trioxane |
I use Spark 2.3.2 and Hive 3.1.0.
Has anyone faced this issue, or does anyone have a clue or a solution for me?
Thanks in advance,
Best Regards
Created 04-08-2020 06:13 AM
Hi,
after some research I have found a solution to this issue. The problem came from the Hive table definition used to store the data.
I was defining some properties of my table like this:
hive.createTable("res_idos_0")
  .ifNotExists()
  .prop("serialization.encoding", "UTF-8")
  .prop("escape.delim", "\t")
  .column("t_date", "TIMESTAMP")
  // ... remaining columns elided ...
  .create()
But in a writeStream with special characters, the escape.delim property is not supported, and the characters can't be saved correctly.
So I removed the escape.delim property from my Hive table definition, and I also added this line to my code to be certain that the files saved in HDFS have the right encoding:
System.setProperty("file.encoding", "UTF-8")
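For completeness, a sketch of the corrected table definition, i.e. the same builder as above with escape.delim removed (the column list beyond t_date is elided, as in the original snippet):

// Make sure files written to HDFS use UTF-8
System.setProperty("file.encoding", "UTF-8")

// Corrected table definition: serialization.encoding kept, escape.delim removed
hive.createTable("res_idos_0")
  .ifNotExists()
  .prop("serialization.encoding", "UTF-8")
  .column("t_date", "TIMESTAMP")
  // ... remaining columns elided ...
  .create()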