Hive doesn't display special characters from writeStream

Contributor

Hello,

I'm facing an issue with the display and storage of special characters in Hive.

I'm using Spark to do a writeStream into Hive, like this:

// Write result in hive
    val query = trimmedDF.writeStream
      //.format("console")
      .format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource")
      .outputMode("append")
      .option("metastoreUri", metastoreUri)
      .option("database", "dwh_prod")
      .option("table", "res_idos_0")
      .option("checkpointLocation", "/tmp/idos_LVD_060420_0")
      .queryName("test_final")
      .option("truncate", "false")
      .option("encoding", "UTF-8")
      .start()

    query.awaitTermination()

But when the data contains a special character, Hive doesn't display it correctly. I have already set UTF-8 encoding in the Hive table:

select distinct(analyte) from res_idos_0;
+--------------------------------------------+
|                  analyte                   |
+--------------------------------------------+
| D02                                        |
| E                                          |
| E - Hauteur Int��rieure jupe - 6,75mm      |
| Hauteur totale                             |
| Long tube apparent (embout 408 assembl��)  |
| Side streaming - poids apr��s              |
| Tenue tube plongeur                        |
| 1 dose - poids avant                       |
| Diam��tre 1er joint de sertissage          |
| HDS - Saillie Point Mort Bas               |
| P - Epaisseur tourette P5 - 0,51mm         |
+--------------------------------------------+
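
For reference, the UTF-8 setting mentioned above lives in the table's SerDe properties. As a minimal sketch, assuming a SparkSession named spark and the standard Hive Warehouse Connector session builder, it could be applied to the existing table like this:

import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// serialization.encoding is the SerDe property consulted when decoding
// row bytes into strings
hive.executeUpdate(
  "ALTER TABLE dwh_prod.res_idos_0 SET SERDEPROPERTIES ('serialization.encoding'='UTF-8')")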

But if I display the data to the console with writeStream, the special characters are displayed correctly; the same happens if I use the batch write function to write into Hive, like this:

final_DF.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
      .mode("overwrite")
      .option("table","dwh_prod.result_idos_lims3")
      .save()

The characters are then displayed correctly in Hive:

+-------------------------------------------+
|                  analyte                  |
+-------------------------------------------+
| 1 dose                                    |
| 1 dose (moyenne) - Kinf                   |
| 1 dose (écart type)                       |
| 1 dose - poids avant                      |
| 1 dose individuelle (maxi)                |
| 1,00mm                                    |
| 1,3,5-trioxane                            |

I'm using Spark 2.3.2 and Hive 3.1.0.

Has anyone faced this issue, or does anyone have a clue or a solution for me?

Thanks in advance,

Best Regards

1 ACCEPTED SOLUTION

Contributor

Hi,

After some research, I found a solution to this issue. The problem was in the Hive table definition used for storing the data.

I was defining some properties of my table like this:

hive.createTable("res_idos_0")
        .ifNotExists()
        .prop("serialization.encoding", "UTF-8")
        .prop("escape.delim", "\t")
        .column("t_date", "TIMESTAMP")
        // ... remaining columns elided ...
        .create()

But in a writeStream, when the data contains special characters, the escape.delim property is not supported, so the characters cannot be saved correctly.

So I removed the escape.delim property from my Hive table definition, and I also added the following line to my code to make sure the files saved to HDFS get the right encoding:

System.setProperty("file.encoding", "UTF-8")
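
Putting the fix together, here is a sketch of the corrected table definition, i.e. the same builder as above with the escape.delim property removed (the column list is still abbreviated):

hive.createTable("res_idos_0")
        .ifNotExists()
        .prop("serialization.encoding", "UTF-8")  // keep the UTF-8 SerDe encoding
        .column("t_date", "TIMESTAMP")
        // ... remaining columns elided ...
        .create()

One caveat: file.encoding is normally read once at JVM startup, so setting it with System.setProperty from inside the job may come too late on some JVMs. Passing -Dfile.encoding=UTF-8 through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions at submit time is a more reliable alternative.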