Special characters are displayed as question mark (?).
Labels: Apache Hive
Created 08-28-2023 01:50 AM
We have a delimited flat file containing special characters that was ingested into HDFS. We created a Hive external table on top of the HDFS path and viewed the data in Hue and Beeline, but the special characters are displayed as question marks (?). Please advise on a solution.
Created 08-31-2023 03:16 AM
We tried replicating the issue with the data shared by @Shivakuk. The left/right single and double quotation marks (smart quotes) in the text did not display correctly and were converted to "?". I was able to fix this by changing LC_CTYPE from "UTF-8" to "en_US.UTF-8".
Check the output of the "locale" command:
# locale
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
See what your LC_CTYPE reads.
If it does not read "en_US.UTF-8", do the following:
vi ~/.bash_profile
Add the following two lines at the bottom:
+++
LC_CTYPE=en_US.UTF-8
export LC_CTYPE
+++
Save the file, then source it so the change takes effect in the current shell:
# source ~/.bash_profile
Now connect to Beeline and check whether the data shows up correctly.
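The steps above can also be scripted. The following is a minimal sketch, assuming a bash login shell and that the en_US.UTF-8 locale is installed on the host (`locale -a` lists the available locales). The function names `current_lc_ctype` and `ensure_lc_ctype` are hypothetical helpers introduced here for illustration; they are not part of Hive, Beeline, or any Cloudera tool.

```shell
#!/usr/bin/env bash
# Sketch only: helper names below are hypothetical, not from any existing tool.

# Print the effective LC_CTYPE value as reported by `locale`,
# with any surrounding double quotes stripped.
current_lc_ctype() {
    locale | sed -n 's/^LC_CTYPE=//p' | tr -d '"'
}

# Append the two LC_CTYPE lines to a profile file unless they are already there.
# $1: profile path (defaults to ~/.bash_profile, as in the reply above).
ensure_lc_ctype() {
    local profile="${1:-$HOME/.bash_profile}"
    local want="en_US.UTF-8"

    # -s keeps grep quiet if the profile does not exist yet;
    # in that case the redirect below creates it.
    if ! grep -qs "^LC_CTYPE=${want}\$" "$profile"; then
        printf 'LC_CTYPE=%s\nexport LC_CTYPE\n' "$want" >> "$profile"
    fi
}
```

A possible way to use it: run `current_lc_ctype` first, and only if it does not print en_US.UTF-8, run `ensure_lc_ctype` followed by `source ~/.bash_profile`, then reconnect to Beeline. Because the append is guarded by the grep check, running the helper more than once should not add duplicate lines.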
Created 08-28-2023 05:23 AM
@Shivakuk Did you use Ranger masking for that column?
Please provide the output of the following:
show create table <tablename>
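If it is easier to capture that output from the command line than from Hue, something like the sketch below may help. It assumes `beeline` is on the PATH; the function name `show_create_table` is a hypothetical wrapper, and the JDBC URL and table name are placeholders to be supplied by the caller.

```shell
#!/usr/bin/env bash
# Sketch only: show_create_table is a hypothetical wrapper around the beeline CLI.

# $1: HiveServer2 JDBC URL (placeholder, supplied by the caller)
# $2: table name
show_create_table() {
    local jdbc_url="$1"
    local table="$2"
    # -u sets the connection URL, -e runs a single statement and exits.
    beeline -u "$jdbc_url" -e "SHOW CREATE TABLE ${table};"
}
```

It would be called as `show_create_table "<jdbc-url>" "<tablename>"`, with authentication options added to the URL or command line as your cluster requires.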
Created 09-04-2023 10:06 PM
@Shivakuk, have any of the replies helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur, Community Manager
