Support Questions

Find answers, ask questions, and share your expertise
Announcements
Now Live: Explore expert insights and technical deep dives on the new Cloudera Community BlogsRead the Announcement

gbk转utf8方案

avatar

请问,在cdp7.17 环境,执行gbk转utf8报错,如下图所示,要如何解决?

zhouweibo_1-1765940703713.jpeg

 

zhouweibo_0-1765940030470.jpeg

 

2 REPLIES 2

avatar
Community Manager

English Translation:

GBK to UTF-8 conversion solution

In a CDP 7.17 environment, I encountered an error when converting GBK to UTF-8, as shown in the image below. How can I resolve this?

@zhouweibo, Welcome to our community! To help you get the best possible answer, I have tagged in our CDP @zzeng @upadhyayk04  @rki_  experts who may be able to assist you further.



Regards,

Vidya Sargur,
Community Manager


Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:

avatar
Expert Contributor

@zhouweibo  Hi Weibo,

I build an environment by myself, but I can't reproduce your error.

I created a table in this way:

-- Create database if not exists
CREATE DATABASE IF NOT EXISTS upidb;

-- Create table with correct syntax
CREATE TABLE IF NOT EXISTS upidb.gscs_tbl_fultrans_2_db (
    trace_num STRING COMMENT 'Trace number',
    acq_ins_cde STRING COMMENT 'Acquiring institution code',
    fwd_ins_cde STRING COMMENT 'Forwarding institution code',
    acq_trans_cde STRING COMMENT 'Acquiring transaction code',
    iss_trans_cde STRING COMMENT 'Issuing transaction code',
    pri_acct_num STRING COMMENT 'Primary account number',
    resv_fld2 STRING COMMENT 'Reserved field 2 - contains encoded data',
    sett_dt STRING COMMENT 'Settlement date in YYYYMMDD format'
)
COMMENT 'Full transaction table for settlement date partition'
STORED AS PARQUET;

-- Insert sample data with GBK encoded characters to reproduce the encoding issue
INSERT INTO upidb.gscs_tbl_fultrans_2_db VALUES
(
    '123456789012',
    'ACQ001',
    'FWD001',
    'TRANS001',
    'TRANS002',
    '1234567890123456',
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789测试中文字符ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
    '20231201'
);

Then ran the SELECT sql, but Can't reproduce error:

SELECT trace_num
      ,acq_ins_cde
      ,fwd_ins_cde
      ,acq_trans_cde
      ,iss_trans_cde
      ,pri_acct_num
      ,trim(SUBSTR(resv_fld2,111,2))
      ,trim(decode(SUBSTR(encode(resv_fld2,'gbk'),113,15),'gbk'))
      ,trim(decode(SUBSTR(encode(resv_fld2,'gbk'),128,40),'gbk'))
      ,trim(decode(SUBSTR(encode(resv_fld2,'gbk'),168,40),'gbk'))
from upidb.gscs_tbl_fultrans_2_db
where sett_dt = '20231201'

zzeng_0-1765973674946.png

Can you please share your DDL?