Created on 11-23-2015 02:29 AM - edited 09-16-2022 02:50 AM
Hi, we have an encoding problem with the data stored in the HBase instance of our Cloudera cluster. The data in HBase is stored as UTF-8. An external Python process reads that data, and it seems that json.dumps turns the JSON into a Unicode-escaped one, which is how it then appears in the web browser and in the application that receives the JSON. We are not passing any arguments to json.dumps, because as far as we understand that uses the default encoding, UTF-8.
When I specify a UTF-8 encoding in the json.dumps call, I see a set of weird characters in the web browser.
This is part of the Python code. I am building a list of "movimientos" read from HBase. The data loaded from HBase is in UTF-8, but if I call print json.dumps(movimientos) without any parameters, I see the data in Unicode. If I call print movimientos instead, I see the data in UTF-8!
This is an entry from a row in HBase:
mov:ec72f5c5-961c-4ff5-bdaf-f958b1687acb timestamp=1447937758934, value=type:"CARDS"|date:2015-07-21|amount:99.0|categoryId:6|categoryDescription:"ELECTR\xC3\x93NICADE CONSUMO-ELECTRODOM\xC3\x89STICO-MEN.HOGAR"|customCategoryId:|customCategoryDescription:|cardNumber:4511635231378008|clientId:101|cardType:"Credit"|transactionType:Compra|entityCode:7010|merchantCode:I5722|accountingDate:2015-07-21|amountOtherCurrency:0.0|currency:"EUR"|channel:Tpv|movementDescription:"ACHICA.ES"|uuid:"ec72f5c5-961c-4ff5-bdaf-f958b1687acb"|textDescription:"ELECTRODOM\xC3\x89STICOS EQUIPOS EL\xC3\x89CTRICOS M\xC3\x81QUINAS ESPECIALES ANTENAS PARAB\xC3\x93LICAS AIRE ACONDICIONADO Y RECEPCI\xC3\x93N CANALES TV."
You can see the UTF-8 byte sequences in the value (for example, \xC3\x93 is the two-byte UTF-8 encoding of Ó)...
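To make the byte sequences in that row concrete, here is a minimal sketch that decodes one of them. The value `raw` is a hypothetical sample copied from the categoryDescription field, not the output of getRowsHBase. It also shows what happens when the same bytes are read as Latin-1, which is one common (but here unconfirmed) source of "weird characters" in a browser.

```python
# Hypothetical sample: bytes of one word as shown in the HBase shell output.
raw = b"ELECTR\xc3\x93NICA"

# Decoded as UTF-8, the two bytes \xc3\x93 become the single character Ó.
as_utf8 = raw.decode("utf-8")

# Decoded as Latin-1, each byte becomes its own character,
# so Ó is replaced by two unrelated characters.
as_latin1 = raw.decode("latin-1")

assert len(as_utf8) == 11    # E L E C T R Ó N I C A
assert len(as_latin1) == 12  # one extra character where Ó should be
```

If the browser output shows two odd characters wherever an accented letter is expected, that pattern would point to the response being read with the wrong charset rather than to a problem in HBase.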
...
movimientos = []
...
for uid in uids_cards:
    movimiento = getRowsHBase(ip,port,table,rowKey,column_family+uid)
    movimientos.append(movimiento)
for uid in uids_incomes:
    movimiento = getRowsHBase(ip,port,table,rowKey,column_family+uid)
    movimientos.append(movimiento)
for uid in uids_receipts:
    movimiento = getRowsHBase(ip,port,table,rowKey,column_family+uid)
    movimientos.append(movimiento)
if len(movimientos)==0:
    output['error'] = 1
    output['mensaje'] = 'No hay datos para la consulta'
    print json.dumps(output)
    sys.exit()
print json.dumps(movimientos)
...
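For reference, a minimal sketch of the two output forms json.dumps can produce for non-ASCII text. By default (`ensure_ascii=True`) it writes characters such as É as `\u00c9` escapes, which is still valid JSON and decodes back to the same text. With `ensure_ascii=False` it keeps the characters themselves. The `record` dict and its value are hypothetical samples, not the actual structure returned by getRowsHBase.

```python
import json

# Hypothetical sample: UTF-8 bytes of one field, decoded to Unicode text.
raw = b"ELECTRODOM\xc3\x89STICO"
text = raw.decode("utf-8")

record = {"categoryDescription": text}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters as they are.
literal = json.dumps(record, ensure_ascii=False)

# Both forms are valid JSON and decode to the same value.
assert json.loads(escaped) == json.loads(literal)

# The literal form must be encoded explicitly before it is sent as bytes.
payload = literal.encode("utf-8")
```

A JSON parser on the receiving side should treat both forms as equivalent. If the literal form is used, the consumer also needs to know the bytes are UTF-8, for example through a charset declared in the response headers.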
Created 12-06-2015 11:35 PM