Created on 06-20-2017 10:27 AM - edited 08-17-2019 07:09 PM
I have a Python script in which I want to parse JSON that contains Arabic words, but it doesn't seem to support UTF-8 encoding.
I get the error below.
My script:
import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class ModJSON(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        obj = json.loads(text)
        insertquery = "insert into Tweets_test values ('"+str(obj['id'])+"','"+obj['text'].encode('utf-8')+"','"+str(obj['id_str'])+"');"
        outputStream.write(bytearray(insertquery))

flowFile = session.get()
if (flowFile != None):
    flowFile = session.write(flowFile, ModJSON())
    session.transfer(flowFile, REL_SUCCESS)
session.commit()
Created 06-20-2017 05:30 PM
I use bytearray() in my examples, but I haven't been able to figure out when you need it and when you don't. I suspect it might be when the type is 'unicode' or 'java.lang.String' instead of Jython's 'str' type. The following two lines worked for me:
insertquery = "insert into Tweets_test values ('"+str(obj['id'])+"','"+obj['text']+"','"+str(obj['id_str'])+"');"
outputStream.write(insertquery)
This page says that a Jython String will be coerced to byte[] when necessary, and that seems to be what's going on above.
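To illustrate the str-versus-unicode distinction discussed above, here is a minimal sketch in plain Python. It has not been run inside NiFi. It keeps the whole statement as Unicode text and encodes it to UTF-8 bytes once, at the end, rather than mixing encoded and unencoded pieces. The helper name build_insert and the sample JSON are hypothetical and exist only for this example. The Arabic word is written with \u escapes so the snippet stays ASCII-only.

```python
import json

def build_insert(text):
    """Build the insert statement as Unicode text, then encode it once."""
    obj = json.loads(text)
    # A Unicode format string keeps every piece as text until the final encode.
    query = u"insert into Tweets_test values ('%s','%s','%s');" % (
        obj['id'], obj['text'], obj['id_str'])
    # Encode exactly once, at the boundary where bytes are required.
    return query.encode('utf-8')

# Hypothetical sample record; the JSON \u escapes decode to an Arabic word.
sample = '{"id": 1, "text": "\\u0645\\u0631\\u062d\\u0628\\u0627", "id_str": "1"}'
result = build_insert(sample)
```

Inside the process() callback, the returned bytes would presumably be what gets passed to outputStream.write(), possibly wrapped in bytearray(). That part is an assumption and would need to be checked in the actual ExecuteScript processor.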
Created 06-21-2017 09:38 AM
Thanks Matt,
I get the above error when the JSON contains Arabic words in the text field.
Created 06-21-2017 06:24 PM
I tested this with Arabic characters in my text field, and it worked fine. You're saying you still get the error when using my suggested lines?