I have a Python (Jython) script that parses JSON containing Arabic words, but it doesn't seem to support UTF-8 encoding.
I got the error below.
import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class ModJSON(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # Read the flow file content as UTF-8 text
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        obj = json.loads(text)
        insertquery = "insert into Tweets_test values ('"+str(obj['id'])+"','"+obj['text'].encode('utf-8')+"','"+str(obj['id_str'])+"');"

flowFile = session.get()
if (flowFile != None):
    flowFile = session.write(flowFile, ModJSON())
    session.transfer(flowFile, REL_SUCCESS)
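For comparison, here is a minimal sketch of the same decode, parse, build, encode flow in plain CPython 3, runnable outside NiFi. It assumes `io.BytesIO` objects stand in for NiFi's input and output streams, and this `ModJSON` is a hypothetical stand-in that does not subclass NiFi's `StreamCallback`. Keep in mind that Jython 2.7 has Python 2 string semantics, so `str` there is a byte string rather than Unicode text. What it illustrates: decode once on the way in, keep the whole query as text, and encode once on the way out, rather than calling `.encode('utf-8')` on one field in the middle of the concatenation.

```python
import io
import json


class ModJSON(object):
    """Hypothetical stand-in for the NiFi StreamCallback, using plain byte streams."""

    def process(self, inputStream, outputStream):
        # Decode the raw bytes explicitly as UTF-8 (same role as IOUtils.toString(..., UTF_8))
        text = inputStream.read().decode("utf-8")
        obj = json.loads(text)
        # Build the statement entirely as Unicode text; no per-field .encode() calls
        insertquery = ("insert into Tweets_test values ('" + str(obj["id"]) + "','"
                       + obj["text"] + "','" + str(obj["id_str"]) + "');")
        # Encode exactly once, where text becomes bytes
        outputStream.write(insertquery.encode("utf-8"))


# Sample tweet with an Arabic text field (made-up data for illustration)
tweet = {"id": 1, "text": "مرحبا", "id_str": "1"}
source = io.BytesIO(json.dumps(tweet, ensure_ascii=False).encode("utf-8"))
sink = io.BytesIO()
ModJSON().process(source, sink)
print(sink.getvalue().decode("utf-8"))
```

In NiFi the equivalent change is simply to drop the `.encode('utf-8')` from the middle of the string and let the write at the end handle the conversion.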
I use bytearray() in my examples, but I haven't been able to figure out when you need it and when you don't. I suspect it might be when the type is 'unicode' or 'java.lang.String' instead of Jython's 'str' type. The following two lines worked for me:
insertquery = "insert into Tweets_test values ('"+str(obj['id'])+"','"+obj['text']+"','"+str(obj['id_str'])+"');"
outputStream.write(insertquery)
This page says that a Jython string will be coerced to a Java byte[] when necessary, and that seems to be what's going on above.
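The question of when bytes are needed can be made explicit with a small helper. This is a sketch in CPython 3, not Jython, and `to_utf8_bytes` is a hypothetical name: it encodes only values that are still text and passes bytes through unchanged. Encoding twice is the usual trap. In Python 2 (and, as far as I know, Jython 2.7), calling `.encode('utf-8')` on a byte string first decodes it with the default ASCII codec, which raises `UnicodeDecodeError` on Arabic bytes; in Python 3, `bytes` has no `.encode()` at all.

```python
def to_utf8_bytes(value):
    """Return value as UTF-8 bytes, encoding only when it is still text."""
    if isinstance(value, (bytes, bytearray)):
        # Already bytes: pass through instead of encoding a second time
        return bytes(value)
    return value.encode("utf-8")


word = "مرحبا"  # five Arabic letters
encoded = to_utf8_bytes(word)
print(len(word), len(encoded))  # 5 10: each of these letters takes 2 bytes in UTF-8
print(to_utf8_bytes(encoded) == encoded)  # True: bytes are returned unchanged
print(bytearray(encoded).decode("utf-8") == word)  # True: the round trip is lossless

try:
    bytearray(word)  # Python 3 will not guess an encoding for text
except TypeError:
    print("bytearray(text) needs an explicit encoding")
```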
I tested this with Arabic characters in my text field, and it worked fine. You're saying you still get the error when using my suggested lines?