ExecuteScript processor doesn't support UTF-8 encoding

I have a Python script that parses JSON containing Arabic words, but it doesn't seem to handle UTF-8 encoding.

I get the error below.

My script:

import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class ModJSON(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        obj = json.loads(text)
        insertquery = "insert into Tweets_test values ('"+str(obj['id'])+"','"+obj['text'].encode('utf-8')+"','"+str(obj['id_str'])+"');"
        outputStream.write(bytearray(insertquery))

flowFile = session.get()
if (flowFile != None):
    flowFile = session.write(flowFile, ModJSON())
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()

16504-excutescript.png

3 REPLIES

Master Guru

I use bytearray() in my examples, but I haven't been able to figure out when you need it and when you don't. I suspect it might be when the type is 'unicode' or 'java.lang.String' instead of Jython's 'str' type. The following two lines worked for me:

insertquery = "insert into Tweets_test values ('"+str(obj['id'])+"','"+obj['text']+"','"+str(obj['id_str'])+"');"
outputStream.write(insertquery)

This page says that a Jython String will be coerced to byte[] when necessary, and that seems to be what's going on above.
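
To make the str-versus-unicode question concrete, here is a minimal sketch in plain Python rather than Jython, so it has not been run inside ExecuteScript. It keeps the query as text the whole way through and encodes it to UTF-8 bytes exactly once, at the end. The helper names build_insert_query and to_utf8_bytes and the sample tweet are hypothetical, introduced only for illustration.

```python
# -*- coding: utf-8 -*-
import json


def build_insert_query(json_text):
    """Hypothetical helper: build the INSERT statement as text (not bytes)."""
    obj = json.loads(json_text)
    # Format everything as text; no per-field .encode() calls here.
    return u"insert into Tweets_test values ('{0}','{1}','{2}');".format(
        obj['id'], obj['text'], obj['id_str'])


def to_utf8_bytes(query):
    """Hypothetical helper: encode the finished query once, as UTF-8."""
    return query.encode('utf-8')


# Made-up sample record whose text field holds an Arabic word.
sample = u'{"id": 1, "text": "\u0645\u0631\u062d\u0628\u0627", "id_str": "1"}'

query = build_insert_query(sample)
payload = to_utf8_bytes(query)
```

Inside the processor, payload would presumably be what gets handed to outputStream.write (possibly wrapped in bytearray()), but that last step is an assumption and is not shown here.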

Thanks Matt,

I get the above error when the JSON contains Arabic words in the text field.

Master Guru

I tested this with Arabic characters in my text field, and it worked fine. You're saying you still get the error when using my suggested lines?