Support Questions

mohamed_emadald · ‎06-20-2017

I have a Python script that parses JSON containing Arabic words, but it doesn't seem to support UTF-8 encoding.

I got the error below.

My script:

import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class ModJSON(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        obj = json.loads(text)
        insertquery = "insert into Tweets_test values ('"+str(obj['id'])+"','"+obj['text'].encode('utf-8')+"','"+str(obj['id_str'])+"');"
        outputStream.write(bytearray(insertquery))

flowFile = session.get()
if (flowFile != None):
    flowFile = session.write(flowFile, ModJSON())
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()

mburgess · ‎06-20-2017

I use bytearray() in my examples, but I haven't been able to figure out when you need it and when you don't. I suspect it might be when the type is 'unicode' or 'java.lang.String' instead of Jython's 'str' type. The following two lines worked for me:

insertquery = "insert into Tweets_test values ('"+str(obj['id'])+"','"+obj['text']+"','"+str(obj['id_str'])+"');"
outputStream.write(insertquery)

This page says that a Jython String will be coerced to byte[] when necessary, and that seems to be what's going on above.
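If you would rather not rely on implicit coercion, another option is to keep the query as text and encode it to UTF-8 exactly once, just before writing. The following is only a minimal sketch under that assumption, written so it can be run outside NiFi; the helper names build_insert_query and to_utf8_bytes are hypothetical and not part of the original script, and the sample JSON is made up for illustration.

```python
# -*- coding: utf-8 -*-
import json


def build_insert_query(json_text):
    """Build the insert statement as text (hypothetical helper, not from the original script)."""
    obj = json.loads(json_text)
    # Keep every piece as text here; do not call .encode() on individual fields,
    # so that encoded bytes and text are never concatenated together.
    return u"insert into Tweets_test values ('{0}','{1}','{2}');".format(
        obj['id'], obj['text'], obj['id_str'])


def to_utf8_bytes(query):
    """Encode the finished query once, at the point where bytes are needed."""
    return bytearray(query.encode('utf-8'))


# Made-up sample record with an Arabic word in the text field.
sample = u'{"id": 1, "text": "\u0645\u0631\u062d\u0628\u0627", "id_str": "1"}'
payload = to_utf8_bytes(build_insert_query(sample))
```

Inside the process() method, payload would take the place of the argument passed to outputStream.write(). I have not confirmed that this addresses the specific error reported above; it only makes the encoding step explicit.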

mohamed_emadald · ‎06-21-2017

Thanks Matt,

I get the above error when the JSON contains Arabic words in the text field.

mburgess · ‎06-21-2017

I tested this with Arabic characters in my text field, and it worked fine. You're saying you still get the error when using my suggested lines?

Execute script processor don't support utf-8 encoding