- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Execute script processor don't support utf-8 encoding
- Labels:
-
Apache NiFi
Created on ‎06-20-2017 10:27 AM - edited ‎08-17-2019 07:09 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have python script, I want to parse json -contains Arabic words -but it doesn't support utf-8 encoding.
I got below error.
My script
import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
class ModJSON(StreamCallback):
def __init__(self):
pass
def process(self, inputStream, outputStream):
text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
obj = json.loads(text) insertquery = "insert into Tweets_test values ('"+str(obj['id'])+"','"+obj['text'].encode('utf-8')+"','"+str(obj['id_str'])+"');"
outputStream.write(bytearray(insertquery))
flowFile = session.get()
if (flowFile != None):
flowFile = session.write(flowFile, ModJSON())
session.transfer(flowFile, REL_SUCCESS)
session.commit()
Created ‎06-20-2017 05:30 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I use bytearray() in my examples, but I haven't been able to figure out when you need it and when you don't. I suspect it might be when the type is 'unicode' or 'java.lang.String' instead of Jython's 'str' type. The following two lines worked for me:
insertquery = "insert into Tweets_test values ('"+str(obj['id'])+"','"+obj['text']+"','"+str(obj['id_str'])+"');" outputStream.write(insertquery)
This page says that a Jython String will be coerced to byte[] when necessary, and that seems to be what's going on above.
Created ‎06-21-2017 09:38 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Matt,
I got above error when json contain Arabic words in text.
Created ‎06-21-2017 06:24 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tested this with Arabic characters in my text field, and it worked fine. You're saying you still get the error when using my suggested lines?
