Created 08-02-2016 12:41 AM
Hi,
I wrote an ExecuteScript processor, following the example, to convert a JSON string to a CSV string. I picked Python, and it works well until it hits special characters like 'í'. Can I use some encoding other than UTF_8 in 'text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)' and 'outputStream.write(bytearray(msg_csv.encode('utf-8')))'?
Error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 14: ordinal not in range(128) in <script> at line number 52
Thanks a lot for the help!
Stephanie
Created 08-02-2016 01:52 AM
Yes, ExecuteScript has access to all of StandardCharsets (either through its static constants or by name, as you mention), and in Jython you should have access to all of Jython's own charsets too. Do you know the encoding of the incoming flow file? If it varies but is available as an attribute (perhaps as part of the mime.type attribute), you can try passing that value to IOUtils.toString() and/or msg_csv.encode(), using flowFile.getAttribute('mime.type') and parsing off the parameter. MIME type parameters follow the type and are delimited by semicolons; I think the parameter name is 'charset'.
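As a rough sketch of the parsing step, something like the following could pull the charset out of a mime.type value. The function name charset_from_mime_type and the 'utf-8' fallback are made up for illustration, and it assumes the attribute looks like 'application/json; charset=ISO-8859-1'; I have not checked what your flow files actually carry.

```python
def charset_from_mime_type(mime_type, default='utf-8'):
    """Return the 'charset' parameter of a MIME type string, or default.

    Hypothetical helper: the name and the fallback value are assumptions.
    """
    if not mime_type:
        return default
    # Parameters come after the type/subtype, separated by semicolons.
    for param in mime_type.split(';')[1:]:
        name, sep, value = param.partition('=')
        if sep and name.strip().lower() == 'charset':
            # Drop surrounding whitespace and optional quotes.
            value = value.strip().strip('"\'')
            if value:
                return value
    return default
```

In the script, the result could then be passed along the lines of msg_csv.encode(charset_from_mime_type(flowFile.getAttribute('mime.type'))), assuming the name it returns is one the Python codec registry recognizes.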
In your case you might just try StandardCharsets.UTF_16 and/or msg_csv.encode('utf-16') to see if that fixes it.
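To see quickly which codecs can represent the character from your error message, a small check like this may help. It is only a sketch run outside NiFi; try_encode is a throwaway helper name, not part of any API.

```python
text = u'\xed'  # the character 'í' named in the error message

def try_encode(value, codec):
    """Return the encoded bytes, or None if the codec cannot encode value."""
    try:
        return value.encode(codec)
    except UnicodeEncodeError:
        return None

utf8_bytes = try_encode(text, 'utf-8')    # two bytes: 0xC3 0xAD
utf16_bytes = try_encode(text, 'utf-16')  # BOM followed by the code unit
ascii_bytes = try_encode(text, 'ascii')   # None: outside the ASCII range
```

Since the traceback names the 'ascii' codec, it may also be worth checking whether line 52 converts the text without naming an encoding at all.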
Created 08-02-2016 06:24 PM
Hi Matt,
Thanks for the quick reply! I will try it out!