Encoding Python script in NiFi behaving differently than in console

Hello all, I'm having some issues with encodings in flow files and I'm truly stumped. Some of my flow files contain bytes that are not valid UTF-8, and when I try to read them into my Python script with

flow_file = sys.stdin.read()

it fails with the error "'utf-8' codec can't decode byte 0xd1 in position 25729: invalid continuation byte". When I pasted the full content of the flow file into my local Python console, everything appeared to work. Specifically,

import unidecode
string = """PIÑEDA""" # example string, not content of flow file
unidecode.unidecode(string)
Out[28]: 'PINEDA'

works just fine. But when I try the same thing in a Python script run by ExecuteStreamCommand

#!/usr/bin/python3

import sys
import unidecode

try:
    flow_file = sys.stdin.read()
    sys.stdout.write(flow_file)
except UnicodeDecodeError as e:
    flow_file = unidecode.unidecode(sys.stdin.read())
    sys.stdout.write(flow_file)

it produces no error, but writes a 0-byte file to the output stream.
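Both symptoms can be reproduced outside NiFi with a small simulation. This sketch assumes the flow file is actually Latin-1 (ISO-8859-1) encoded: in Latin-1, "Ñ" is the single byte 0xD1, which UTF-8 treats as the start of a two-byte sequence, so the following plain ASCII byte triggers "invalid continuation byte". A string literal typed into the console is already decoded text, so no decoding step ever happens there. `sys.stdin`, on the other hand, is an `io.TextIOWrapper` that decodes the incoming bytes, and `read()` pulls the whole byte stream out before decoding it. The variable names below are illustrative only.

```python
import io

# Assumption: the flow file is Latin-1 encoded, so "Ñ" is the single byte 0xD1.
raw = "PIÑEDA".encode("latin-1")  # b'PI\xd1EDA'

# sys.stdin is an io.TextIOWrapper around a byte stream; simulate it over an
# in-memory buffer, decoding as UTF-8 like the failing script does.
fake_stdin = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")

first_error = None
try:
    text = fake_stdin.read()
except UnicodeDecodeError as exc:
    first_error = exc
    # The failed read() already consumed every byte of the underlying buffer,
    # so this second read() has nothing left and returns an empty string.
    text = fake_stdin.read()

print(first_error)
print(repr(text))  # ''
```

If the simulation matches what happens in NiFi, the `except` branch of the script above passes an empty string to `unidecode`, which would explain the empty output.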

Why does the code behave differently on my local machine than in NiFi, and how do I fix this?
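One possible fix, sketched under the assumption that the non-UTF-8 flow files are Latin-1 (cp1252 is another common candidate), is to read the raw bytes from `sys.stdin.buffer` exactly once and decode them in the script, so a failed decode cannot discard the data. The helper name `decode_flow_file` is hypothetical. Because Latin-1 maps every byte value to a character, the fallback never raises, but it will silently produce wrong characters if the real encoding is something else.

```python
#!/usr/bin/python3
import sys


def decode_flow_file(data: bytes) -> str:
    """Decode flow file bytes as UTF-8, falling back to Latin-1 (assumed encoding)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Latin-1.
        # Latin-1 defines all 256 byte values, so this decode cannot fail.
        return data.decode("latin-1")


def main() -> None:
    # Read the raw bytes once; sys.stdin.buffer skips the implicit text decoding.
    data = sys.stdin.buffer.read()
    text = decode_flow_file(data)
    # Write UTF-8 bytes explicitly so the output does not depend on the locale.
    sys.stdout.buffer.write(text.encode("utf-8"))


if __name__ == "__main__":
    main()
```

If ASCII-only output is the goal, `unidecode.unidecode(text)` could be applied before encoding the result.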