Manipulate CSV flowfiles with ExecuteScript (Python)

Hi,

I have the following problem.

I need to clean up some columns in my flowfile. The columns are separated by tabs, and sometimes a column contains a \n, so I cannot use splitlines() to access the data. Instead, I tried to use the csv library to read the input stream.
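
To illustrate the difference between splitlines() and the csv module described above, here is a small sketch in plain CPython 3 (not the Jython engine that ExecuteScript uses). The sample data is hypothetical: two tab-separated records, where the second column of the first record is quoted and contains an embedded newline. csv.reader expects an iterable of lines, so the text is wrapped in io.StringIO.

```python
import csv
import io

# Hypothetical sample: header plus two records; the first record's
# "comment" field is quoted and contains an embedded newline.
sample = 'id\tcomment\n1\t"first line\nsecond line"\n2\tplain\n'

# splitlines() cuts at every newline, including the one inside the quoted field,
# so the first record is broken into two fragments.
naive_rows = [line.split('\t') for line in sample.splitlines()]

# csv.reader consumes an iterable of lines; io.StringIO provides one.
# newline='' leaves line endings untranslated, as the csv docs recommend for files.
csv_rows = list(csv.reader(io.StringIO(sample, newline=''), delimiter='\t'))

print(len(naive_rows))  # 4
print(csv_rows)
```

Note that the csv module can only keep a newline inside a field when that field is quoted. In unquoted tab-separated data, an embedded \n looks exactly like a record separator, and no parser can tell the two apart without extra rules.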

But I get an error:

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: javax.script.ScriptException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Cannot create PyString with non-byte value in <script> at line number 21

Here is the script I am running in the ExecuteScript body:

from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import csv

# Define a subclass of StreamCallback for use in session.write()
class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream, outputStream):
        newText = ''
        Text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        reader = csv.reader(Text, delimiter='	')
        for row in reader:
            newText += ('	'.join(row)).rstrip('\n\r')

        outputStream.write(newText.encode('utf-8'))
# end class
flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)
# implicit return at the end
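
Independently of the Jython error, the line reader = csv.reader(Text, ...) hands csv.reader a whole string rather than an iterable of lines. Iterating over a string yields one character at a time, so each character is parsed as its own one-character "line". The CPython 3 sketch below mirrors only the loop from the script (the NiFi objects are left out, and original_loop is a hypothetical name) to show the effect on a small unquoted sample.

```python
import csv

def original_loop(text):
    """Mirror of the question's loop: csv.reader receives the whole string."""
    new_text = ''
    # Iterating a str yields single characters, so csv.reader sees each
    # character as a separate line: a lone '\n' becomes an empty row [],
    # a lone tab becomes ['', ''], and any other character becomes [char].
    for row in csv.reader(text, delimiter='\t'):
        new_text += '\t'.join(row).rstrip('\n\r')
    return new_text

# Hypothetical sample: two unquoted records.
print(repr(original_loop('a\tb\nc\td\n')))
```

The result is the input with every CR and LF removed, so all records are fused into a single line. This may explain why the script can appear to "work" on ASCII test data: it runs without raising an error, but the output no longer has one record per line.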

1 Reply

Can you provide some sample input? I tried a tab-separated file that contained a \n in one of the columns (with lines ending in \n\r), and your script worked fine. I also tried replacing the delimiter value with \t instead of an actual tab character, and that seemed to work fine too.
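
For comparison, here is a minimal sketch of the cleaning step written as a plain function in CPython 3, assuming the embedded newlines sit inside quoted fields. The cleaning rule (replace any CR/LF inside a field with a space) and the names clean_tsv and _flatten are hypothetical illustrations, not something stated in the thread. The text is parsed through io.StringIO so that csv.reader receives lines instead of characters, and each cleaned record is written back on its own line.

```python
import csv
import io

def _flatten(field, replacement):
    # Replace CRLF first so it becomes a single replacement, then lone CR/LF.
    return field.replace('\r\n', replacement).replace('\n', replacement).replace('\r', replacement)

def clean_tsv(text, replacement=' '):
    """Hypothetical cleaner: parse tab-separated text, remove line breaks
    inside fields, and emit one record per line terminated by '\\n'."""
    reader = csv.reader(io.StringIO(text, newline=''), delimiter='\t')
    out = io.StringIO()
    for row in reader:
        cleaned = [_flatten(field, replacement) for field in row]
        out.write('\t'.join(cleaned) + '\n')
    return out.getvalue()

# Hypothetical sample with CRLF record endings and a quoted multi-line field.
sample = 'id\tcomment\r\n1\t"first line\nsecond line"\r\n2\tplain\r\n'
print(repr(clean_tsv(sample)))
```

This is CPython 3 code and will not run unchanged in ExecuteScript's Jython 2.7 engine. The Python 2 csv module documents that it does not support Unicode input, and IOUtils.toString returns a Java string that Jython treats as Unicode, so the "Cannot create PyString with non-byte value" error likely appears once the flowfile contains non-ASCII characters (which would also explain why an ASCII test file ran fine). A plausible, unverified port encodes the text with Text.encode('utf-8') before parsing, wraps it in StringIO.StringIO instead of io.StringIO, and writes the result with outputStream.write(bytearray(...)).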