manipulate CSV flowfiles with ExecuteScript python

Rising Star


I have the following problem.

I need to do some cleaning within the columns of my flowfile. The columns are separated by tabs, and sometimes there is a \n inside a column, so I cannot use splitlines to access the data. Instead, I tried to use the csv library to read the input stream.
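To illustrate the difference described above, here is a minimal sketch in plain Python, outside NiFi. The sample data is made up, and it assumes that a column containing a line break is wrapped in double quotes; without quoting, the csv module has no way to tell an embedded line break from the end of a record.

```python
import csv
import io

# Made-up sample: three records, the second has a line break inside a quoted column.
sample = 'id\tnote\n1\t"first\nsecond"\n2\tplain\n'

# Naive approach: the embedded line break splits one record into two pieces.
naive_rows = [line.split('\t') for line in sample.splitlines()]

# csv-based approach: the quoted column keeps its embedded line break.
csv_rows = list(csv.reader(io.StringIO(sample, newline=''), delimiter='\t'))

print(len(naive_rows), len(csv_rows))
```

The naive split sees four lines, while the csv reader returns the three intended records.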

But I get an error:

at java.util.concurrent.ThreadPoolExecutor$ [na:1.8.0_77] at [na:1.8.0_77] Caused by: javax.script.ScriptException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Cannot create PyString with non-byte value in <script> at line number 21
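The "non-byte value" wording suggests that a character with a code point above 255 reached Jython's byte-oriented string type. Assuming that is the cause here (the thread does not confirm it), a small helper like the following could locate such characters in a sample of the input. The name find_non_byte_chars is hypothetical.

```python
def find_non_byte_chars(text):
    """Return (index, character) pairs for code points that do not fit in one byte."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 255]


# Made-up sample containing a typographic apostrophe (U+2019).
print(find_non_byte_chars(u'a\tb\u2019c'))
```

Characters such as é (code point 233) fit in one byte and are not reported; only code points above 255 are.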

Here is the script that I am running in the ExecuteScript body:

from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import csv

# Define a subclass of StreamCallback for use in session.write()
class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        newText = ''
        # Read the whole flowfile content as a UTF-8 string
        Text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        reader = csv.reader(Text, delimiter='	')
        for row in reader:
            newText += ('	'.join(row)).rstrip('\n\r')
# end class

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)
# implicit return at the end

Super Guru

Can you provide some sample input? I tried with a tab-separated file that contained a \n in the column (with the line ending in \n\r), and your script worked fine. I tried replacing the delimiter value with \t instead of an actual tab character, and it seemed to work fine too.
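For testing with sample input outside NiFi, the cleaning step can be sketched in plain Python as below. This is only a sketch under assumptions: the function name clean_tsv is hypothetical, embedded line breaks are assumed to sit inside double-quoted columns, and replacing them with a space is one possible cleaning rule, not necessarily the one wanted here. Note that csv.reader accepts any iterable of lines, so a plain string passed to it is consumed one character at a time; wrapping the text in io.StringIO lets the reader see whole lines.

```python
import csv
import io


def clean_tsv(text):
    """Parse tab-separated text and replace line breaks inside columns with spaces."""
    reader = csv.reader(io.StringIO(text, newline=''), delimiter='\t')
    out = io.StringIO()
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for row in reader:
        # Assumed cleaning rule: turn embedded CR/LF characters into single spaces.
        writer.writerow([col.replace('\r', ' ').replace('\n', ' ') for col in row])
    return out.getvalue()


# Made-up sample with a line break inside a quoted column.
sample = 'id\tnote\n1\t"first\nsecond"\n2\tplain\n'
print(clean_tsv(sample))
```

Inside ExecuteScript, the same logic would go in the process method, with the result written to outputStream. Under Jython 2.7 the string handling differs from Python 3, so the sketch would need adapting there.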