Manipulate CSV flowfiles with ExecuteScript (Python)

Hi,

I have the following problem.

I need to do some cleaning within the columns of my flowfile. The columns are separated by tabs, and sometimes a column contains \n, so I cannot use splitlines to access the data. Instead, I tried to use the csv library to read the input stream.
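One caveat on the csv approach: the csv module can only tell an embedded \n apart from the end of a record when that field is quoted; an unquoted line break inside a column looks exactly like a new record. Assuming the multi-line columns are wrapped in double quotes, a minimal CPython 3 sketch with made-up sample data shows the difference from splitting on lines:

```python
import csv

# Made-up sample: two tab-separated records; the middle field of the first
# record contains a line break and is therefore wrapped in double quotes.
sample = 'id1\t"first line\nsecond line"\tx\nid2\tplain\ty\n'

# Splitting on line breaks cuts the first record into two pieces.
naive_rows = [line.split('\t') for line in sample.splitlines()]

# csv.reader gets the lines with their endings kept, so it can rejoin a
# quoted field that spans two physical lines.
csv_rows = list(csv.reader(sample.splitlines(True), delimiter='\t'))

print(len(naive_rows))  # 3
print(csv_rows)  # [['id1', 'first line\nsecond line', 'x'], ['id2', 'plain', 'y']]
```

If the real data does not quote those fields, no CSV parser can recover the record boundaries, and the cleaning rule would have to rely on something else, such as a known column count.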

But I get an error:

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77] Caused by: javax.script.ScriptException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Cannot create PyString with non-byte value in <script> at line number 21

Here is the script I am running in the ExecuteScript body:

from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import csv

# Define a subclass of StreamCallback for use in session.write()
class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream, outputStream):
        newText = ''
        Text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        reader = csv.reader(Text, delimiter='	')
        for row in reader:
            newText += ('	'.join(row)).rstrip('\n\r')

        outputStream.write(newText.encode('utf-8'))
# end class
flowFile = session.get()
if flowFile != None:
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)
# implicit return at the end
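Separately from the error, `csv.reader` expects an iterable that yields lines. Passing the whole string `Text` makes it iterate one character at a time, so every character is parsed as if it were its own line, and the loop then concatenates the rows without any separator between records. A small CPython 3 illustration with made-up data:

```python
import csv

text = 'a\tb\nc\td\n'

# Passing the whole string: iterating a str yields single characters,
# so csv.reader treats each character as a separate "line".
by_char = list(csv.reader(text, delimiter='\t'))

# Passing a list of lines (endings kept) gives one row per record.
by_line = list(csv.reader(text.splitlines(True), delimiter='\t'))
print(by_line)  # [['a', 'b'], ['c', 'd']]

# Joining rows the way the script does (no separator between rows)
# runs all records together on a single line.
joined = ''
for row in by_line:
    joined += '\t'.join(row).rstrip('\n\r')
print(repr(joined))  # 'a\tbc\td'
```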
Reply (Super Guru):

Can you provide some sample input? I tried a tab-separated file that contained a \n within a column (with lines ending in \n\r), and your script worked fine. I also tried replacing the delimiter value with \t instead of a literal tab character, and that seemed to work as well.
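The error message itself points at something other than the delimiter. ExecuteScript's python engine is Jython 2.7, and the Python 2 csv module does not accept Unicode input (its documentation suggests encoding to UTF-8 first). Assuming `IOUtils.toString` hands the script a unicode string, a character that does not fit in a byte string (code point above 255) would most likely raise "Cannot create PyString with non-byte value", which would also be consistent with ASCII-only test files working. The sketch below works under those assumptions: `clean_tsv` and its rule of collapsing line breaks inside a field into one space are hypothetical stand-ins for the real cleaning step, and multi-line fields are assumed to be double-quoted. It also runs under CPython 3, so the parsing logic can be tested outside NiFi.

```python
import csv
import io
import re
import sys

# True under Jython 2.7 (ExecuteScript's Python engine) or CPython 2.
PY2 = sys.version_info[0] == 2

# Hypothetical cleaning rule: collapse any run of CR/LF inside a field to one space.
LINE_BREAKS = re.compile(r'[\r\n]+')


def clean_tsv(text):
    """Parse tab-separated text whose multi-line fields are double-quoted,
    remove line breaks inside each field, and return one record per line."""
    if PY2:
        # The Python 2 csv module only handles byte strings, so encode the
        # unicode text to UTF-8 first; keep line endings for quoted fields.
        lines = text.encode('utf-8').splitlines(True)
    else:
        # newline='' keeps line endings untranslated, as the csv docs advise.
        lines = io.StringIO(text, newline='')
    records = []
    for row in csv.reader(lines, delimiter='\t'):
        if PY2:
            # Turn each UTF-8 byte-string field back into unicode.
            row = [field.decode('utf-8') for field in row]
        cleaned = [LINE_BREAKS.sub(u' ', field) for field in row]
        records.append(u'\t'.join(cleaned))
    return u'\n'.join(records)


# Hypothetical wiring inside PyStreamCallback.process in ExecuteScript:
#     text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
#     outputStream.write(clean_tsv(text).encode('utf-8'))
```

Joining the cleaned records with a newline keeps one record per output line, which the original loop did not do.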

Labels