manipulate CSV flowfiles with ExecuteScript python
- Labels: Apache NiFi
Created ‎05-03-2017 11:36 AM
Hi,
I have the following problem.
I need to clean up some columns in my flowfile. The columns are separated by tabs, and a column can sometimes contain \n, so I cannot use splitlines to access the data. Instead, I tried to use the csv library to read the input stream.
But I get an error:
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: javax.script.ScriptException: java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Cannot create PyString with non-byte value in <script> at line number 21
Here is the script I am running in the ExecuteScript body:
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import csv

# Define a subclass of StreamCallback for use in session.write()
class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        newText =''
        Text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        reader = csv.reader(Text,delimiter=' ')
        for row in reader:
            newText+=(' '.join(row)).rstrip('\n\r')
        outputStream.write(newText.encode('utf-8'))
# end class

flowFile = session.get()
if(flowFile != None):
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)
# implicit return at the end
Created ‎05-03-2017 01:15 PM
Can you provide some sample input? I tried with a tab-separated file that contained a \n in the column (with the line ending in \n\r), and your script worked fine. I tried replacing the delimiter value with \t instead of an actual tab character, and it seemed to work fine too.
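For comparison, here is a minimal sketch of the kind of test described above, written for standalone CPython 3 rather than the Jython engine that ExecuteScript uses, so it would need adapting before being pasted into the processor. It assumes that any field containing a line break is wrapped in double quotes, which is how the csv module tells an embedded \n from a record separator. The function name clean_tsv and the sample data are made up for illustration and are not taken from the original flowfile.

```python
import csv
import io


def clean_tsv(text):
    """Parse tab-separated text and replace line breaks inside fields with spaces."""
    # newline='' lets the csv module see the original line endings, including
    # those embedded in quoted fields.
    reader = csv.reader(io.StringIO(text, newline=''), delimiter='\t')
    out = io.StringIO()
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    for row in reader:
        # Remove embedded \r and \n from each column, keeping one record per line.
        writer.writerow([f.replace('\r', ' ').replace('\n', ' ') for f in row])
    return out.getvalue()


# Hypothetical sample: the second record has a line break inside a quoted column.
sample = 'id\tcomment\n1\t"first line\nsecond line"\n2\tplain\n'
cleaned = clean_tsv(sample)
print(cleaned)
```

If the real input has unquoted line breaks inside columns, this approach will not be enough, which is one more reason a sample of the actual data would help.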
