I currently have a dataflow in which a GetFile processor picks up TSV files from a directory. I want to convert these TSV files to CSV so that I can later process them with the ConvertCSVToAvro processor. To test the conversion, I wrote this Python script (tsv2csv.py) with a Bash wrapper:
import sys
import csv

# Read tab-separated rows from stdin and write them to stdout as comma-separated rows.
tsvin = csv.reader(sys.stdin, dialect=csv.excel_tab)
commaout = csv.writer(sys.stdout, dialect=csv.excel)
for row in tsvin:
    commaout.writerow(row)
Bash wrapper:
for file in *.tsv
do
    python tsv2csv.py < "$file" > "${file%.*}.csv"
done
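For what it's worth, the same conversion can be written as a function over file-like objects instead of stdin/stdout, which I assume is closer to what a processor handing the script a stream would need. This is only a sketch under that assumption: the name `tsv_to_csv` is hypothetical, it is written for plain Python 3, and it does not use any NiFi API.

```python
import csv
import io


def tsv_to_csv(src, dst):
    """Copy rows from a tab-separated text stream to a comma-separated one.

    `src` and `dst` are any text-mode file-like objects (hypothetical helper,
    not part of NiFi). Returns the number of rows written.
    """
    reader = csv.reader(src, dialect=csv.excel_tab)
    writer = csv.writer(dst, dialect=csv.excel)
    count = 0
    for row in reader:
        # The csv writer quotes fields that contain commas or quotes.
        writer.writerow(row)
        count += 1
    return count
```

Called with in-memory streams, for example `tsv_to_csv(io.StringIO("a\tb\n"), out)` where `out = io.StringIO(newline="")`, it behaves like the stdin/stdout script above, so the logic itself does not depend on the shell wrapper.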
I see the ExecuteScript processor as a possible option. How would I use it to run this Python script? For example, would the processor know where to import modules from? Or is there a better way to do the conversion?