Created 04-04-2018 04:13 PM
Currently I have a dataflow in which a GetFile processor reads TSV files from a directory. I want to convert these TSV files to CSV so that I can later process them with the ConvertCSVToAvro processor. I've written this Python script, with a Bash wrapper, to test the conversion:
import sys
import csv

# Read tab-separated rows from stdin and write them to stdout as comma-separated rows.
tsvin = csv.reader(sys.stdin, dialect=csv.excel_tab)
commaout = csv.writer(sys.stdout, dialect=csv.excel)
for row in tsvin:
    commaout.writerow(row)
Bash wrapper:
# Convert every .tsv file in the current directory to a .csv file with the same base name.
for file in *.tsv
do
    python tsv2csv.py < "$file" > "${file%.*}.csv"
done
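The conversion logic above can also be exercised without files or a shell loop, which makes it easier to check how quoting behaves. This is only a minimal sketch: the function name tsv_to_csv is hypothetical, and it assumes the whole input fits in memory as a single string.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Convert tab-separated text to comma-separated text (hypothetical helper)."""
    # newline="" leaves line endings to the csv module, as its documentation recommends.
    source = io.StringIO(tsv_text, newline="")
    target = io.StringIO(newline="")
    reader = csv.reader(source, dialect=csv.excel_tab)
    writer = csv.writer(target, dialect=csv.excel)
    for row in reader:
        writer.writerow(row)
    return target.getvalue()


# A field that contains a comma is quoted on output, so the columns stay intact.
converted = tsv_to_csv("id\tcity\n1\tParis, France\n")
print(converted)
```

Note that the csv.excel dialect terminates rows with "\r\n", so the output line endings may differ from those of the input.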
I see the ExecuteScript processor as a possible option. How would I use it to execute this Python script? For example, would the processor know where to import modules from? Or is there a better way to do the conversion?
Created 04-05-2018 12:30 PM
Hi, I would suggest using the record reader / writer processors. You can read the files with a CSVReader (you can specify tab as the delimiter) and then use ConvertRecord to convert them to another schema. You do have to define a schema for the records, in Avro format.
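Writing the Avro schema by hand gets tedious for wide files, so here is a rough sketch that builds a starting schema from the header line of a TSV file. Everything in it is an assumption for illustration: the helper name avro_schema_from_tsv_header, the record name TsvRecord, the sample column names, and the choice to treat every column as a nullable string. Adjust the types to match your data.

```python
import csv
import io
import json


def avro_schema_from_tsv_header(header_line, record_name="TsvRecord"):
    """Build a minimal Avro record schema from a TSV header line (hypothetical helper)."""
    reader = csv.reader(io.StringIO(header_line, newline=""), dialect=csv.excel_tab)
    column_names = next(reader)
    # Assumption: every column is a nullable string. Avro field names must start
    # with a letter or underscore and contain only letters, digits and underscores,
    # so headers with spaces or punctuation need renaming first.
    fields = [{"name": name, "type": ["null", "string"]} for name in column_names]
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical header row; replace it with the first line of one of your TSV files.
schema = avro_schema_from_tsv_header("id\tname\tcity\n")
print(json.dumps(schema, indent=2))
```

The printed JSON is the kind of schema text that the reader and writer need; how you supply it depends on the schema access strategy you configure.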