
Sandbox HDF Nifi: How do I convert TSV files to CSV?

Currently I have a dataflow with a GetFile processor that reads TSV files from a directory. I want to convert these TSV files to CSV so that I can later process them with the ConvertCSVToAvro processor. I've written this Python script, with a Bash wrapper, to test the conversion:

import sys
import csv

# Read tab-separated rows from stdin and write them to stdout as comma-separated rows.
tsvin = csv.reader(sys.stdin, dialect=csv.excel_tab)
commaout = csv.writer(sys.stdout, dialect=csv.excel)
for row in tsvin:
    commaout.writerow(row)

Bash wrapper:

# Convert every .tsv file in the current directory to a .csv file with the same base name.
for file in *.tsv
do
    python tsv2csv.py < "$file" > "${file%.*}.csv"
done

I see the ExecuteScript processor as a possible option. How would I use it to run this Python script? For example, would the processor know where to import modules from? Or is there a better way to do the conversion?

1 ACCEPTED SOLUTION

Hi, I would suggest using the record reader/writer processors. You can read the files with a CSVReader (you can specify tab as the delimiter) and then use ConvertRecord to convert them to another format. You do have to define a schema for the records in Avro format, though.
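
As a rough illustration of the schema step, here is a minimal Python sketch that derives an Avro record schema from the header row of a TSV file. It rests on several assumptions that are not part of the original answer: the first row is a header, every column name is already a valid Avro field name, and every field is treated as a string. The function name `avro_schema_from_header`, the record name `TsvRecord`, and the sample data are hypothetical.

```python
import csv
import io
import json


def avro_schema_from_header(tsv_text, record_name="TsvRecord"):
    """Build a minimal Avro record schema from the header row of TSV text.

    Assumption: every column is typed as "string"; adjust types by hand afterwards.
    """
    reader = csv.reader(io.StringIO(tsv_text), dialect=csv.excel_tab)
    header = next(reader)  # first row is assumed to hold the column names
    return {
        "type": "record",
        "name": record_name,  # hypothetical record name
        "fields": [{"name": column, "type": "string"} for column in header],
    }


# Hypothetical sample data, used only to show the shape of the result.
sample = "id\tname\tcity\n1\tAda\tLondon\n"
schema = avro_schema_from_header(sample)
print(json.dumps(schema, indent=2))
```

The printed JSON is the kind of schema text the reader and writer would need. Change the field types by hand wherever a column is not really a string.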
