Archives of Support Questions (Read Only)


Sandbox HDF NiFi: How do I convert TSV files to CSV?

New Member

Currently I have a dataflow with a GetFile processor that reads from a directory of TSV files. I want to convert these TSV files to CSV so that I can process them later with the ConvertCSVToAvro processor. I wrote this Python script, with a Bash wrapper, to test the conversion:

import csv
import sys

# Read tab-separated rows from stdin and write them to stdout as comma-separated rows.
tsvin = csv.reader(sys.stdin, dialect=csv.excel_tab)
commaout = csv.writer(sys.stdout, dialect=csv.excel)
for row in tsvin:
    commaout.writerow(row)

Bash wrapper:

for file in *.tsv
do
    python tsv2csv.py < "$file" > "${file%.*}.csv"
done

I see the ExecuteScript processor as a possible option. How would I use it to run this Python script? For example, would the processor know where to import modules from? Or is there a better way to do the conversion?
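For reference, the conversion in the script above does not depend on stdin, stdout, or the shell loop. A minimal sketch of the same logic as a function over two text streams is shown here; the name `tsv_to_csv` and the sample data are hypothetical, and wiring it into a NiFi processor is not shown.

```python
import csv
import io


def tsv_to_csv(src, dst):
    """Copy rows from a tab-separated text stream to a comma-separated one.

    Returns the number of rows written. `src` and `dst` are any text
    streams (open files, sys.stdin/sys.stdout, io.StringIO, ...).
    """
    reader = csv.reader(src, dialect=csv.excel_tab)
    writer = csv.writer(dst, dialect=csv.excel)
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


# Hypothetical sample input: the second row has a field containing a comma,
# which the csv writer quotes on output.
source = io.StringIO("id\tname\n1\tDoe, Jane\n", newline="")
target = io.StringIO(newline="")
rows = tsv_to_csv(source, target)
print(target.getvalue())
```

Because the function only needs a readable and a writable text stream, the same code can be driven from a file loop or from any other caller that can supply the two streams.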

1 ACCEPTED SOLUTION

Contributor

Hi, I would suggest using the record reader/writer processors. You can read the files with a CSVReader (you can specify tab as the delimiter) and then use ConvertRecord to convert to another format or schema. You do have to define a schema for the records in Avro format, though.
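Since the records need a schema in Avro format, here is a rough sketch of generating one from a TSV header row. It assumes the first row of the file holds the column names, that those names are already valid Avro field names (letters, digits, and underscores, not starting with a digit), and that treating every column as a nullable string is acceptable. The function name `avro_schema_from_header`, the record name `TsvRecord`, and the sample columns are all made up for illustration.

```python
import csv
import io
import json


def avro_schema_from_header(tsv_stream, record_name="TsvRecord"):
    """Build an Avro record schema (as a dict) from the first TSV row.

    Assumption: the first row is a header, and every column is modelled
    as a nullable string. Adjust the types by hand for numeric columns.
    """
    header = next(csv.reader(tsv_stream, dialect=csv.excel_tab))
    return {
        "type": "record",
        "name": record_name,
        "fields": [
            {"name": column, "type": ["null", "string"]} for column in header
        ],
    }


# Hypothetical sample data standing in for one of the TSV files.
sample = io.StringIO("id\tname\tcity\n1\tJane\tOslo\n")
schema = avro_schema_from_header(sample)
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

The printed JSON is the kind of text an Avro schema definition expects; column types other than string would still need to be edited manually.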
