In NiFi, is there any processor for converting Avro to CSV or .txt format?

Expert Contributor

Looking to extract tables from SQL Server using NiFi and export them to S3 in either CSV or .txt format. I'm using the ExecuteSQL processor and it outputs Avro, but we are looking for CSV or txt (I don't know if JSON is any good for this). Are there any out-of-the-box processors you would suggest, or do we have to create a processor? Please let me know, thank you. There is a processor for Avro to JSON, but I'm looking for Avro to CSV.

5 REPLIES


How big is the table? If it is very big and not updating, then you can use Sqoop for this task. Otherwise, another option is to use the template below, which uses ExecuteScript with Groovy to issue a SQL query and produce a flowfile containing a CSV representation of the results.

Follow the template here: https://gist.github.com/mattyb149/9945663ae9ae5dcb1ddb43d21553204d
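If you only need the Avro-to-CSV step (rather than issuing the query from the script), a minimal ExecuteScript Groovy body along the same lines might look like the sketch below. This is an illustration, not the linked template itself: it assumes the embedded-schema Avro files that ExecuteSQL produces, and it does no CSV quoting or escaping.

// ExecuteScript (Groovy): convert an embedded-schema Avro flowfile to naive CSV.
// 'session' and 'REL_SUCCESS' are variables bound by ExecuteScript; depending on
// your NiFi version you may need to point Module Directory at the Avro jars.
import org.apache.avro.file.DataFileStream
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericRecord
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (flowFile == null) return

flowFile = session.write(flowFile, { inputStream, outputStream ->
    def reader = new DataFileStream<GenericRecord>(inputStream, new GenericDatumReader<GenericRecord>())
    def columns = reader.schema.fields.collect { it.name() }
    // Header row first, then one comma-joined line per record.
    // Values containing commas or newlines would need a real CSV writer.
    outputStream.write((columns.join(',') + '\n').getBytes('UTF-8'))
    reader.each { record ->
        def row = columns.collect { record.get(it)?.toString() ?: '' }
        outputStream.write((row.join(',') + '\n').getBytes('UTF-8'))
    }
    reader.close()
} as StreamCallback)

flowFile = session.putAttribute(flowFile, 'mime.type', 'text/csv')
session.transfer(flowFile, REL_SUCCESS)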


Hi @SparkRocks

At the moment there is no such processor, but there are ongoing discussions on the subject [1].

The best output format really depends on what you want to do with the data; JSON looks like a good format to me for its flexibility.

If you really want CSV, there are multiple options depending on how complicated your input data is. If there are not a lot of tables/columns, you could do something like:

ExecuteSQL > ConvertAvroToJSON > EvaluateJsonPath (to extract the values into flowfile attributes) > ReplaceText (to write your CSV using the expression language) > PutS3Object
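For example, with three hypothetical columns (the attribute names and JSON paths below are illustrative; if ConvertAvroToJSON emits a JSON array, add a SplitJson before the EvaluateJsonPath):

EvaluateJsonPath (Destination = flowfile-attribute):
  col_a = $.field_a
  col_b = $.field_b
  col_c = $.field_c

ReplaceText (Replacement Strategy = Always Replace):
  Replacement Value = ${col_a},${col_b},${col_c}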

Otherwise you could have a look at the ExecuteScript processor and rely on an existing library to perform the conversion.

Or, as you suggested, you could write a custom processor and contribute your code to NiFi.

[1] https://issues.apache.org/jira/browse/NIFI-1372

Hope this helps.

New Contributor

There is actually a pretty easy way to do this. When you run ExecuteSQL, you will have data in Avro format and an attribute named executesql.row.count. All you need to do to get it to CSV is use a single SplitRecord processor.

Set the RecordReader to a simple AvroReader Controller Service. Leave it at the defaults, so Schema Access Strategy = Use Embedded Avro Schema.

For the RecordWriter, you need to set up a CSVRecordSetWriter Controller Service. Here all you need to do is create the Avro schema based on the column names from your SQL query; see below for an example.

For Records Per Split, just set this to ${executesql.row.count}.

Now the flowfile that comes out of your SplitRecord will be a single file, with a header row and all the other rows from your DB query, as a nice CSV file you can send off to SFTP or S3.
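Putting the configuration together, it looks roughly like this (a sketch; property names as of recent NiFi versions):

SplitRecord:
  Record Reader = AvroReader (Schema Access Strategy = Use Embedded Avro Schema)
  Record Writer = CSVRecordSetWriter
  Records Per Split = ${executesql.row.count}

CSVRecordSetWriter:
  Schema Access Strategy = Use 'Schema Text' Property
  Schema Text = the Avro schema below
  Include Header Line = true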

Example Avro schema (the nice thing here is that you don't really care about types, since we are just writing to CSV; just use a nullable string so everything validates OK).

{
  "type" : "record",
  "name" : "Sample",
  "doc" : "Schema generated by Yours Truly",
  "fields" : [{
    "name" : "field_a_from_select_statement",
    "type" : [ "null", "string" ],
    "default" : null
  },{
    "name" : "field_b_from_select_statement",
    "type" : [ "null", "string" ],
    "default" : null
  },{
    "name" : "field_c_from_select_statement",
    "type" : [ "null", "string" ],
    "default" : null
  }]
}

Master Guru

Add one processor:

ConvertRecord:

Record Reader = AvroReader

Record Writer = CSVRecordSetWriter
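A sketch of the controller service settings this implies (assuming ExecuteSQL's embedded-schema Avro output, as in the answer above):

AvroReader:
  Schema Access Strategy = Use Embedded Avro Schema

CSVRecordSetWriter:
  Schema Access Strategy = Inherit Record Schema
  Include Header Line = true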

New Contributor

Ha, thanks. I was using my SplitRecord as a ConvertRecord by putting all the rows from the SQL response in one split. This is a better solution than mine above.