How do I add additional columns to FlowFile content?
Labels: Apache NiFi
Created 07-18-2016 08:54 PM
The goal is to read data from an RDBMS table and store it in HDFS in Avro format with an additional column. Say the source table has 5 columns; as part of the ingestion I would like to add an extra column, e.g. "ingest_datetime", populated with the current time, before NiFi stores the file in HDFS, so the Avro file in HDFS ends up with 6 columns.
Currently I am using ExecuteSQL --> PutHDFS processors.
Created 07-18-2016 09:06 PM
Currently there aren't any processors that perform direct manipulation of Avro, although we definitely would like to have some.
Possible options to work around this:
- Use ConvertAvroToJSON followed by the new JOLT transform processor, followed by ConvertJSONToAvro (involves a lot of conversion and may lose some of the initial schema)
- Use the ExecuteScript processor to manipulate the Avro (I am not sure if any of NiFi's supported scripting languages have good Avro support)
- Write a custom Java processor to manipulate the Avro (https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions)
Happy to help answer any questions if you go the custom Java processor route.
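If it helps to see what that route boils down to, here is a rough sketch (untested, assuming only the plain Apache Avro Java library on the classpath) of the record rewriting a custom processor or ExecuteScript body could perform. The class name AddIngestDatetime, the string-typed ingest_datetime field, and the choice to drop field docs/defaults are all illustrative assumptions; in a real processor this logic would run against the FlowFile streams inside session.write().

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AddIngestDatetime {

    // Reads an Avro datafile (as produced by ExecuteSQL), adds an
    // "ingest_datetime" string field to every record, and writes a new
    // Avro datafile with the widened schema.
    public static void addColumn(InputStream in, OutputStream out) throws Exception {
        try (DataFileStream<GenericRecord> reader =
                     new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {

            Schema oldSchema = reader.getSchema();

            // New record schema = original fields + ingest_datetime.
            // (Doc strings and defaults are dropped in this sketch; the exact
            //  Schema.Field constructor differs between Avro versions.)
            List<Schema.Field> fields = new ArrayList<>();
            for (Schema.Field f : oldSchema.getFields()) {
                fields.add(new Schema.Field(f.name(), f.schema(), null, null));
            }
            fields.add(new Schema.Field("ingest_datetime",
                    Schema.create(Schema.Type.STRING), null, null));

            Schema newSchema = Schema.createRecord(oldSchema.getName(),
                    null, oldSchema.getNamespace(), false);
            newSchema.setFields(fields);

            String ingestTime = java.time.Instant.now().toString();

            try (DataFileWriter<GenericRecord> writer =
                         new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(newSchema))) {
                writer.create(newSchema, out);
                while (reader.hasNext()) {
                    GenericRecord source = reader.next();
                    GenericRecord target = new GenericData.Record(newSchema);
                    // Copy the original columns, then add the new one.
                    for (Schema.Field f : oldSchema.getFields()) {
                        target.put(f.name(), source.get(f.name()));
                    }
                    target.put("ingest_datetime", ingestTime);
                    writer.append(target);
                }
            }
        }
    }
}
```

Because the output is written with DataFileWriter, the widened schema stays embedded in the file, so a downstream PutHDFS just stores the already-augmented Avro.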
Created 07-19-2016 06:49 PM
Thanks for the response.
Do you know when the new JOLT transform processor is going to be released? The existing 0.6.1 or 0.7 does not have this new processor you are talking about, but the NIFI-361 ticket mentions it.
Created 07-19-2016 07:00 PM
It is in the 0.7.0 release, part of the standard bundle.
Created 07-19-2016 01:51 PM
@Sreekanth Munigati, as @Bryan Bende mentioned, there is no direct way of manipulating Avro data, but in your case you can try modifying the SQL being executed by the ExecuteSQL processor to add the additional column in the SQL itself.
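For example (the table name and the exact timestamp function are placeholders and depend on your source database), a query like `SELECT t.*, CURRENT_TIMESTAMP AS ingest_datetime FROM source_table t` in ExecuteSQL's query property should come back with six columns, so the Avro that ExecuteSQL produces already contains the ingest_datetime field and no extra transformation step is needed before PutHDFS.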
