Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Nifi process big file using ConvertRecord processor

avatar
Contributor

Hi All,

I have a big file JSON format ( 1m records ). I need to replace a couple of fields in each JSON using custom logic. I used ExecuteScript processor using a Groovy script but got out of memory exception. I want to try using ConvertRecord processor.

I don't have a schema of the JSON that is why I want to use I want to use ScriptedReader and ScriptedRecordSetWriter.

Questions:

What is the best practice for a use case like process big JSON file and make changes of per record? is it a good idea to use ConvertRecord or should be a different approach.

Can you point me to an example of ScriptedReader and ScriptedRecordSetWriter using Groovy. ( I found a lot of ExecuteScript examples but Record Based Groovy I don't ).

Also is it possible to share the docs/blog link with an internals of Record based approach? I want to understand at deeper technical level why it is much robust comparing to ExecuteScript ( file-based approach ).

Thanks

Oleg.

1 ACCEPTED SOLUTION

avatar
Master Guru
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
4 REPLIES 4

avatar
Master Guru
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Contributor

Thanks Bryan for the explanation.

It is much clear now. Just to get started, if I need to read Avro content coming from QueryDataBase processor. I need to make changes in each line and convert it after to CSV.

RecordReader will read the output from QueryDataBase processor and make changes. Is there a simple example how to read Avro (groovy script ) using ScriptedReader.

The output should be CSV. Is it possible to convert Avro from CSV using ConvertRecord or it is also writing code?

If it is writing a code can you give me some example how to convert AVRO to CSV using groovy (in the context of ScriptedRecordSetWriter)

Thanks

Oleg

avatar
Master Guru

What types of changes do you need to make to the data?

You can use ConvertRecord with an AvroReader and a CsvWriter in order to convert from Avro to CSV.

You can use UpdateRecord before or after that to update the records.

avatar
Contributor

HI Bryan.

end2end flow is:

read from db ( nifi returns Avro format ) -> md5 selected columns -> convert to csv -> put to s3

I wanted to do md5 using groovy script using ConvertRecord .

But I don't know how to start with a groovy script. Unitests didn't help me. If it is possible please share the example how groovy reads Avro data using ScriptedRecordReader)

If my approach is wrong please suggest the better way.

Thanks

Oleg.