Am I Configuring NiFi's AvroSchemaRegistry Correctly?


ccr-and-hdr-20.txt

Hello.

I am creating a workflow to convert CSV to JSON, and I need help configuring ConvertRecord's JsonRecordSetWriter controller service. A SchemaNotFoundException is being thrown saying "Unable to find schema with name 'ccr'" ('ccr' is the name I chose for the data). The schema is inferred using InferAvroSchema, and UpdateAttribute creates an attribute named "schema.name" that is set to 'ccr'. The AvroSchemaRegistry controller service has a property added to it named "ccr", and the value of this property is "${inferred.avro.schema}". InferAvroSchema's Schema Output Destination property is set to "flowfile-attribute", meaning the inferred Avro schema is put into an attribute named "inferred.avro.schema".

I have attached the workflow and the CSV data set. The processors' directory paths will need to be changed so that this workflow can be tested. Note that the CSV file's extension has been changed to ".txt" and will need to be changed back to ".csv".

Again, I really need help because I do not want to have to specify the schema as text. Rather, I would like the schema to be inferred so that CSV files with differing headers and data content can be processed by the same workflow. Any help and guidance you can share would be greatly appreciated.

csv-to-json-to-es5-with-id-csvreader.xml

Respectfully,

Patrick

1 ACCEPTED SOLUTION


@Patrick Maggiulli

From the documentation of the AvroSchemaRegistry it looks like the actual schema should be given to the registry:

'value' represents the textual representation of the actual schema following the syntax and semantics of Avro's Schema format.

${inferred.avro.schema} is an attribute of the flow file, so it doesn't make sense as a value in the registry.

To implement your use case, you should use "Use 'Schema Text' Property" as the schema access strategy. It is better suited to dynamic schemas: the schema is read from the flow file and used for the conversion.

A schema registry is meant more for governance, where you add and manage schemas manually.
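
The distinction can be illustrated with a toy model in Python. This is not NiFi code: `parse_schema`, `resolve`, and both lookup functions are hypothetical stand-ins, and the one-field schema is made up. It assumes, as described above, that a registry property value is taken literally, while the Schema Text property is resolved against each flow file's attributes.

```python
import json
import re


def parse_schema(text):
    """Crude stand-in for Avro schema parsing: accept only a JSON record definition."""
    try:
        parsed = json.loads(text)
    except ValueError:
        raise ValueError("not a valid schema: " + text)
    if not isinstance(parsed, dict) or parsed.get("type") != "record":
        raise ValueError("not a valid schema: " + text)
    return parsed


def resolve(expression, attributes):
    """Replace each ${name} with the matching flow file attribute (empty if missing)."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: attributes.get(m.group(1), ""), expression)


def schema_via_registry(registry_properties, attributes):
    """Registry strategy (toy): the stored value is used literally, never resolved."""
    name = attributes["schema.name"]
    try:
        return parse_schema(registry_properties[name])
    except (KeyError, ValueError):
        raise LookupError("Unable to find schema with name '%s'" % name)


def schema_via_text_property(schema_text, attributes):
    """Schema-text strategy (toy): resolve against the flow file, then parse."""
    return parse_schema(resolve(schema_text, attributes))


# Hypothetical attributes, as InferAvroSchema and UpdateAttribute might leave them.
flowfile_attributes = {
    "schema.name": "ccr",
    "inferred.avro.schema": json.dumps({
        "type": "record",
        "name": "ccr",
        "fields": [{"name": "id", "type": "string"}],
    }),
}
```

Under these assumptions, `schema_via_registry({"ccr": "${inferred.avro.schema}"}, flowfile_attributes)` raises a `LookupError`, because the literal text is not a schema, while `schema_via_text_property("${inferred.avro.schema}", flowfile_attributes)` returns the inferred record schema.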

Configure your CSVReader as shown below:

40821-screen-shot-2017-10-12-at-103434-pm.png

And your JsonRecordSetWriter as shown below:

40822-screen-shot-2017-10-12-at-103533-pm.png
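
The screenshots are not reproduced here. Going by the text of this answer, the relevant settings on both controller services were presumably the schema access strategy and the schema text. The sketch below writes them out as a Python dictionary purely for readability. The `${inferred.avro.schema}` value is an assumption based on the attribute named in the question, and any other properties shown in the screenshots are unknown.

```python
# Assumed settings only; the original screenshots are not available.
SCHEMA_SETTINGS = {
    "Schema Access Strategy": "Use 'Schema Text' Property",
    "Schema Text": "${inferred.avro.schema}",  # attribute written by InferAvroSchema
}

# The same two properties would be set on the reader and on the writer.
controller_services = {
    "CSVReader": dict(SCHEMA_SETTINGS),
    "JsonRecordSetWriter": dict(SCHEMA_SETTINGS),
}

# Print the settings in a form that is easy to compare against the NiFi UI.
for service, properties in controller_services.items():
    for name, value in properties.items():
        print(f"{service}: {name} = {value}")
```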

I tried it on your flow and data, and it works.

Does this help?


4 REPLIES


This worked! Thank you very much, Abdelkrim Hadjidj. Your explanation makes sense, and I now understand what I was doing incorrectly.


Hi @Patrick Maggiulli

Glad that the answer was useful. Please accept the answer to close this thread. Thanks.


Hi @Patrick Maggiulli

I tried the same flow; I am putting the data into HBase from HTTP.

I have one CSV file that contains the fields (ID, Movie, Type). In the GetFile processor I pick up this file, and the rest of the flow is the same as yours. In UpdateAttribute I set schema.name to "MoviesRecord".

But I am getting an error in the ConvertRecord processor: ConvertRecord failed to process StandardFlowFileRecord, "will route to failure Field field_0 can not be null."

Any help would be great.

Thanks