Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Am I Configuring NiFi's AvroSchemaRegistry Correctly

New Member

ccr-and-hdr-20.txt

Hello.

I am creating a workflow to convert CSV to JSON, and I need help configuring ConvertRecord's JsonRecordSetWriter controller service. A SchemaNotFoundException is being thrown, saying "Unable to find schema with name 'ccr'" ('ccr' is the name I chose for the data).

The schema is inferred using InferAvroSchema, and UpdateAttribute creates an attribute named "schema.name" that is set to 'ccr'. The AvroSchemaRegistry controller service has a property added to it named "ccr", and the value of this property is "${inferred.avro.schema}". InferAvroSchema's Schema Output Destination property is set to "flowfile-attribute", meaning the inferred Avro schema is put into an attribute named "inferred.avro.schema".

I have attached the workflow and the CSV data set. The processors' directory paths will need to be changed so that the workflow can be tested. Note that the CSV file's extension has been changed to ".txt" and will need to be changed back to ".csv".

I really need help because I do not want to have to specify the schema as text. I would rather have the schema inferred, so that CSV files with differing headers and data content can be processed by the same workflow. I would greatly appreciate any help and guidance you can share.

csv-to-json-to-es5-with-id-csvreader.xml

Respectfully,

Patrick

1 ACCEPTED SOLUTION

@Patrick Maggiulli

From the documentation of the AvroSchemaRegistry it looks like the actual schema should be given to the registry:

'value' represents the textual representation of the actual schema following the syntax and semantics of Avro's Schema format.

${inferred.avro.schema} is an attribute of the flow file, so it doesn't make sense as a value in the registry.

To implement your use case, you should use "Use 'Schema Text' Property" as the schema access strategy. It is better suited to dynamic schemas, because the schema can be read from the flow file and used for the conversion.

A schema registry is intended more for governance, so you would be adding and managing schemas manually.
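
For contrast, a registry entry expects literal Avro schema text as its value. A minimal sketch of what a registry property named "ccr" could hold is shown here. The field names are made up for illustration, since the columns of the attached data set are not shown in this thread:

```json
{
  "type": "record",
  "name": "ccr",
  "doc": "Hypothetical example: the field names below are placeholders, not the real columns.",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "description", "type": ["null", "string"], "default": null}
  ]
}
```

A value like this is fixed when the controller service is configured, which is why a registry fits poorly when every incoming CSV file may have a different header.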

Configure your CSVReader as shown below:

40821-screen-shot-2017-10-12-at-103434-pm.png

And your JsonRecordSetWriter as shown below:

40822-screen-shot-2017-10-12-at-103533-pm.png
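
The screenshots are not reproduced in this archive. A sketch of settings consistent with the explanation above follows. It assumes the property display names used by NiFi 1.x record readers and writers, and that the schema lives in the "inferred.avro.schema" attribute as described in the question. It is a reconstruction, not a copy of the original images:

```text
# CSVReader (controller service) -- assumed property names
Schema Access Strategy : Use 'Schema Text' Property
Schema Text            : ${inferred.avro.schema}

# JsonRecordSetWriter (controller service) -- assumed property names
Schema Access Strategy : Use 'Schema Text' Property
Schema Text            : ${inferred.avro.schema}
```

With this setup the "schema.name" attribute and the AvroSchemaRegistry entry are no longer needed for the conversion, because both services read the schema text directly from the flow file attribute.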

I tried it on your flow/data and it's working.

Does this help?

4 REPLIES

New Member

This worked! Thank you very much Abdelkrim Hadjidj. Your explanation makes sense, and I understand now what I was doing incorrectly.

Hi @Patrick Maggiulli

Glad that the answer was useful. Please accept the answer to close this thread. Thanks

Rising Star

Hi @Patrick Maggiulli

I tried the same flow; I am putting the data into HBase from HTTP.

I have one CSV file that contains the fields (ID, Movie, Type). The GetFile processor picks up this file, and the rest of the flow is the same as yours. In UpdateAttribute I set schema.name to "MoviesRecord".

But I am getting an error in the ConvertRecord processor: it failed to process the StandardFlowFileRecord and reports "will route to failure Field field_0 can not be null."

Any help would be great.

Thanks