Created 10-12-2017 07:54 PM
ccr-and-hdr-20.txt
Hello.
I am creating a workflow to convert CSV to JSON, and I need help configuring the JsonRecordSetWriter controller service used by ConvertRecord. A SchemaNotFoundException is being thrown, saying "Unable to find schema with name 'ccr'" ('ccr' is the name I chose for the data). The schema is inferred using "InferAvroSchema", and "UpdateAttribute" creates an attribute named "schema.name" that is set to 'ccr'. The controller service "AvroSchemaRegistry" has a property added to it named "ccr", and the value of this property is "${inferred.avro.schema}". InferAvroSchema's SchemaOutputDestination property is set to "flowfile-attribute", meaning the inferred Avro schema is put into an attribute named "inferred.avro.schema".
I have attached the workflow and the CSV data set. The processors' directory paths will need to be changed so that this workflow can be tested. Again, I really need help because I do not want to have to specify the schema as text. Rather, I would like the schema to be inferred so that CSV files with differing headers and data content can be processed by the same workflow. I would greatly appreciate any help and guidance you can share. Note that the CSV file's extension has been changed to ".txt" and will need to be changed back to ".csv".
csv-to-json-to-es5-with-id-csvreader.xml
Respectfully,
Patrick
Created on 10-12-2017 08:22 PM - edited 08-17-2019 09:04 PM
From the documentation of the AvroSchemaRegistry it looks like the actual schema should be given to the registry:
'value' represents the textual representation of the actual schema following the syntax and semantics of Avro's Schema format.
${inferred.avro.schema} is an attribute of the flow file and doesn't make sense as a value in the registry.
To implement your use case, you should use "Use 'Schema Text' Property" as the schema access strategy. It is better suited to dynamic schemas: the schema is read from the flow file and used for the conversion.
A schema registry is more for governance, so you would be adding and managing schemas manually.
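To illustrate the idea outside NiFi, here is a minimal Python sketch of "schema travels with the flow file". It is not NiFi's implementation: the flow file is modelled as a plain dict, the function names infer_avro_schema and convert_csv_to_json are hypothetical, and the type inference is deliberately naive (long, then double, then string). Only the attribute name inferred.avro.schema and the record name 'ccr' come from the question.

```python
import csv
import io
import json


def infer_avro_schema(csv_text, record_name):
    """Infer a simple Avro-style record schema (as a dict) from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def column_type(values):
        # Try the narrowest numeric type first, then fall back to string.
        for avro_type, cast in (("long", int), ("double", float)):
            try:
                for value in values:
                    cast(value)
                return avro_type
            except ValueError:
                continue
        return "string"

    fields = [
        {"name": name, "type": column_type([row[i] for row in data])}
        for i, name in enumerate(header)
    ]
    return {"type": "record", "name": record_name, "fields": fields}


def convert_csv_to_json(flowfile):
    """Convert CSV content to JSON using the schema text carried in the attributes."""
    # The schema is looked up per flow file, not by name in a shared registry.
    schema = json.loads(flowfile["attributes"]["inferred.avro.schema"])
    casts = {"long": int, "double": float, "string": str}
    reader = csv.DictReader(io.StringIO(flowfile["content"]))
    records = [
        {f["name"]: casts[f["type"]](row[f["name"]]) for f in schema["fields"]}
        for row in reader
    ]
    return json.dumps(records)


# Made-up sample data, only for illustration.
csv_text = "id,name,score\n1,alpha,9.5\n2,beta,7\n"
flowfile = {
    "content": csv_text,
    "attributes": {
        "inferred.avro.schema": json.dumps(infer_avro_schema(csv_text, "ccr"))
    },
}
print(convert_csv_to_json(flowfile))
```

Because each flow file carries its own schema text, a second CSV with a different header goes through the same two functions with no change, which is the behaviour asked for in the question.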
Configure your CSVReader like below
And your JsonRecordSetWriter like below
I tried it on your flow/data and it's working.
Does this help?
Created 10-20-2017 12:52 PM
This worked! Thank you very much Abdelkrim Hadjidj. Your explanation makes sense, and I understand now what I was doing incorrectly.
Created 10-20-2017 01:39 PM
Glad that the answer was useful. Please accept the answer to close this thread. Thanks
Created 08-29-2018 06:42 AM
I tried the same flow; I am putting the data into HBase from HTTP.
I have one CSV file that contains the fields (ID, Movie, Type). In the GetFile processor I pick up this file, and the rest of the flow is the same as yours. In UpdateAttribute I set schema.name to "MoviesRecord".
But I am getting an error in the ConvertRecord processor saying that ConvertRecord failed to process the StandardFlowFileRecord: "will route to failure Field field_0 can not be null."
Any help would be great.
Thanks