Storing schemas in NiFi

Explorer

I have 20 CSV files that I need to convert to Parquet before loading them to S3. I have prepared a schema for each file (20 different schemas). To make this work, I currently route the files with RouteOnAttribute and process each one separately. This works, but it creates a flow with 20 ConvertRecord processors. Is there a way to store all 20 schemas and pick the right one by filename, so that a single ConvertRecord processor can handle every file?

 

Thanks.

2 ACCEPTED SOLUTIONS

Super Guru

@naga_satish Yes, what you are looking for is the Schema Registry:

https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.0.0/bk_schema-registry-user-guide/content/ch_integ...

The Schema Registry can be configured in NiFi; the schemas you create there are then available in NiFi Record Readers and Writers.

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

 

Thanks,

Steven

Super Guru

The problem is that you need somewhere to store the dynamic schemas. That is where the Schema Registry comes in: it provides a UI and an API to add, update, and delete schemas, which can then be referenced from NiFi.

It looks like AvroSchemaRegistry lets you do something similar, minus the UI/API. You would need to create your schema in your flow, as an attribute, and send that to an AvroReader configured against AvroSchemaRegistry. You could use some other data store to hold these schemas, but you would need to pull them out into an attribute with the same name as the one configured in the Reader and Registry.

 

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-registry-nar/1.12.1/org.apach...

 

The latter method does not give you a way to manage all the schemas, which is why I referenced the Hortonworks Schema Registry, which does include the ability to manage and version the actual schemas.
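
To make the AvroSchemaRegistry route a bit more concrete, here is a minimal Python sketch of the lookup logic only. It is not NiFi code. It assumes each file name starts with a dataset name that doubles as the schema name (for example `customers_20210115.csv`), and everything in it (`SCHEMA_REGISTRY`, `schema_name_for`, `lookup_schema`, the two sample schemas) is hypothetical and only there for illustration.

```python
import json
import re

# Hypothetical stand-in for the registry: schema name -> Avro schema text.
# This mirrors the idea of keeping one named schema per file type.
SCHEMA_REGISTRY = {
    "customers": json.dumps({
        "type": "record",
        "name": "customers",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "name", "type": "string"},
        ],
    }),
    "orders": json.dumps({
        "type": "record",
        "name": "orders",
        "fields": [
            {"name": "order_id", "type": "long"},
            {"name": "amount", "type": "double"},
        ],
    }),
}


def schema_name_for(filename: str) -> str:
    """Derive a schema name from a file name (assumed naming convention)."""
    base = filename.rsplit("/", 1)[-1]   # drop any directory part
    stem = base.rsplit(".", 1)[0]        # drop the extension
    # Strip a trailing numeric suffix such as "_20210115" or "-7".
    return re.sub(r"[_-]?\d+$", "", stem).lower()


def lookup_schema(filename: str) -> dict:
    """Return the parsed Avro schema registered for this file."""
    name = schema_name_for(filename)
    if name not in SCHEMA_REGISTRY:
        raise KeyError(f"no schema registered for {filename!r} (looked up {name!r})")
    return json.loads(SCHEMA_REGISTRY[name])


if __name__ == "__main__":
    for f in ["customers_20210115.csv", "/data/in/Orders-7.CSV"]:
        schema = lookup_schema(f)
        print(f, "->", schema["name"], [fld["name"] for fld in schema["fields"]])
```

In NiFi itself the same idea would probably look like this: add each schema to the AvroSchemaRegistry controller service as a property (name = schema name, value = schema text), set a `schema.name` attribute from `${filename}` with UpdateAttribute, and point the Record Reader at the registry with a schema access strategy that uses the schema name. Please treat that as an outline to check against the documentation for your NiFi version.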

 

 

 


3 REPLIES 3

Explorer

@stevenmatison In the link that you provided, they explain how to set up the HWX Schema Registry. But I don't have access to the HWX Schema Registry, so I want to go ahead with the AvroSchemaRegistry controller service. Could you please show how to add schemas to AvroSchemaRegistry and use them dynamically according to filename?

 

Thanks.
