
How to generate Avro Schema dynamically (during runtime) to validate a csv file?


I have a scenario where the columns in a CSV are constantly changing, and I want to validate the CSV regardless of those changes.

Currently I am validating against a static schema, placed in the Schema Text property of the ValidateRecord processor. But every time the file format changes (a new column is added or removed), I have to update the schema in the processor's Schema Text property.
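For context, the Schema Text property just holds a plain Avro record schema. What I have there today looks roughly like this (field names are placeholders, not my real columns):

```json
{
  "type": "record",
  "name": "csv_record",
  "fields": [
    { "name": "col_a", "type": "string" },
    { "name": "col_b", "type": "string" },
    { "name": "col_c", "type": ["null", "string"], "default": null }
  ]
}
```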

Is there a way I can make this dynamic?

Currently I am using NiFi 1.11.4 (I couldn't find the InferAvroSchema processor or the ConvertJSONToAvro processor).

1 ACCEPTED SOLUTION

Super Guru

I think the most straightforward approach would be to drop the InferAvroSchema NAR into your version of NiFi. The procedure is not that hard; you just have to be surgically careful. The process is explained a bit in the thread below, in reference to adding the Parquet NAR from a newer version into an older one. Be sure to read all the comments:

 

https://community.cloudera.com/t5/Support-Questions/Can-I-put-the-NiFi-1-10-Parquet-Record-Reader-in...
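As a rough sketch of that procedure (assuming the processor still ships in the Kite bundle of a pre-1.10 release such as 1.9.2, and that NIFI_HOME points at your 1.11.4 install; the paths below are assumptions, so double-check against the linked thread):

```sh
# Hypothetical sketch: copy the Kite NAR (which contained InferAvroSchema)
# from an older NiFi distribution into the newer install's lib directory.
cp /opt/nifi-1.9.2/lib/nifi-kite-nar-1.9.2.nar "$NIFI_HOME/lib/"

# Restart NiFi so the NAR is unpacked and the processor shows up.
"$NIFI_HOME/bin/nifi.sh" restart
```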

 


3 REPLIES

Super Guru

@SashankRamaraju In the most recent versions of NiFi, some of the older methods (schema inference via InferAvroSchema) have been left behind. You can certainly add them back in manually (PM me if you want specifics). However, the current record-based tools for conversion are definitely preferred, and they are bundled into NiFi out of the box on purpose.

 

To solve your constantly changing CSV, I would first push back on why the CSV contents are changing at all. If there were nothing I could do about it upstream, I would create a flow that splits the different CSVs up based on known schemas: process the ones I have a schema for, and route the ones that fail into a holding process group. I would monitor the failures and create flow branches for new schemas, teaching my flow to get smarter over time. After that kind of evaluation, I would have a clear idea of how much the CSV is actually changing, and I could do some upstream work on each CSV to converge them into a single schema before processing them in NiFi. For example, if some fields are missing, I could add them (as empty values) before reading everything with a single schema reader (see the sketch below). This gets a bit kludgy, but I wanted to explain the thought process of evaluating how to converge the schemas into a single reader. In practice I would likely not do the latter, and would just split the flow for each CSV variant.
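One Avro detail that helps with the converge-into-one-schema idea: declare every column that might be absent as a nullable union with a default, so a single reader schema can tolerate CSVs that do not carry all the columns. A minimal sketch (field names are placeholders):

```json
{
  "type": "record",
  "name": "converged_csv",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "optional_col_1", "type": ["null", "string"], "default": null },
    { "name": "optional_col_2", "type": ["null", "string"], "default": null }
  ]
}
```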

 

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

 

Thanks,


Steven @ DFHZ


@stevenmatison Just a small correction to the description: the columns in my CSV only ever get added, never removed.
If this year I have 100 columns, next year it will be 110; every year 10 new columns get added.

Basically I have 10 columns with a year suffix, repeated for each year.
Once I validate them against a schema, I will apply transformations to collapse those 110 columns into the 20 columns my database expects.

Also, I get what you are saying, but I want to find a dynamic way to validate the CSV.
If you can help me with a mock flow similar to my challenge, it would be a great help (preferably I want to achieve this without custom processors or scripts).
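To make concrete what I mean by generating the schema at runtime, here is a rough sketch (hypothetical; it is exactly the kind of script I would rather avoid, which is why I am asking for a flow-based approach). It derives a permissive Avro schema from a CSV header, treating every column as a nullable string, and could run as an upstream job or be adapted for an ExecuteScript processor:

```python
import csv
import json
import sys

def infer_avro_schema(csv_path, record_name="csv_record"):
    """Build a permissive Avro schema from a CSV header row.

    Every column becomes a nullable string so validation tolerates
    missing or empty values; tighten the types where you can.
    Header names must also be valid Avro identifiers
    ([A-Za-z_][A-Za-z0-9_]*), so sanitize them if needed.
    """
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    return {
        "type": "record",
        "name": record_name,
        "fields": [
            {"name": col.strip(), "type": ["null", "string"], "default": None}
            for col in header
        ],
    }

if __name__ == "__main__":
    # Usage: python infer_schema.py data.csv > schema.avsc
    print(json.dumps(infer_avro_schema(sys.argv[1]), indent=2))
```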




