As part of my requirement, I would like to ingest a CSV file into HDFS. The schema of the CSV file keeps changing (mainly new columns are added now and then). Is it possible to accomplish this using NiFi?
If you are converting the CSV data to ORC format, the ConvertAvroToOrc processor adds a hive.ddl attribute to the flowfile; you can use that attribute to recreate the external table every time you ingest data.
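As a sketch of what the downstream flow does with that attribute before handing statements to something like PutHiveQL (the hive.ddl value below is illustrative, not copied from NiFi; the table name and HDFS path are assumptions):

```python
# Sketch: turning the hive.ddl flowfile attribute into drop/recreate statements.
# ConvertAvroToOrc's DDL has no LOCATION clause, so we append one (assumption
# for illustration; verify against your NiFi version's actual attribute value).

def build_statements(hive_ddl: str, table: str, location: str) -> list:
    """Drop the old external table definition and recreate it at the HDFS path."""
    return [
        "DROP TABLE IF EXISTS {}".format(table),
        "{} LOCATION '{}'".format(hive_ddl, location),
    ]

# Example attribute value (illustrative):
ddl = "CREATE EXTERNAL TABLE IF NOT EXISTS tbl (id INT, name STRING) STORED AS ORC"
stmts = build_statements(ddl, "tbl", "/data/tbl")
```

Because the table is external, the DROP only removes the metastore definition; the ORC files under the LOCATION path are untouched.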
The ExtractAvroMetadata processor extracts the Avro schema from the Avro data file and adds it to the flowfile attributes, so you can use that schema to drop and recreate the table every time.
While configuring the RecordSetWriter controller service, set the below property value:
Schema Write Strategy
Set 'avro.schema' Attribute
Then each flowfile will have an avro.schema attribute; use this attribute to prepare a CREATE TABLE statement that includes all the columns.
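A minimal sketch of that step, assuming the avro.schema attribute holds the schema as JSON (which is the Avro schema format); the table name, HDFS location, and the small type-mapping table are assumptions for illustration:

```python
import json

# Minimal Avro-to-Hive type mapping (illustrative; extend for complex/logical types).
AVRO_TO_HIVE = {"string": "STRING", "int": "INT", "long": "BIGINT",
                "float": "FLOAT", "double": "DOUBLE", "boolean": "BOOLEAN",
                "bytes": "BINARY"}

def hive_type(avro_type):
    # Nullable fields appear as unions like ["null", "string"]; take the non-null branch.
    if isinstance(avro_type, list):
        avro_type = next(t for t in avro_type if t != "null")
    return AVRO_TO_HIVE.get(avro_type, "STRING")

def create_table_ddl(avro_schema_json: str, table: str, location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement covering every column in the schema."""
    schema = json.loads(avro_schema_json)
    cols = ", ".join("`{}` {}".format(f["name"], hive_type(f["type"]))
                     for f in schema["fields"])
    return ("CREATE EXTERNAL TABLE IF NOT EXISTS {} ({}) "
            "STORED AS ORC LOCATION '{}'".format(table, cols, location))

# Example avro.schema value where a new column has appeared:
schema = json.dumps({"type": "record", "name": "csv_record", "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "added_col", "type": ["null", "string"]},
]})
ddl = create_table_ddl(schema, "my_table", "/data/my_table")
```

In the flow itself you would typically do this string-building with ReplaceText or a scripting processor rather than standalone Python, but the transformation is the same.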
Create external tables so that there is no data loss when you drop the table. This approach assumes all new fields are appended to the existing data model; if some fields are removed or change position in the file, then recreating the table from the new schema will result in data issues.
Thank you. Is there a way through NiFi to update the existing table as the schema changes?
Thanks in advance.