I'm trying to merge several parquet files with an Apache NiFi flow, reading them from HDFS and writing the merged result back to HDFS.
I have the following components:
ListHDFS => FetchParquet => MergeContent => PutParquet
I can't get it working because it seems I have to specify the schema to be used on the PutParquet processor, either by registering it in a Schema Registry or by saving it in a property on the records.
Now I'm trying to automatically extract the schema from the parquet data, since I want it to work with any generic parquet file without prior knowledge of the schema.
I'm trying to use an "InferAvroSchema" processor, but I would need to convert the parquet data to Avro first, and I can't find any processor that does that without prior knowledge of the schema.
What is the best way to merge parquet data, or how can I extract and register the schema from parquet files automatically?
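To illustrate what I mean by extracting the schema without prior knowledge: as far as I understand the parquet file format, each file carries its own schema in a footer metadata block, stored just before a 4-byte little-endian length and the trailing `PAR1` magic. Below is a minimal Python sketch, outside NiFi, that only locates that block. The function name `read_footer_bytes` is made up for illustration, and decoding the Thrift-encoded metadata into an actual schema would need a parquet library, which is not shown here.

```python
import struct

PARQUET_MAGIC = b"PAR1"  # marker at the start and end of an unencrypted parquet file


def read_footer_bytes(data: bytes) -> bytes:
    """Return the raw (still Thrift-encoded) footer metadata of a parquet file.

    Hypothetical helper: it does not parse the schema, it only shows
    where the self-describing metadata lives inside the file.
    """
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(data) < 12 or data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        raise ValueError("not a parquet file (missing PAR1 magic)")

    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])

    footer_start = len(data) - 8 - footer_len
    if footer_start < 4:  # footer would overlap the leading magic
        raise ValueError("corrupt footer length")

    return data[footer_start:len(data) - 8]
```

Since the schema is already inside each file, what I'm looking for is a way to have the NiFi flow pick it up from there instead of requiring me to supply it by hand.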