Merge Parquet files with NiFi


I'm trying to merge several Parquet files with an Apache NiFi flow, reading them from HDFS and saving the result back to HDFS.

I have the following components:

ListHDFS => FetchParquet => MergeContent => PutParquet

I can't get it working because it seems I have to specify the schema on the PutParquet component, which means either registering it in a Schema Registry or saving it in a property on the records.

Now I'm trying to extract the schema automatically from the Parquet data, since I want the flow to work with any Parquet file without prior knowledge of its schema.
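
As far as I understand, a Parquet file already carries its own schema: the file ends with a Thrift-encoded metadata footer, followed by a 4-byte little-endian footer length and the magic bytes "PAR1". The sketch below is plain Python outside NiFi, and it only shows where that footer sits. It assumes an unencrypted file, `locate_footer` is a name made up for illustration, and decoding the footer into an actual schema would still need a Parquet library.

```python
import struct
from typing import Tuple

PARQUET_MAGIC = b"PAR1"


def locate_footer(data: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the metadata footer in a Parquet file's bytes.

    Hypothetical helper for illustration; assumes an unencrypted Parquet file.
    """
    # Smallest possible layout: leading magic + 4-byte length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic bytes")

    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    footer_start = len(data) - 8 - footer_len
    if footer_start < 4:
        raise ValueError("footer length is larger than the file")
    return footer_start, footer_len
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `locate_footer` gives the slice that holds the schema, which is why I would expect the schema to be recoverable without registering it anywhere first.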

I'm trying to use an "InferAvroSchema" component, but I would need to convert the Parquet data to Avro first, and I can't find any component that does that without prior knowledge of the schema.

What is the best way to merge Parquet data, or how can I extract and register the schema from Parquet files automatically?

Thank you