Aim: I want to create a Parquet file from a CSV file using NiFi.
Problem: I am able to convert the CSV file to a Parquet file, but the schema of the resulting Parquet file contains a struct type. I need to avoid that and have those columns written as string type instead.
I am using Apache NiFi 1.14.0 on Windows Server 2016.
This is what I tried...
I have used 3 controller services.
These are the processors, in order:
ConvertRecord(CSVReader to CSVRecordSetWriter)
UpdateAttribute (updating avro.schema: wherever 2 data types were inferred for a field, I replace the type with '["null","string"]')
ConvertRecord(CSVReader to ParquetRecordSetWriter)
UpdateAttribute (for appending '.parquet' to the filename)
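To make the avro.schema rewrite in the UpdateAttribute step concrete, here is a minimal Python sketch of the intended transformation. It is not the NiFi Expression Language used in the flow. The function name `flatten_ambiguous_unions` and the sample schema are hypothetical, and it assumes that "2 data types inferred" means a union with more than one non-null branch.

```python
import json

# Target type for any field where more than one data type was inferred.
STRING_OR_NULL = ["null", "string"]


def flatten_ambiguous_unions(schema_json: str) -> str:
    """Return the Avro record schema with ambiguous unions replaced.

    A field is treated as ambiguous when its type is a union (a JSON list)
    with more than one non-null branch, e.g. ["null", "int", "string"].
    This interpretation is an assumption, not taken from NiFi itself.
    """
    schema = json.loads(schema_json)
    for field in schema.get("fields", []):
        field_type = field["type"]
        if isinstance(field_type, list):
            non_null = [t for t in field_type if t != "null"]
            if len(non_null) > 1:
                field["type"] = list(STRING_OR_NULL)
    return json.dumps(schema)


# Hypothetical schema, shaped like one that might sit in avro.schema.
sample = json.dumps({
    "type": "record",
    "name": "nifiRecord",
    "namespace": "org.apache.nifi",
    "fields": [
        {"name": "id", "type": ["null", "int"]},
        {"name": "amount", "type": ["null", "int", "string"]},
    ],
})

result = json.loads(flatten_ambiguous_unions(sample))
for field in result["fields"]:
    print(field["name"], field["type"])
```

Only top-level fields are handled here; a nested record type would need a recursive walk.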
I would also like to know how to view a .parquet file on Windows. Currently, I read the Parquet file with PySpark and check the schema there. 😐