
Apache NiFi: Creating a Parquet file from CSV with the schema saved in the avro.schema attribute


Hi Experts,

Need help!


AIM: I want to create a Parquet file from a CSV file using NiFi.

Problem: I can convert the CSV to a Parquet file, but the Parquet schema contains struct types, which I need to avoid. I want those columns to be string type instead.

I am using Apache NiFi 1.14.0 on Windows Server 2016.


This is what I tried.

I used three controller services:

CSVReader

CSVRecordSetWriter

ParquetRecordSetWriter
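One alternative to schema inference is to hand the CSVReader an explicit schema in which every column is a nullable string. As far as I know, NiFi 1.14 allows this by setting the CSVReader's Schema Access Strategy to "Use 'Schema Text' Property" and pasting the schema into Schema Text. The Python sketch below builds such a schema from a CSV header line. The function name `all_string_schema` and the record name `csvRecord` are hypothetical. It assumes the header names are already valid Avro names (a letter or underscore first, then only letters, digits, and underscores).

```python
import csv
import io
import json


def all_string_schema(csv_header_line: str, record_name: str = "csvRecord") -> str:
    """Build an Avro record schema in which every CSV column is a nullable string.

    Hypothetical helper: assumes the header names are valid Avro field names.
    """
    # Parse the header with the csv module so quoted column names are handled
    columns = next(csv.reader(io.StringIO(csv_header_line)))
    fields = [
        # ["null", "string"] with a null default: the column is always a string or missing
        {"name": name.strip(), "type": ["null", "string"], "default": None}
        for name in columns
    ]
    return json.dumps({"type": "record", "name": record_name, "fields": fields}, indent=2)
```

For example, `print(all_string_schema("id,name,amount"))` prints a record schema with three `["null", "string"]` fields that could be pasted into the Schema Text property.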


These are the processors, in flow order:

GetFile

ConvertRecord (CSVReader to CSVRecordSetWriter)

UpdateAttribute (updates avro.schema: wherever two data types were inferred for a field, I replace that field's type with '["null","string"]')

ConvertRecord (CSVReader to ParquetRecordSetWriter)

UpdateAttribute (appends '.parquet' to the filename)

PutFile
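A likely source of the struct columns: Parquet's Avro schema converter generally turns a union with more than one non-null type (for example `["null","int","string"]`) into a group, i.e. a struct with one member per branch. Collapsing such unions to `["null","string"]` is what the UpdateAttribute step above aims to do. The Python sketch below shows that rewrite on the JSON text of an `avro.schema` value. The function name `widen_multi_type_unions` is hypothetical, and only top-level record fields are inspected; nested records are left unchanged.

```python
import json


def widen_multi_type_unions(schema_json: str) -> str:
    """Replace every top-level field whose union has 2+ non-null types with ["null", "string"].

    Hypothetical helper: nested records and non-union fields are left as they are.
    """
    schema = json.loads(schema_json)
    for field in schema.get("fields", []):
        field_type = field["type"]
        if isinstance(field_type, list):
            # Keep only the non-null branches, e.g. ["null", "int", "string"] -> ["int", "string"]
            non_null = [t for t in field_type if t != "null"]
            if len(non_null) > 1:
                field["type"] = ["null", "string"]
    return json.dumps(schema)


# Usage with a made-up inferred schema
inferred = json.dumps({
    "type": "record",
    "name": "nifiRecord",
    "fields": [
        {"name": "id", "type": ["null", "int", "string"]},
        {"name": "city", "type": ["null", "string"]},
    ],
})
print(widen_multi_type_unions(inferred))
```

Here `id` becomes `["null", "string"]` and `city` is unchanged, because its union already has only one non-null branch.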


I would also like to know how to view a .parquet file on Windows. Currently I read the Parquet file with PySpark to check the schema. 😐
