Convert Parquet file to CSV using NiFi

Contributor

I haven't worked with Parquet files before, but I have a requirement to convert a Parquet file to CSV using NiFi.

I would appreciate it if you could guide me on this.

1 ACCEPTED SOLUTION

Super Guru

@Gubbi 

 

The solution you are looking for is ConvertRecord with a ParquetReader and a CSVRecordSetWriter. The ParquetReader controller service is included in NiFi 1.10 and later. If you are on an older NiFi, here is a post where I describe adding the required JAR files to NiFi 1.9 (older versions do not ship with Parquet support):

 

https://community.cloudera.com/t5/Support-Questions/Can-I-put-the-NiFi-1-10-Parquet-Record-Reader-in...
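
A minimal sketch of how the pieces fit together (property names as they appear in NiFi 1.10+; the schema strategy shown is one common choice, not the only one):

    ConvertRecord processor
        Record Reader : ParquetReader            # reads the schema embedded in the Parquet file
        Record Writer : CSVRecordSetWriter
            Schema Access Strategy : Inherit Record Schema   # reuse the schema the reader produced
            Include Header Line    : true

With this setup the FlowFile content is rewritten from Parquet to CSV in a single step; no extra conversion processor is needed.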

 

Another suggestion: if you are working with NiFi and Hadoop/HDFS/Hive, you could store the raw Parquet, create an external Hive table on top of it, then select the results and insert them into a similar table stored in CSV format. You can then select from the CSV table to produce your CSV file.
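
A rough HiveQL sketch of that path (the table names, columns, and HDFS paths here are made up for illustration; match them to your actual data):

    -- external table over the raw Parquet files written by NiFi
    CREATE EXTERNAL TABLE raw_parquet (
        id   INT,
        name STRING
    )
    STORED AS PARQUET
    LOCATION '/data/raw_parquet';

    -- a matching table stored as delimited text (CSV)
    CREATE EXTERNAL TABLE out_csv (
        id   INT,
        name STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/data/out_csv';

    -- copy the data across; the files under /data/out_csv are your CSV output
    INSERT OVERWRITE TABLE out_csv
    SELECT * FROM raw_parquet;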

 

 

Also, in order to validate/inspect your Parquet file, or to read its schema (if you need it for the controller services), you can use parquet-tools:

https://community.cloudera.com/t5/Community-Articles/Build-and-use-Parquet-tools-to-read-parquet-fil...
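
For example (assuming parquet-tools is built and on your PATH per the article above; userdata1.parquet stands in for your own file):

    # print the schema embedded in the Parquet file
    parquet-tools schema userdata1.parquet

    # show the first few records to sanity-check the contents
    parquet-tools head -n 5 userdata1.parquet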

 

 

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

 

Thanks,


Steven @ DFHZ


3 REPLIES


Contributor

@stevenmatison : I tried the flow below and am getting this error:

ConvertRecord[id=7d5d0bb5-0174-1000-1a76-dc9d3a7b1b35] Failed to process StandardFlowFileRecord[uuid=4d10880f-3ce2-475a-ac53-cca01f190600,claim=,offset=0,name=userdata1.parquet,size=0]; will route to failure: org.apache.nifi.parquet.stream.NifiParquetInputFile@17badbbc is not a Parquet file (too small length: 0)
 

Below are my ConvertRecord settings.

(screenshot: Gubbi_0-1599832386874.png)

 

I want to read files from a location, convert them to CSV, and transfer them to another location, so I have built the flow ListFile --> ConvertRecord --> FetchFile --> PutFile. Can you please check whether the settings above and the flow are correct for my requirement?

 

Super Guru

@Gubbi use this instead. ListFile emits zero-byte FlowFiles that only carry the file's attributes, which is exactly why ConvertRecord failed with "too small length: 0"; FetchFile has to run before ConvertRecord so there is actual content to read:

 

ListFile -> FetchFile -> ConvertRecord