Apache NiFi - how to convert a .txt file into Parquet (to save into HDFS) with NiFi?

Contributor

Hi, I have many .txt files and I can't convert them into Parquet format to save in HDFS.

How can I do that? @ApacheNifi

1 ACCEPTED SOLUTION

Super Guru

You must have the reader incorrectly configured for your CSV schema. 

3 REPLIES

Super Guru

@Lallagreta The solution you are looking for is to leverage the NiFi Parquet processors with the Parquet Record Reader/Writer.

Some fun links:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-parquet-nar/1.11.4/org.apache...

https://community.cloudera.com/t5/Community-Articles/Apache-NiFi-1-10-Support-for-Parquet-RecordRead...

The Parquet processors are part of NiFi 1.10 and up, but you can also install the NARs into older NiFi versions:

https://community.cloudera.com/t5/Support-Questions/Can-I-put-the-NiFi-1-10-Parquet-Record-Reader-in...

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.

Thanks,

Steven

Contributor

Hi, thank you so much for your answer.

I understand that I have to treat the data as CSV with a “tab” delimiter rather than a “,”.
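To illustrate why the delimiter matters, here is a minimal Python sketch. It is not NiFi code: the sample data and the `parse` helper are made up for illustration. It parses the same tab-separated text once with a comma delimiter and once with a tab delimiter.

```python
import csv
import io

# Hypothetical sample: tab-separated text with a header line.
SAMPLE = "id\tname\tcity\n1\tAnna\tRome\n2\tLuca\tMilan\n"


def parse(text, delimiter):
    """Parse delimited text into a list of rows (each row is a list of fields)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = parse(SAMPLE, ",")   # wrong delimiter: each line stays one field
tab_rows = parse(SAMPLE, "\t")    # matching delimiter: three fields per line

print(len(comma_rows[0]), comma_rows[0])
print(len(tab_rows[0]), tab_rows[0])
```

With the comma delimiter every line comes back as a single field, so a reader expecting three columns would presumably not find them. With the tab delimiter the header splits into `id`, `name`, and `city`.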

For my project I use this flow:

GetFile -> UpdateAttribute -> PutParquet, but something goes wrong.

The error that appears is: "Unable to create record reader".

This is my processor configuration (screenshots): Schermata 2021-01-09 alle 10.58.06.png, Schermata 2021-01-09 alle 10.58.13.png, Schermata 2021-01-09 alle 10.58.35.png, Schermata 2021-01-09 alle 10.58.46.png, Schermata 2021-01-09 alle 10.59.16.png

THANK YOU @ApacheNifi 

Super Guru

You must have the reader incorrectly configured for your CSV schema.
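
One way to sanity-check a reader configuration outside NiFi might be to compare the file's header line with the field names in the schema given to the reader. The sketch below is plain Python, not NiFi's API. The Avro-style schema, the sample text, and the `header_mismatches` helper are all assumptions for illustration.

```python
import csv
import io
import json

# Hypothetical Avro-style schema, similar in shape to one a record reader might be given.
SCHEMA_TEXT = """
{
  "type": "record",
  "name": "person",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "city", "type": "string"}
  ]
}
"""

# Hypothetical tab-separated input with a header line.
SAMPLE = "id\tname\tcity\n1\tAnna\tRome\n"


def header_mismatches(text, schema_text, delimiter):
    """Return (missing, unexpected) field names when comparing the header to the schema."""
    expected = [field["name"] for field in json.loads(schema_text)["fields"]]
    header = next(csv.reader(io.StringIO(text), delimiter=delimiter))
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


print(header_mismatches(SAMPLE, SCHEMA_TEXT, "\t"))  # delimiter matches the data
print(header_mismatches(SAMPLE, SCHEMA_TEXT, ","))   # delimiter does not match
```

Two empty lists mean the header and the schema agree. If every schema field shows up as missing, the delimiter is a likely suspect.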