Possible bug: ParquetReader missing in version 2.0.0-M1

Contributor

Good Morning,

I'm flummoxed and worried. I have a significant flow that handles parquet files, and I can no longer make it work in version 2.0. Specifically, I used a ConvertRecord processor with a ParquetReader and a JsonRecordSetWriter.

In the new version, 2.0, there doesn't appear to be a ParquetReader anymore. There also doesn't appear to be any documentation about the ParquetReader being deprecated.

Please help. This flow is critical to me. I am also open to suggestions on how to accomplish the same conversion in a different manner.

Super Collaborator

I don't see any parquet NAR files in my NiFi 2.0.0-M1 install or in the Docker image.

 

Master Mentor

@arutkwccu 

The parquet components were not "deprecated"; they were moved out of the default distribution via https://issues.apache.org/jira/browse/NIFI-12282

Apache limits the maximum size of the project download, so at times some nars need to be moved out of the distribution to stay under that limit, and less commonly used components may be moved out as well. This does not mean that these components are no longer being contributed to or updated with newer NiFi release versions. It does mean, however, that you will need to manually download the nifi-parquet-nar and any dependency nar(s) it needs that are not already included in your NiFi distribution.

You can download nars directly from Maven Central.
Here is the Maven Central link for the nifi-parquet-nar:
https://central.sonatype.com/artifact/org.apache.nifi/nifi-parquet-nar/overview

If you look at the "Dependencies" tab, you'll see that there is a dependency on another nar (nifi-hadoop-libraries-nar).

Neither of these nars is included in the default Apache NiFi 2.0.0-M1 release.
The simplest way to add these nars to your NiFi is to download them into the <path-to-NiFi>/extensions/ folder. NiFi will auto-load nars placed in this folder without any need for a NiFi restart.

You can find the nars by clicking on the "Versions" tab for each required nar and clicking on "Browse" next to the NiFi version you want the nar for.

Here are the direct links for the two nars you need for parquet:

nifi-parquet-nar - https://repo1.maven.org/maven2/org/apache/nifi/nifi-parquet-nar/2.0.0-M1/nifi-parquet-nar-2.0.0-M1.n...

nifi-hadoop-libraries-nar - https://repo1.maven.org/maven2/org/apache/nifi/nifi-hadoop-libraries-nar/2.0.0-M1/nifi-hadoop-librar...
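
For anyone who prefers to script this step, here is a minimal sketch. It assumes the standard Maven Central layout for the two artifacts, a `.nar` file extension, and that `curl` is installed. The function names (`nar_url`, `download_nars`) and the `NIFI_VERSION` variable are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Sketch only: builds Maven Central URLs for the two nars named above and
# downloads them. Names and defaults here are illustrative assumptions.

NIFI_VERSION="${NIFI_VERSION:-2.0.0-M1}"
MAVEN_BASE="https://repo1.maven.org/maven2/org/apache/nifi"

# Print the download URL for one artifact, assuming the usual
# <artifact>/<version>/<artifact>-<version>.nar repository layout.
nar_url() {
  local artifact="$1" version="$2"
  printf '%s/%s/%s/%s-%s.nar\n' "$MAVEN_BASE" "$artifact" "$version" "$artifact" "$version"
}

# Download both nars to a temporary directory first, then move them into the
# destination, so a half-downloaded file never sits in the auto-load folder.
download_nars() {
  local dest="$1" version="${2:-$NIFI_VERSION}" artifact tmp status
  tmp="$(mktemp -d)" || return 1
  for artifact in nifi-parquet-nar nifi-hadoop-libraries-nar; do
    curl -fSL -o "$tmp/$artifact-$version.nar" "$(nar_url "$artifact" "$version")" \
      || { rm -rf "$tmp"; return 1; }
  done
  mkdir -p "$dest" && mv "$tmp"/*.nar "$dest/"
  status=$?
  rm -rf "$tmp"
  return "$status"
}

# Example call (replace the path with your own NiFi install location):
# download_nars "<path-to-NiFi>/extensions"
```

Run the download as the NiFi service user, or fix the ownership of the files afterwards, so that NiFi can read them.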

If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt

 

Contributor

I'm very appreciative of the quick reply. What I don't have clarity on is how this affects the ConvertRecord processor. Are you advising that installing these nars will actually change the record readers available in the ConvertRecord processor? That isn't clear from the link you referenced:
https://central.sonatype.com/artifact/org.apache.nifi/nifi-parquet-nar/overview
I don't see anything in the documentation that covers this, and the linked page doesn't say what is made available by installing the nars.
I went to the trouble of downloading the nar from this link and expanding it:
https://repo1.maven.org/maven2/org/apache/nifi/nifi-parquet-nar/2.0.0-M1/nifi-parquet-nar-2.0.0-M1.n...
I see the following:

f nifi-parquet-processors-2.0.0-M1.jar
created: META-INF/
inflated: META-INF/MANIFEST.MF
created: META-INF/services/
created: org/
created: org/apache/
created: org/apache/nifi/
created: org/apache/nifi/parquet/
created: org/apache/nifi/parquet/hadoop/
created: org/apache/nifi/parquet/record/
created: org/apache/nifi/parquet/stream/
created: org/apache/nifi/parquet/utils/
created: org/apache/nifi/processors/
created: org/apache/nifi/processors/parquet/
created: META-INF/maven/
created: META-INF/maven/org.apache.nifi/
created: META-INF/maven/org.apache.nifi/nifi-parquet-processors/
inflated: META-INF/DEPENDENCIES
inflated: META-INF/LICENSE
inflated: META-INF/NOTICE
inflated: META-INF/services/org.apache.nifi.controller.ControllerService
inflated: META-INF/services/org.apache.nifi.processor.Processor
inflated: org/apache/nifi/parquet/ParquetReader.class
inflated: org/apache/nifi/parquet/ParquetRecordSetWriter.class
inflated: org/apache/nifi/parquet/hadoop/AvroParquetHDFSRecordReader.class
inflated: org/apache/nifi/parquet/hadoop/AvroParquetHDFSRecordWriter.class
inflated: org/apache/nifi/parquet/record/ParquetRecordReader.class
inflated: org/apache/nifi/parquet/record/WriteParquetResult.class
inflated: org/apache/nifi/parquet/stream/NifiOutputStream.class
inflated: org/apache/nifi/parquet/stream/NifiParquetInputFile.class
inflated: org/apache/nifi/parquet/stream/NifiParquetOutputFile.class
inflated: org/apache/nifi/parquet/stream/NifiSeekableInputStream.class
inflated: org/apache/nifi/parquet/utils/ParquetConfig.class
inflated: org/apache/nifi/parquet/utils/ParquetUtils.class
inflated: org/apache/nifi/processors/parquet/ConvertAvroToParquet.class
inflated: org/apache/nifi/processors/parquet/FetchParquet.class
inflated: org/apache/nifi/processors/parquet/PutParquet.class
inflated: META-INF/maven/org.apache.nifi/nifi-parquet-processors/pom.xml
inflated: META-INF/maven/org.apache.nifi/nifi-parquet-processors/pom.properties

It does seem possible that going through all of this would make an additional record reader available in the ConvertRecord processor, but since there is no documentation on this, it remains unclear to me whether creating a custom distribution will solve the issue.
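
For what it is worth, NiFi discovers components through the standard Java ServiceLoader files under META-INF/services, so the two `inflated: META-INF/services/...` entries in the listing above are the place to look. A minimal sketch for printing them from an already-expanded jar follows; the function name `list_registered_components` and its directory argument are hypothetical.

```shell
#!/bin/bash
# Sketch only: print the classes that an expanded jar registers as NiFi
# controller services and processors via its META-INF/services files.

list_registered_components() {
  local root="$1" kind file
  for kind in controller.ControllerService processor.Processor; do
    file="$root/META-INF/services/org.apache.nifi.$kind"
    [ -f "$file" ] || continue
    echo "== $kind"
    # Drop comment lines (such as license headers) and blank lines.
    grep -Ev '^[[:space:]]*(#|$)' "$file"
  done
  return 0
}

# Example call, where the argument is the directory the jar was expanded into:
# list_registered_components ./nifi-parquet-processors-expanded
```

Record readers and writers are controller services, so any reader a bundle provides should show up under the ControllerService heading.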

Master Mentor

@arutkwccu 
Yes, the added ParquetReader will be available to the record processors that use a record reader or writer.
There is not a lot involved in this process.
Simply download these two nar files and place them in the NiFi extensions folder owned by the NiFi service user. NiFi takes care of the rest.
I added these two nars to my 2.0.0-M1 installation to show you that it works here:

[Screenshot: MattWho_0-1704288440866.png]

[Screenshot: MattWho_1-1704288655890.png]

Thank you,
Matt



Contributor

You are correct that it works, and your efforts to simplify the process are much appreciated. It is perhaps more difficult when launching NiFi from a Docker Compose file. I wound up making the following modifications:
volumes:
  - /opt/nifi-custom-setup/version_2-0-0/add_nars:/opt/nifi/nifi-current/add_nars # Add additional nars
  - /mnt/scripts:/opt/nifi/nifi-current/custom_scripts # Bind mount for `custom_entrypoint.sh`

Then I wrote a custom script to handle things:

#!/bin/bash

# Copy the custom NARs to the NiFi lib directory
cp /opt/nifi/nifi-current/add_nars/*.nar /opt/nifi/nifi-current/lib/

# Call the default NiFi startup script
exec /opt/nifi/nifi-current/bin/nifi.sh run
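
A slightly more defensive variant of the copy step, as a sketch only: it does not fail when the mounted directory is empty, and the target directory is a parameter, so it can point at lib/ (as in the script above) or at the extensions/ auto-load folder mentioned earlier in the thread. The function name `stage_nars` is hypothetical, and the extensions/ path in the comment is an assumption about the container layout.

```shell
#!/bin/bash
# Sketch only: copy any *.nar files from a mounted directory into a target
# directory, tolerating the case where no nars are present.

stage_nars() {
  local src="$1" dest="$2" nar count=0
  mkdir -p "$dest"
  for nar in "$src"/*.nar; do
    [ -e "$nar" ] || continue   # the glob matched nothing
    cp "$nar" "$dest/"
    count=$((count + 1))
  done
  echo "staged $count nar(s) into $dest"
}

# Example use inside the custom entrypoint, with the paths from the compose file:
# stage_nars /opt/nifi/nifi-current/add_nars /opt/nifi/nifi-current/lib
# (or, assuming that folder exists in the container: /opt/nifi/nifi-current/extensions)
# exec /opt/nifi/nifi-current/bin/nifi.sh run
```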

In the end, I very much wish the documentation would provide some alerts to the end users on how to proceed. Again, I'm very appreciative of the assist.


Master Mentor

@arutkwccu 
The release notes for minor releases typically include highlights covering any "deprecated" components (this is documented for Apache NiFi 2.0.0-M1 under Deprecated Components and Features) or components moved to "optional build profiles" (such as this specific parquet bundle). This is done to help make minor upgrades as seamless and painless as possible.

Apache NiFi 2.0.0 is a major release and as such will have many significant changes, including breaking changes. It should not be treated the same as a minor version upgrade, and extra care and evaluation should be taken when migrating from a previous major release version. I agree that it would have been nice if the Apache community had included another Confluence page documenting all components moved to the "Optional Build Profile", with instructions like the ones I added above on how to add them.

Glad you are good to go now.

Thank you,
Matt

Master Mentor

@arutkwccu 
The Apache NiFi 2.0.0-M1 release notes have now been updated with a list of nars that have been moved to the Optional Build Profiles.  

https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version2.0.0-M1

Thank you,
Matt