Created 01-02-2024 08:17 AM
Good Morning,
I'm flummoxed and worried. I have a significant flow that handles parquet files, and I can no longer make it work in version 2.0. Specifically, I used a ConvertRecord processor with a ParquetReader and a JsonRecordSetWriter.
In the new version, 2.0, there doesn't appear to be a ParquetReader anymore. There also doesn't appear to be any documentation about the ParquetReader being deprecated.
Please help. This flow is critical to me. I am also open to suggestions on how to accomplish the same conversion in a different manner.
Created 01-02-2024 11:31 AM
I don't see any parquet NAR files in my NiFi 2.0.0-M1 install or in the Docker image.
Created 01-02-2024 12:21 PM
@arutkwccu
The parquet components were not "deprecated"; they were moved out of the default distribution via https://issues.apache.org/jira/browse/NIFI-12282
Apache limits the maximum size of the project download, so at times some nars need to be moved out of the distribution to avoid exceeding that limit, or less commonly used components may be moved out. This does not mean that these components are no longer being contributed to or updated with newer NiFi release versions. It does mean, however, that you will need to manually download the nifi-parquet-nar and any dependency nar(s) it needs that are not already included in your NiFi distribution.
You can download nars directly from Maven Central.
Here is the Maven Central link for the nifi-parquet-nar:
https://central.sonatype.com/artifact/org.apache.nifi/nifi-parquet-nar/overview
If you look at the "Dependencies" tab, you'll see that there is a dependency on another nar (nifi-hadoop-libraries-nar).
Neither of these nars is included in the default Apache NiFi 2.0.0-M1 release.
The simplest way to add these nars to your NiFi is to download them into the <path-to-NiFi>/extensions/ folder. NiFi will auto-load nars placed in this folder without any need for a NiFi restart.
You can find the nars by clicking on the "Versions" tab for each required nar and clicking on "Browse" next to the NiFi version you want the nar for.
Here are the direct links for the two nars you need for parquet:
nifi-parquet-nar - https://repo1.maven.org/maven2/org/apache/nifi/nifi-parquet-nar/2.0.0-M1/nifi-parquet-nar-2.0.0-M1.n...
nifi-hadoop-libraries-nar - https://repo1.maven.org/maven2/org/apache/nifi/nifi-hadoop-libraries-nar/2.0.0-M1/nifi-hadoop-librar...
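The download step described above could be scripted roughly as follows. This is a minimal sketch, not an official procedure. The NIFI_HOME default, the helper names nar_url and fetch_nar, and the assumption that Maven Central serves each nar under the standard artifact/version path are all assumptions for illustration; check the printed URLs against the "Browse" links before relying on them.

```shell
#!/bin/bash
# Sketch: fetch the two nars named in this thread into NiFi's extensions folder.
# ASSUMPTIONS (not from the thread): NIFI_HOME default, helper names,
# and the standard Maven repository layout for the download URLs.

NIFI_VERSION="${NIFI_VERSION:-2.0.0-M1}"
NIFI_HOME="${NIFI_HOME:-/opt/nifi/nifi-current}"
MAVEN_BASE="https://repo1.maven.org/maven2/org/apache/nifi"

# Build a download URL from an artifact name and a version.
nar_url() {
  printf '%s/%s/%s/%s-%s.nar\n' "$MAVEN_BASE" "$1" "$2" "$1" "$2"
}

# Download one nar into the extensions folder (only runs when called explicitly).
fetch_nar() {
  name="$1"
  curl --fail --location --silent --show-error \
    --output "$NIFI_HOME/extensions/$name-$NIFI_VERSION.nar" \
    "$(nar_url "$name" "$NIFI_VERSION")"
}

# Dry run by default: print the URLs that would be downloaded.
for nar in nifi-parquet-nar nifi-hadoop-libraries-nar; do
  nar_url "$nar" "$NIFI_VERSION"
done
```

As written, the script only prints the two URLs. Calling fetch_nar nifi-parquet-nar and fetch_nar nifi-hadoop-libraries-nar as the NiFi service user performs the actual downloads.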
If you found any of the suggestions/solutions provided helped you with your issue, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 01-02-2024 12:55 PM
I'm very appreciative of the quick reply. What I don't have clarity on is how this affects the ConvertRecord processor. Are you advising that installing these nars will actually change the record readers available in the ConvertRecord processor? That isn't clear from the link you referenced:
https://central.sonatype.com/artifact/org.apache.nifi/nifi-parquet-nar/overview
I guess I don't see anything in the documentation that references this information, and a review of the hyperlink doesn't actually provide any information about what is made available by installing the nars.
I went to the trouble of downloading the nar from this link and expanding it:
https://repo1.maven.org/maven2/org/apache/nifi/nifi-parquet-nar/2.0.0-M1/nifi-parquet-nar-2.0.0-M1.n...
I see the following:
f nifi-parquet-processors-2.0.0-M1.jar
created: META-INF/
inflated: META-INF/MANIFEST.MF
created: META-INF/services/
created: org/
created: org/apache/
created: org/apache/nifi/
created: org/apache/nifi/parquet/
created: org/apache/nifi/parquet/hadoop/
created: org/apache/nifi/parquet/record/
created: org/apache/nifi/parquet/stream/
created: org/apache/nifi/parquet/utils/
created: org/apache/nifi/processors/
created: org/apache/nifi/processors/parquet/
created: META-INF/maven/
created: META-INF/maven/org.apache.nifi/
created: META-INF/maven/org.apache.nifi/nifi-parquet-processors/
inflated: META-INF/DEPENDENCIES
inflated: META-INF/LICENSE
inflated: META-INF/NOTICE
inflated: META-INF/services/org.apache.nifi.controller.ControllerService
inflated: META-INF/services/org.apache.nifi.processor.Processor
inflated: org/apache/nifi/parquet/ParquetReader.class
inflated: org/apache/nifi/parquet/ParquetRecordSetWriter.class
inflated: org/apache/nifi/parquet/hadoop/AvroParquetHDFSRecordReader.class
inflated: org/apache/nifi/parquet/hadoop/AvroParquetHDFSRecordWriter.class
inflated: org/apache/nifi/parquet/record/ParquetRecordReader.class
inflated: org/apache/nifi/parquet/record/WriteParquetResult.class
inflated: org/apache/nifi/parquet/stream/NifiOutputStream.class
inflated: org/apache/nifi/parquet/stream/NifiParquetInputFile.class
inflated: org/apache/nifi/parquet/stream/NifiParquetOutputFile.class
inflated: org/apache/nifi/parquet/stream/NifiSeekableInputStream.class
inflated: org/apache/nifi/parquet/utils/ParquetConfig.class
inflated: org/apache/nifi/parquet/utils/ParquetUtils.class
inflated: org/apache/nifi/processors/parquet/ConvertAvroToParquet.class
inflated: org/apache/nifi/processors/parquet/FetchParquet.class
inflated: org/apache/nifi/processors/parquet/PutParquet.class
inflated: META-INF/maven/org.apache.nifi/nifi-parquet-processors/pom.xml
inflated: META-INF/maven/org.apache.nifi/nifi-parquet-processors/pom.properties
It does seem like it is possible that going through all of this would allow for an additional record reader to be made available in the ConvertRecord processor, but to be honest with you, since there is no documentation on this, it still remains unclear to me that creating a custom distribution will solve the issue.
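One way to see what a nar makes available is to read the service registration file that appears in the listing above (META-INF/services/org.apache.nifi.controller.ControllerService). The sketch below is only an illustration. It assumes the nar is a zip archive that keeps its jars under META-INF/bundled-dependencies/, that unzip is installed, and that the registration file follows the usual one-class-name-per-line format with # comments. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch: list the controller services a nar registers.
# ASSUMPTIONS: nar layout (jars under META-INF/bundled-dependencies/),
# availability of unzip, and the helper names used here.

SERVICES_FILE="META-INF/services/org.apache.nifi.controller.ControllerService"

# Print the class names in a service registration file,
# dropping comments, surrounding whitespace, and blank lines.
list_services() {
  sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Pull one jar out of the nar, then print the services it registers.
nar_services() {
  nar_file="$1"; jar_name="$2"
  tmp_dir=$(mktemp -d) || return 1
  unzip -q -j "$nar_file" "META-INF/bundled-dependencies/$jar_name" -d "$tmp_dir" &&
    unzip -p "$tmp_dir/$jar_name" "$SERVICES_FILE" > "$tmp_dir/services" &&
    list_services "$tmp_dir/services"
  status=$?
  rm -rf "$tmp_dir"
  return "$status"
}

# Usage: script.sh <path-to-nar> <jar-name-inside-nar>
if [ "$#" -ge 2 ]; then
  nar_services "$1" "$2"
fi
```

If the bundle registers ParquetReader as a controller service, its fully qualified class name should show up in that output, which is what makes it selectable as a record reader once the nar is loaded.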
Created on 01-03-2024 05:29 AM - edited 01-03-2024 05:31 AM
@arutkwccu
Yes, the added ParquetReader will be available to the record processors that use a record reader or writer.
There is not a lot involved in this process.
Simply download these two nar files and place them in the NiFi extensions folder (owned by the NiFi service user). NiFi takes care of the rest.
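A quick pre-flight check along these lines can confirm the files landed where NiFi looks for them. This is a hedged sketch: it assumes the autoload folder is set by the nifi.nar.library.autoload.directory entry in conf/nifi.properties, that the nar file names start with the artifact name, and that NIFI_HOME points at the install. The function names are made up for the example.

```shell
#!/bin/bash
# Sketch: locate the autoload folder and check that both nars are present.
# ASSUMPTIONS: NIFI_HOME default, file naming <artifact>-<version>.nar, helper names.

REQUIRED_NARS="nifi-parquet-nar nifi-hadoop-libraries-nar"
NIFI_HOME="${NIFI_HOME:-/opt/nifi/nifi-current}"

# Read the autoload directory setting from a nifi.properties file.
autoload_dir() {
  sed -n 's/^nifi\.nar\.library\.autoload\.directory=//p' "$1" | tail -n 1
}

# Print each required nar that has no readable file in the given directory.
# Returns non-zero if anything is missing.
missing_nars() {
  check_dir="$1"; missing=0
  for name in $REQUIRED_NARS; do
    found=0
    for candidate in "$check_dir/$name"-*.nar; do
      if [ -r "$candidate" ]; then found=1; fi
    done
    if [ "$found" -eq 0 ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Only runs when a NiFi install is actually present at NIFI_HOME.
PROPS="$NIFI_HOME/conf/nifi.properties"
if [ -f "$PROPS" ]; then
  # The configured path may be relative to NIFI_HOME, so resolve it from there.
  if ! (cd "$NIFI_HOME" && missing_nars "$(autoload_dir "$PROPS")"); then
    echo "one or more nars are not in the autoload folder"
  fi
fi
```

Run it as the NiFi service user so the readability test reflects what NiFi itself can open.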
I added these two nars to my 2.0.0-M1 installation to confirm that it works.
Thank you,
Matt
Created 01-03-2024 05:35 AM
You are quite right that it works, and your efforts to simplify the process are much appreciated. It is perhaps more difficult when launching NiFi from a Docker Compose file. I wound up making the following modifications:
volumes:
  - /opt/nifi-custom-setup/version_2-0-0/add_nars:/opt/nifi/nifi-current/add_nars # Add additional nars
  - /mnt/scripts:/opt/nifi/nifi-current/custom_scripts # Add bind mount for `custom_entrypoint.sh`
Then I wrote a custom script to handle things:
#!/bin/bash
# Copy the custom NARs to the NiFi lib directory
cp /opt/nifi/nifi-current/add_nars/*.nar /opt/nifi/nifi-current/lib/
# Call the default NiFi startup script
exec /opt/nifi/nifi-current/bin/nifi.sh run
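A slightly more defensive variant of the script above might look like this. It is a sketch under assumptions, not a tested replacement: the copy_nars helper and the ADD_NARS_DIR and TARGET_DIR variables are hypothetical names, and the paths simply mirror the bind mount shown earlier. It skips the copy cleanly when no nars are mounted and only hands off to nifi.sh when that script exists.

```shell
#!/bin/bash
# Sketch: copy any mounted nars into NiFi before starting it.
# ASSUMPTIONS: variable and function names; paths mirror the bind mount in this post.

NIFI_HOME="${NIFI_HOME:-/opt/nifi/nifi-current}"
ADD_NARS_DIR="${ADD_NARS_DIR:-$NIFI_HOME/add_nars}"
TARGET_DIR="${TARGET_DIR:-$NIFI_HOME/lib}"

# Copy every .nar from the source directory to the destination directory
# and print how many files were copied.
copy_nars() {
  src_dir="$1"; dst_dir="$2"; count=0
  for nar in "$src_dir"/*.nar; do
    [ -e "$nar" ] || continue      # the glob matched nothing
    cp "$nar" "$dst_dir"/ || return 1
    count=$((count + 1))
  done
  echo "$count"
}

copied=$(copy_nars "$ADD_NARS_DIR" "$TARGET_DIR")
echo "Copied $copied nar(s) into $TARGET_DIR"

# Hand off to NiFi only when the startup script is actually present.
if [ -x "$NIFI_HOME/bin/nifi.sh" ]; then
  exec "$NIFI_HOME/bin/nifi.sh" run
fi
```

Setting TARGET_DIR to the extensions folder instead of lib would use the auto-load location mentioned earlier in the thread. Either should be picked up here because the copy happens before NiFi starts.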
In the end, I very much wish the documentation would provide some alerts to the end users on how to proceed. Again, I'm very appreciative of the assist.
Created 01-03-2024 06:01 AM
@arutkwccu
The release notes for minor releases typically include highlights covering any "deprecated" components (this is documented for Apache NiFi 2.0.0-M1 under Deprecated Components and Features) or components moved to "optional build profiles" (such as this specific parquet bundle). This is done to help make minor upgrades as seamless and painless as possible.
Apache NiFi 2.0.0 is a major release and as such will have many significant changes, including breaking changes. It should not be treated the same as a minor version upgrade, and extra care and evaluation should be taken during migration from a previous major release version. I agree that it would have been nice if the Apache community had included another Confluence page documenting all components moved to the "Optional Build Profile", with instructions like the ones I added above on how to add them.
Glad you are good to go now.
Thank you,
Matt
Created 01-04-2024 07:26 AM
@arutkwccu
The Apache NiFi 2.0.0-M1 release notes have now been updated with a list of nars that have been moved to the Optional Build Profiles.
https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version2.0.0-M1
Thank you,
Matt