Member since 07-16-2018 · 6 Posts · 1 Kudo Received · 0 Solutions
09-19-2018 05:49 PM · 1 Kudo
Hi,

We are planning to use NiFi's Data Provenance long-term, for audits. What is the best way to configure NiFi for this?

- If I "simply" back up the Data Provenance disk content, will I be able to use it later by re-injecting it into a working NiFi?
- Should I use a provenance Reporting Task instead? But again, how do you query the backed-up data later? By keeping it in a dedicated NiFi used only for backup?
- I also did not understand how FlowFile content is managed: is it supposed to be stored on the Data Provenance disk (which would greatly increase its size)? Or does the "replay" button in the Data Provenance UI only work while the content is still fresh and present in the content repository?
- Or is Data Provenance just not meant for long-term use?

Sorry if I am mixing up multiple concepts. Thanks.
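For context on querying: while events are still within the provenance repository's retention window, they can be searched programmatically through NiFi's REST API rather than the UI. Below is a minimal sketch of building such a query body; the endpoint path `/nifi-api/provenance` and the exact field names are assumptions to verify against your NiFi version's REST API documentation.

```python
import json

def build_provenance_query(max_results=1000, start_date=None,
                           end_date=None, search_terms=None):
    """Build the JSON body for a POST to /nifi-api/provenance.

    Field names (maxResults, startDate, endDate, searchTerms) are
    taken from NiFi's REST API but should be checked against your
    version; treat this as an assumption, not a spec.
    """
    request = {"maxResults": max_results}
    if start_date:
        request["startDate"] = start_date   # e.g. "09/19/2018 00:00:00 UTC"
    if end_date:
        request["endDate"] = end_date
    if search_terms:
        request["searchTerms"] = search_terms
    return json.dumps({"provenance": {"request": request}})

payload = build_provenance_query(
    max_results=100,
    search_terms={"EventType": "RECEIVE"},
)
```

Note that this only reaches events still held in the provenance repository; it does not answer the long-term archival question by itself.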
Labels: Apache NiFi
07-26-2018 08:15 AM
Thanks, Matt, for the clear answer!
07-25-2018 03:05 PM
Thanks for your answer, but as I understand it, FetchParquet will fetch the .parquet file and put its content into the FlowFile, but it won't help export it as .csv: the FlowFile content will still be the binary Parquet version of the data. I plan to do the equivalent of FetchParquet with a REST call to WebHDFS.
07-25-2018 02:38 PM
Hi,

I am developing a NiFi web service to export data lake content (stored as .parquet) as .csv. I managed to do it using the HiveQL processor, but I want to do it without Hive. What I imagined was:

- get the .parquet file with WebHDFS (an InvokeHTTP call from NiFi)
- use a NiFi processor to convert the .parquet file to .csv

Is there a NiFi processor that does this? The only option I have found so far is to use a Spark job, which sounds a bit complicated for this purpose. Thanks.
Labels: Apache NiFi, Apache Spark