Member since 07-16-2018 · 6 Posts · 1 Kudo Received · 0 Solutions
09-19-2018 05:49 PM · 1 Kudo
Hi,

We are planning to use NiFi's Data Provenance long-term, for audits. What is the best way to configure NiFi for this?

- If I "simply" back up the Data Provenance disk content, will I be able to use it later by re-injecting it into a working NiFi?
- Should I use a provenance Reporting Task instead? But again, how do you query the backed-up data later? By keeping it in a dedicated NiFi used only for backup?
- I also did not understand how FlowFile content is managed: is it supposed to be stored on the Data Provenance disk (which would greatly increase its size)? Or does the "replay" button in the Data Provenance UI only work while the content is still fresh and present in the content repository?
- Or is Data Provenance just not meant for long-term use?

Sorry if I am mixing up multiple concepts. Thanks.
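For context on querying: while events are still within the provenance repository's retention window, they can be searched programmatically through NiFi's REST API rather than the UI. Below is a minimal sketch of building such a query body; the endpoint path `/nifi-api/provenance` and the exact field names are assumptions to verify against your NiFi version's REST API documentation.

```python
import json

def build_provenance_query(max_results=1000, start_date=None,
                           end_date=None, search_terms=None):
    """Build the JSON body for a POST to /nifi-api/provenance.

    Field names (maxResults, startDate, endDate, searchTerms) are
    taken from NiFi's REST API but should be checked against your
    version; treat this as an assumption, not a spec.
    """
    request = {"maxResults": max_results}
    if start_date:
        request["startDate"] = start_date   # e.g. "09/19/2018 00:00:00 UTC"
    if end_date:
        request["endDate"] = end_date
    if search_terms:
        request["searchTerms"] = search_terms
    return json.dumps({"provenance": {"request": request}})

payload = build_provenance_query(
    max_results=100,
    search_terms={"EventType": "RECEIVE"},
)
```

Note that this only reaches events still held in the provenance repository; it does not answer the long-term archival question by itself.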
Labels: Apache NiFi
07-26-2018 08:15 AM
Thanks, Matt, for the clear answer!
07-25-2018 03:05 PM
Thanks for your answer, but as I understand it, FetchParquet will fetch the .parquet file and put its content into the FlowFile, but it won't help export it as .csv: the FlowFile content will still be the binary Parquet version of the data. I plan to do the equivalent of FetchParquet with a REST call to WebHDFS.
07-25-2018 02:38 PM
Hi,

I am developing a NiFi web service to export data lake content (stored as .parquet) as .csv. I managed to do it using the HiveQL processor, but I want to do it without Hive. What I imagined was:

- get the .parquet file with WebHDFS (an InvokeHTTP call from NiFi)
- use a NiFi processor to convert the .parquet file to .csv

Is there a NiFi processor that does this? The only option I have found so far is to use a Spark job, which sounds a bit complicated for this purpose. Thanks.
Labels: Apache NiFi, Apache Spark