NiFi file transfer report

Rising Star

Hi,

How do we get a report of all file transfers for a given period?

We use NiFi only to transfer files from one location to another over SFTP, with no transformation of the file data.

We would like to generate a report listing each transferred file name and its transfer status.

Thanks

1 ACCEPTED SOLUTION

Master Mentor

@nifier 
This post is three weeks old, but I hope this still helps with your reporting task. NiFi has a built-in Data Provenance feature that tracks the lineage of data as it moves through the flow.
To capture file transfer details:

1. Enable Provenance Reporting in NiFi

  • Provenance Events: NiFi records events such as SEND, RECEIVE, DROP, and ROUTE. For SFTP file transfers, look for SEND events.

  • Steps to Enable Provenance Reporting:

    1. Log in to the NiFi UI.
    2. Open Data Provenance from the global menu in the top-right corner.
    3. Configure the Provenance Repository in nifi.properties to retain enough event history:
      nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
      nifi.provenance.repository.max.storage.time=30 days
      nifi.provenance.repository.max.storage.size=1 GB
    4. Make sure both max.storage.time and max.storage.size are large enough to retain events for the desired reporting period; events are aged off when either limit is reached. Restart NiFi after changing nifi.properties.
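
As a quick sanity check on the retention setting, here is a minimal Python sketch that reads the text of nifi.properties and compares max.storage.time with the reporting period you need. The helper names and the small table of time units are assumptions made for this example, not part of NiFi, and NiFi accepts more unit spellings than the few listed here.

```python
import re

# Assumed subset of time units for this sketch; NiFi itself accepts more.
_UNIT_SECONDS = {
    "sec": 1, "secs": 1,
    "min": 60, "mins": 60,
    "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
}

RETENTION_KEY = "nifi.provenance.repository.max.storage.time"


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def retention_days(props):
    """Return the configured max.storage.time expressed in days."""
    raw = props[RETENTION_KEY]
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", raw)
    if not match or match.group(2).lower() not in _UNIT_SECONDS:
        raise ValueError(f"unsupported duration in this sketch: {raw!r}")
    amount = int(match.group(1))
    unit = match.group(2).lower()
    return amount * _UNIT_SECONDS[unit] / 86400


def covers_period(props, report_days):
    """True if the time-based retention is at least report_days long."""
    return retention_days(props) >= report_days
```

Keep in mind this only checks the time limit. A busy flow can still hit max.storage.size first, so the size limit needs a separate look.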

2. Query Provenance Data

You can query and filter provenance events to generate the report:

  • Open Data Provenance from the global menu in the NiFi UI.
  • Filter the events using criteria:
    • Component Name: Filter for the SFTP processor (PutSFTP).
    • Event Type: Select SEND.
    • Date Range: Specify the desired time frame.
  • Download the filtered results as a CSV file.
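
If you have the events as a list of records outside the UI, the same three filters can be sketched in Python as below. The field names eventType, componentType and timestampMillis are assumptions about how the records are shaped (they follow the JSON layout I would expect from the SiteToSiteProvenanceReportingTask), so adjust them to whatever your export actually contains.

```python
from datetime import datetime, timezone


def filter_send_events(events, component_type, start, end):
    """Keep SEND events from one processor type inside [start, end).

    events:         list of dicts, one per provenance event (assumed layout)
    component_type: processor type to keep, e.g. "PutSFTP"
    start, end:     timezone-aware datetimes bounding the report period
    """
    start_ms = int(start.timestamp() * 1000)
    end_ms = int(end.timestamp() * 1000)
    selected = []
    for event in events:
        if event.get("eventType") != "SEND":
            continue
        if event.get("componentType") != component_type:
            continue
        timestamp = event.get("timestampMillis")
        if timestamp is None or not (start_ms <= timestamp < end_ms):
            continue
        selected.append(event)
    return selected
```

For example, calling it with start=datetime(2025, 1, 1, tzinfo=timezone.utc) and end=datetime(2025, 2, 1, tzinfo=timezone.utc) keeps only the January 2025 transfers.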

3. Automate Reporting with NiFi Reporting Tasks

To generate periodic reports automatically:

  • Use the SiteToSiteProvenanceReportingTask:
    1. In the NiFi canvas, navigate to Controller Settings.
    2. Add a new Reporting Task and select SiteToSiteProvenanceReportingTask.
    3. Configure the Reporting Task to:
      • Specify where the events are sent: the URL of the receiving NiFi instance and the name of its Input Port.
      • Filter for SEND events related to your file transfer processors.
    4. Schedule the Reporting Task to run periodically (e.g., daily or weekly).

4. Include File Name and Transfer Status

NiFi provenance events include metadata such as the file name and size; the transfer status is inferred from the event type:

  • File Name: Captured in the filename attribute.
  • Transfer Status:
    • Success: PutSFTP records a SEND event for each file it transfers successfully.
    • Failure: No SEND event is recorded. Check the processor logs and bulletins for errors, or connect the failure relationship to a LogMessage processor to capture failed transfers in the flow.
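
To turn that into a per-file status, one option is to compare the files you expected to transfer against the SEND events you collected. The sketch below assumes each event carries its FlowFile attributes under previousAttributes and updatedAttributes (an assumption about the record layout), and the status labels are made up for this example.

```python
def transfer_status_rows(expected_filenames, send_events):
    """Return (filename, status) pairs for every expected file.

    A file counts as sent when at least one SEND event carries its
    filename attribute; otherwise it is reported as having no SEND event.
    """
    sent = set()
    for event in send_events:
        # Updated attributes win over previous ones if both are present.
        attributes = {
            **(event.get("previousAttributes") or {}),
            **(event.get("updatedAttributes") or {}),
        }
        name = attributes.get("filename")
        if name:
            sent.add(name)
    return [
        (name, "SENT" if name in sent else "NO SEND EVENT")
        for name in expected_filenames
    ]
```

A missing SEND event does not say why the transfer failed, so you would still pair this with the processor logs or a failure branch in the flow.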

5. Alternative: Push Logs to External Tools

You can push the provenance data to an external system for detailed analysis and reporting:

  • Elasticsearch/Kibana: Use the PutElasticsearch processor to send provenance events to Elasticsearch and visualize them in Kibana.
  • Custom Script: Use the ExecuteScript processor to write a Python or Groovy script to extract, filter, and format the provenance data into a report.

6. Sample Workflow for Reporting

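  1. Receive the provenance events for the desired period, for example on the Input Port fed by the SiteToSiteProvenanceReportingTask from step 3 (stock NiFi does not include a processor named QueryProvenance).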
  2. Filter for SEND events from the SFTP processor.
  3. Route successful and failed events to different processors (PutFile for saving logs).
  4. Format the report (CSV/JSON) using processors like UpdateAttribute and ConvertRecord.
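
If you would rather do the formatting step outside NiFi, here is a minimal Python sketch that turns a JSON array of events into CSV text. The field names (timestampMillis, eventType, transitUri, previousAttributes, updatedAttributes) and the column names are assumptions for illustration; check them against the events you actually receive.

```python
import csv
import io
import json
from datetime import datetime, timezone


def events_to_csv(events_json):
    """Convert a JSON array of provenance events into CSV report text."""
    events = json.loads(events_json)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp_utc", "filename", "event_type", "transit_uri"])
    for event in events:
        # Merge attribute maps so the filename is found in either one.
        attributes = {
            **(event.get("previousAttributes") or {}),
            **(event.get("updatedAttributes") or {}),
        }
        timestamp = datetime.fromtimestamp(
            event["timestampMillis"] / 1000, tz=timezone.utc
        ).isoformat()
        writer.writerow([
            timestamp,
            attributes.get("filename", ""),
            event.get("eventType", ""),
            event.get("transitUri", ""),
        ])
    return buffer.getvalue()
```

The csv module handles quoting, so file names containing commas or quotes still produce a valid report.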

By combining these steps, you can efficiently generate a report for all file transfers in the given period, including file names and transfer statuses.

3 REPLIES

Rising Star

Sorry for the very late reply. Your response helped us a lot in setting up Provenance.

Thank you so much.

Master Mentor

@nifier 
That's good to hear. Now the onus is on you to share the provenance setup that resolved your problem.
Sharing such information is invaluable for growing our documentation base. If you do a good, detailed write-up, the moderators could help integrate it into the official Cloudera knowledge base.
Happy hadooping 

Rising Star

@Shelton We just followed steps 1, 3, 4, and 5 to send the automated report to Elasticsearch. It was pretty straightforward.

The only things we had to do were enable the firewall in our Docker container and update the Input Port's access policies.

Thanks