Fetch file does not update filename in HDF 2.1

I have just upgraded to HDF 2.1. In my flow, a GetFile processor is followed downstream by a FetchFile processor. After FetchFile executes, the filename attribute still holds the filename read by the previous processor; it is not updated to the name of the file that FetchFile fetched.

This was not the case in HDF 2.0

@bala krishnan

The GetFile processor already pulls the content of the source file, so you would not need to feed its output to a FetchFile processor. Did you mean ListFile instead?

The FetchFile processor was never designed to write any new attributes to a FlowFile, and "filename" is a FlowFile attribute. FetchFile should only retrieve the content of a target source file and insert that content into the FlowFile content of the incoming FlowFile.

By default, the FetchFile processor uses the "absolute.path" and "filename" attributes already present on the incoming FlowFile to locate the content to retrieve. Check the values of these attributes on your incoming FlowFiles to confirm they are correct.
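To illustrate the point above, here is a minimal Python sketch of how a path could be built from those two attributes. It assumes the "File to Fetch" property is set to `${absolute.path}/${filename}`; the `resolve` helper is a hypothetical stand-in for NiFi Expression Language, and the attribute values are made up for the example.

```python
import re

def resolve(expression, attributes):
    """Replace each ${name} reference with the matching FlowFile attribute.

    Simplified, hypothetical stand-in for NiFi Expression Language:
    it only handles plain ${attribute} references, and a missing
    attribute resolves to an empty string.
    """
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: attributes.get(m.group(1), ""),
                  expression)

# Hypothetical attributes, as a listing processor might set them.
flowfile_attributes = {"absolute.path": "/data/inbound", "filename": "orders.csv"}

# Assumed value of the "File to Fetch" property.
FILE_TO_FETCH = "${absolute.path}/${filename}"

fetched_path = resolve(FILE_TO_FETCH, flowfile_attributes)
print(fetched_path)
```

Note that the attributes are only read here. Nothing in this step writes a new "filename" value back to the FlowFile.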

Matt

@bala krishnan

I tried to reproduce your scenario using HDF 2.0 and the ListFile and FetchFile processors. The "filename" attribute after the FetchFile processor was unchanged from what it was before FetchFile.

I then templated that dataflow on HDF 2.0, loaded it into another NiFi running HDF 2.1.1 and received the same results.

I would recommend stepping a FlowFile through your dataflow examining the FlowFile attributes both before and after each processor to confirm where the "filename" attribute is getting updated.

Was your dataflow templated and moved from HDF 2.0 to HDF 2.1 or did you re-create your dataflow from scratch after you upgraded?

Thank you,

Matt

The dataflow was templated and moved from HDF 2.0 to HDF 2.1.

@Matt

Here is the complete scenario.

1. You are right, we use ListFile

2. Our requirement is as follows: we use ListFile to get the inbound file name and parse it. The parsed attribute is then used to fetch a schema file from HDFS.

3. Only after reading the schema do we read the data file that was listed, using FetchFile. In this case, instead of the default "absolute.path" and "filename", I save the filename with its complete path in an attribute right after ListFile and point FetchFile at that attribute. With HDF 2.0, the filename attribute after this FetchFile was the data file's name; with HDF 2.1 it is not (it retains the filename read from HDFS in the previous step).

4. FetchFile reads the desired content correctly. Only the filename attribute is not updated.

Can you clarify if you are talking about ListFile & FetchFile or ListHDFS & FetchHDFS?

You mentioned fetching a schema from HDFS so I am assuming you mean FetchHDFS.

There was a change in HDF 2.1 to make FetchHDFS support a compression codec:

https://issues.apache.org/jira/browse/NIFI-2963

As part of that change, FetchHDFS now updates the filename attribute: it takes the last element of the path that was fetched from HDFS and, if a compression codec was selected, appends the corresponding extension.

Prior to this change (HDF 2.0), FetchHDFS did not touch the filename attribute, so it was left as whatever it was before this processor.
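A rough Python sketch of the difference described above, for illustration only. This is not the actual FetchHDFS code; the function name, the `updates_filename` flag, and the sample paths are all hypothetical, and the extension handling simply follows the description in this post.

```python
import posixpath

def fetch_hdfs_attributes(attributes, hdfs_path, codec_extension="",
                          updates_filename=True):
    """Return the FlowFile attributes as they would look after the fetch.

    updates_filename=False models the HDF 2.0 behavior described above
    (filename untouched); True models the HDF 2.1 behavior (last path
    element, plus an optional codec extension).
    """
    result = dict(attributes)
    if updates_filename:
        result["filename"] = posixpath.basename(hdfs_path) + codec_extension
    return result

# Hypothetical FlowFile coming out of ListFile, and a hypothetical schema path.
listed = {"filename": "orders_20170101.csv"}
schema_path = "/schemas/orders.avsc"

after_hdf_2_0 = fetch_hdfs_attributes(listed, schema_path, updates_filename=False)
after_hdf_2_1 = fetch_hdfs_attributes(listed, schema_path)
after_hdf_2_1_codec = fetch_hdfs_attributes(listed, schema_path, codec_extension=".gz")

print(after_hdf_2_0["filename"])
print(after_hdf_2_1["filename"])
print(after_hdf_2_1_codec["filename"])
```

In the second and third cases, anything downstream that reads "filename" sees the name of the HDFS file rather than the originally listed file.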

@Matt @Bryan Bende

Thanks for your clarifications.

My flow is ListFile -> FetchHDFS -> FetchFile. So it is the FetchHDFS processor that updates the filename, and the FetchFile functionality has remained the same.

We will build a workaround so that the filename from ListFile is retained through to FetchFile.
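One possible shape for such a workaround, sketched in Python under stated assumptions: copy "filename" into a spare attribute before FetchHDFS and copy it back afterwards. In a real flow this could be done with attribute-updating processors on either side of FetchHDFS. The attribute name `original.filename`, the function names, and the sample values are hypothetical.

```python
import posixpath

def save_filename(attributes):
    """Before FetchHDFS: stash the listed filename in a spare attribute."""
    result = dict(attributes)
    result["original.filename"] = attributes["filename"]
    return result

def fetch_hdfs(attributes, hdfs_path):
    """Model of the HDF 2.1 behavior: filename becomes the last path element."""
    result = dict(attributes)
    result["filename"] = posixpath.basename(hdfs_path)
    return result

def restore_filename(attributes):
    """After FetchHDFS: put the listed filename back and drop the spare copy."""
    result = dict(attributes)
    result["filename"] = result.pop("original.filename")
    return result

# Hypothetical FlowFile attributes and schema location.
listed = {"filename": "orders_20170101.csv"}
after_schema_fetch = fetch_hdfs(save_filename(listed), "/schemas/orders.avsc")
restored = restore_filename(after_schema_fetch)

print(after_schema_fetch["filename"])
print(restored["filename"])
```

With the name restored before FetchFile, downstream processors see the listed file's name again, regardless of what FetchHDFS wrote in between.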