Created 05-04-2017 03:31 PM
For my use case, I need to get metadata about a file on a remote server. I don't need to fetch the actual content of the file, so the ListSFTP processor works fine. However, among the attributes that ListSFTP documents under "Writes Attributes", file size is missing.
What would be the easiest workaround to get the file size here? Also, I think a JIRA might be required to add file size as a written attribute for the ListSFTP processor.
Created 05-04-2017 04:22 PM
You are correct that the file size as it exists on the SFTP server is not written to an attribute on the listed FlowFile. For every FlowFile, NiFi creates a FlowFile property (fileSize) that records the size of the content associated with that FlowFile. This FlowFile property is not editable and is updated with the actual size of the fetched content after FetchSFTP.
I understand you don't want to actually fetch the data but only want to get some metadata (including size) about what currently exists on the SFTP server. Interesting idea. I suggest creating an Apache Jira to add an additional FlowFile attribute on list (for example, maybe "sftp.server.file.size"). We have to make sure the attribute name is very descriptive so users do not confuse it with the "fileSize" that already exists on the FlowFile. We can never assume that "fileSize" and "sftp.server.file.size" will be exactly the same, and "fileSize" will change depending on how the content is manipulated as it progresses through a NiFi dataflow.
I see a valid use case here: adding this attribute would allow users to make routing decisions on listed files. Perhaps you don't want to fetch any files from the SFTP server if they are larger than XXX in size.
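To make the routing idea concrete, here is a minimal sketch in Python of the decision itself, outside of NiFi. It assumes the listing step already reports each file's server-side size, which is exactly what the proposed attribute would provide. `ListedFile`, `size_bytes`, and `files_to_fetch` are hypothetical names used only for illustration; they are not NiFi APIs, and "sftp.server.file.size" does not exist today.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ListedFile:
    """Metadata for one remote file, as a listing step might report it (hypothetical)."""
    filename: str
    size_bytes: int  # size on the SFTP server, not the FlowFile's fileSize


def files_to_fetch(listing: Iterable[ListedFile], max_bytes: int) -> List[ListedFile]:
    """Keep only the files whose server-side size does not exceed max_bytes."""
    return [f for f in listing if f.size_bytes <= max_bytes]


# Example listing with made-up names and sizes.
listing = [
    ListedFile("small.csv", 2_048),
    ListedFile("huge.bin", 5_000_000_000),
]

# Route only files of at most 10 MiB on to the fetch step.
selected = files_to_fetch(listing, max_bytes=10 * 1024 * 1024)
print([f.filename for f in selected])  # prints ['small.csv']
```

In a dataflow, the same comparison would sit between the list step and the fetch step, so oversized files are never pulled from the server.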
Thanks,
Matt
Created 05-08-2017 05:26 PM
Was I able to address your question? Unfortunately, there is no workaround other than using FetchSFTP to actually pull the file content so that fileSize is updated; however, that is a complete waste of resources if you don't need to ingest the data.
If you found this response helpful, please accept the answer.
Thank you,
Matt
Created 05-09-2017 12:06 AM
Hi @Matt Clarke,
I will create a JIRA for this.