Created 10-14-2025 11:58 PM
I am using NiFi to read data from FTP and push it to HDFS. My business process schedules the downstream computing tasks as workflows in DolphinScheduler. One detail here is that, before the compute task runs, the workflow executes an ALTER statement to add the partition for the corresponding analysis.
I have noticed that the Spark task occasionally fails at runtime with a FileNotFound error. My guess is that, because the partition has already been added, NiFi may still be writing data into that partition while the Spark task is running, and the files still being written cause this error. How can I work around this problem?
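For reference, the pre-task and the compute task look roughly like the sketch below; the table name ods.events, the dt partition, and the paths are placeholders used only for illustration.

```scala
import org.apache.spark.sql.SparkSession

object DailyJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-report")
      .enableHiveSupport()
      .getOrCreate()

    // Pre-task (run by DolphinScheduler before the compute step): register the
    // partition that NiFi has been writing to.
    spark.sql(
      """ALTER TABLE ods.events ADD IF NOT EXISTS PARTITION (dt='2025-10-14')
        |LOCATION '/data/ods/events/dt=2025-10-14'""".stripMargin)

    // Compute task: Spark captures the partition's file list here; if NiFi is
    // still writing (or renaming temp files) in that directory, the listed
    // files can disappear before they are read, raising FileNotFoundException.
    val df = spark.table("ods.events").where("dt = '2025-10-14'")
    df.groupBy("event_type").count().show()
  }
}
```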
Created 10-15-2025 02:37 AM
@Meepoljd Spark reads the table's metadata to build the list of files to read. If the table keeps changing (deletes/overwrites) while the read operation is executing, the Spark job can fail with a FileNotFoundException.
The options here would be to minimise the window during which changes happen and run the job when there are no changes,
OR to catch the exception within the Spark code and rebuild the DataFrame (see the sketch below).
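Here is a minimal sketch of the "catch the exception and rebuild the DataFrame" approach. The table ods.events, its dt partition column, and the retry/sleep values are all hypothetical; the idea is to invalidate the stale file listing with REFRESH TABLE, rebuild the DataFrame, and rerun the action.

```scala
import java.io.FileNotFoundException

import org.apache.spark.sql.{DataFrame, SparkSession}

object SafeRead {

  // The FNF may be thrown directly or wrapped (e.g. in a SparkException),
  // so check both the cause chain and the message text.
  private def isFileNotFound(t: Throwable): Boolean =
    t != null && (t.isInstanceOf[FileNotFoundException] ||
      (t.getMessage != null && t.getMessage.contains("FileNotFoundException")) ||
      isFileNotFound(t.getCause))

  // Build the DataFrame, run the action, and retry on FileNotFoundException.
  def runWithRetry[T](spark: SparkSession, table: String, attempts: Int)
                     (build: SparkSession => DataFrame)(action: DataFrame => T): T = {
    try {
      action(build(spark)) // the action (count, write, ...) triggers the real file reads
    } catch {
      case e: Exception if attempts > 1 && isFileNotFound(e) =>
        spark.sql(s"REFRESH TABLE $table") // drop the cached, now-stale file listing
        Thread.sleep(30000)                // give NiFi a chance to finish in-flight files
        runWithRetry(spark, table, attempts - 1)(build)(action)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("safe-read").enableHiveSupport().getOrCreate()

    val rowCount = runWithRetry(spark, "ods.events", attempts = 3) { s =>
      s.table("ods.events").where("dt = '2025-10-14'")
    } { df =>
      df.count()
    }
    println(s"rows read: $rowCount")
  }
}
```

Retrying only reduces the failure window; the cleaner fix is still to coordinate the schedule so the partition is registered (and read) only after NiFi has finished writing it.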
Created 10-21-2025 01:16 AM
@Meepoljd, did the response help resolve your query? If it did, kindly mark the relevant reply as the solution, as it will help others locate the answer more easily in the future.
Regards,
Vidya Sargur,