
NiFi PutHDFS and Spark SQL: FileNotFound error when reading a partition being written

Rising Star

I am using NiFi to read data from FTP and push it to HDFS. My subsequent computing tasks are scheduled as workflows through DolphinScheduler. One detail: before the Spark job runs, the workflow executes an ALTER TABLE to add the partition it is about to analyze.
I have noticed that the Spark task occasionally fails at runtime with a FileNotFound error. My guess is that because the partition has already been registered, Spark starts reading it while NiFi is still writing data into that partition, and the files still being written trigger this error. How can I avoid this problem?

2 REPLIES

Expert Contributor

@Meepoljd Spark reads the table metadata to get the list of files to read. If the table is continuously changing (deletes/overwrites) while the read operation executes, the Spark job can fail with a FileNotFoundException.

The options here are to minimise the window in which changes happen and run the job when no writes are in progress, OR to catch the exception within the Spark code and rebuild the DataFrame.
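A minimal PySpark sketch of the second option (catch and rebuild). The table name and retry settings are hypothetical; adapt them to your job:

```python
# Sketch: retry a table read when files listed in Spark's metadata
# snapshot disappear (or change) mid-read.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("retry-on-fnf").getOrCreate()

TABLE = "analytics.events"  # hypothetical table name
MAX_RETRIES = 3

def read_with_retry(table, retries=MAX_RETRIES):
    """Read a table, refreshing cached file metadata and retrying
    if files from the snapshot vanish during the read."""
    for attempt in range(1, retries + 1):
        try:
            df = spark.read.table(table)
            # Force full evaluation so a stale file listing fails
            # here rather than deep inside the downstream job.
            df.count()
            return df
        except Exception as e:
            # The JVM's FileNotFoundException surfaces through Py4J,
            # so match on the message rather than a Python type.
            if "FileNotFoundException" in str(e) and attempt < retries:
                # Drop the stale listing and take a fresh snapshot.
                spark.catalog.refreshTable(table)
                time.sleep(5 * attempt)
            else:
                raise

df = read_with_retry(TABLE)
```

Note that this only shrinks the failure window; if NiFi is still writing when the retry succeeds, the DataFrame may contain partially loaded partitions, so minimising overlap between ingestion and reads remains the cleaner fix.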

Community Manager

@Meepoljd, Did the response assist in resolving your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.



Regards,

Vidya Sargur,
Community Manager

