We set the Spark Streaming batch interval to 5 minutes. If a batch starts while a file in the monitored directory is still in the _COPYING_ stage, we get the exception below. How should we handle this situation?
Cause in our environment: the log file is being copied from the local PC to HDFS (the streaming directory) with the put command.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 9.0 failed 1 times, most recent failure: Lost task 3.0 in stage 9.0
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /Stream/ip/W3C_10000K.txt._COPYING_
Caused by: java.io.FileNotFoundException: File does not exist: /Stream/ip/W3C_10000K.txt._COPYING_
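Please see the Spark Streaming documentation on file streams. While put is running, the file exists in the target directory under a temporary name ending in ._COPYING_, and it is renamed once the copy finishes, so a batch that picks up the temporary name can fail when the file disappears. You should first put the file in a temporary location on the same HDFS, outside the monitored directory, and then use the move command (hdfs dfs -mv source target) to atomically move the file into the monitored directory.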
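As a rough illustration of the stage-then-move approach, here is a minimal Python sketch that builds and runs the two HDFS shell commands. The function names and the staging directory /Stream/staging are hypothetical, chosen only for this example; /Stream/ip is the monitored directory from the stack trace. It assumes the hdfs command is on the PATH and that the staging directory already exists on the same HDFS.

```python
import os
import posixpath
import subprocess


def staged_upload_commands(local_path, staging_dir, watched_dir):
    """Return the two HDFS shell commands for a stage-then-move upload.

    staging_dir is an assumed HDFS directory that Spark does NOT monitor;
    watched_dir is the directory the file stream reads from.
    """
    name = os.path.basename(local_path)          # file name on the local machine
    staged = posixpath.join(staging_dir, name)   # HDFS paths always use "/"
    target = posixpath.join(watched_dir, name)
    return [
        # Step 1: slow copy; the ._COPYING_ file appears only in staging_dir.
        ["hdfs", "dfs", "-put", local_path, staged],
        # Step 2: rename within HDFS; the finished file appears in one step.
        ["hdfs", "dfs", "-mv", staged, target],
    ]


def staged_upload(local_path, staging_dir, watched_dir):
    """Run both commands in order, stopping if either one fails."""
    for cmd in staged_upload_commands(local_path, staging_dir, watched_dir):
        subprocess.run(cmd, check=True)
```

A call might look like staged_upload("W3C_10000K.txt", "/Stream/staging", "/Stream/ip"). Because the copy happens outside the monitored directory, the streaming job should only ever see the completed file.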