New Contributor
Posts: 1
Registered: ‎10-26-2016

How to handle COPYING file streaming in Spark streaming

We run a Spark Streaming job with a 5-minute interval. If a batch starts while a file in the monitored directory is still in the _COPYING_ stage, the job fails with the exception below. How should we handle this situation?

Cause in our environment: the log file is being copied from a local PC to HDFS (the streaming directory) with the put command.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 9.0 failed 1 times, most recent failure: Lost task 3.0 in stage 9.0
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /Stream/ip/W3C_10000K.txt._COPYING_
Caused by: java.io.FileNotFoundException: File does not exist: /Stream/ip/W3C_10000K.txt._COPYING_

Cloudera Employee
Posts: 97
Registered: ‎05-10-2016

Re: How to handle COPYING file streaming in Spark streaming


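Please see the Spark Streaming documentation on File Streams [1]. While an upload is in progress, put writes the data to a temporary file with the ._COPYING_ suffix and renames it when the copy finishes, so the stream can pick up a file that no longer exists by the time the task reads it. You should first put the file in a temporary location on the same HDFS filesystem and then use the move command (hdfs dfs -mv source target) to atomically move the file into the monitored directory.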
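If the upload is scripted, the two steps could be wrapped roughly as follows. This is only a minimal sketch, not a tested production script: the function name stage_then_move and the staging directory are hypothetical, and it assumes the hdfs command is on the PATH and that the staging directory sits on the same HDFS filesystem as the monitored one, outside the path the stream watches.

```python
import os
import posixpath
import subprocess


def stage_then_move(local_path, staging_dir, watched_dir, run=subprocess.check_call):
    """Upload a local file to a staging directory, then rename it into the watched one.

    staging_dir and watched_dir are HDFS paths; staging_dir is a hypothetical
    location that the streaming job does not monitor. `run` executes a command
    given as a list of arguments and is injectable so the logic can be dry-run.
    Returns the final HDFS path of the file.
    """
    name = os.path.basename(local_path)
    staged = posixpath.join(staging_dir, name)
    final = posixpath.join(watched_dir, name)

    # Step 1: upload outside the watched directory. The temporary
    # <name>._COPYING_ file created by put is never seen by the stream.
    run(["hdfs", "dfs", "-put", local_path, staged])

    # Step 2: move the completed file into the watched directory.
    # Within one HDFS filesystem this is a rename, so the file appears whole.
    run(["hdfs", "dfs", "-mv", staged, final])
    return final
```

For the file in the question, a call such as stage_then_move("W3C_10000K.txt", "/Stream/staging", "/Stream/ip") would run the put and the mv in order, where /Stream/staging is a made-up directory name. With the default runner, a failed put raises an exception before the mv is attempted, so a partial upload never reaches /Stream/ip.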

 

[1] http://spark.apache.org/docs/1.6.0/streaming-programming-guide.html#basic-sources