How to handle COPYING file streaming in Spark streaming

New Contributor

We run Spark Streaming with a 5-minute batch interval. If a batch starts while a file is still in the COPYING stage, we get the exception below. How should we handle this situation?

Cause in our environment: we copy the log file from a local PC to HDFS (the streaming area) using the PUT command.

org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 9.0 failed 1 times, most recent failure: Lost task 3.0 in stage 9.0
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /Stream/ip/W3C_10000K.txt._COPYING_
Caused by: java.io.FileNotFoundException: File does not exist: /Stream/ip/W3C_10000K.txt._COPYING_


Re: How to handle COPYING file streaming in Spark streaming

Expert Contributor

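Please see the Spark Streaming documentation for file streams [1]. Files have to appear in the monitored directory atomically. The PUT command writes to a temporary file with the ._COPYING_ suffix and renames it once the copy finishes, so a batch can pick up the temporary file and then fail when it no longer exists under that name. You should first put the file in a temporary location outside the monitored directory and then use the move command (hdfs dfs -mv source target) to atomically move the file into the directory.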
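The put-then-move approach could be scripted roughly as follows. This is only a minimal sketch, not a tested production script: the function names and the staging directory /Stream/tmp are hypothetical, and it assumes the hdfs client is on the PATH and that the staging directory lives on the same HDFS filesystem as the monitored directory, so the move is a rename rather than a copy.

```python
import os
import posixpath
import subprocess


def staged_upload_commands(local_path, staging_dir, watched_dir):
    """Build the two hdfs commands: upload to staging, then move into place.

    staging_dir is a hypothetical HDFS directory that the streaming job
    does NOT monitor; watched_dir is the directory the job reads from.
    """
    name = os.path.basename(local_path)
    staged = posixpath.join(staging_dir, name)
    target = posixpath.join(watched_dir, name)
    return [
        # The slow copy (and its ._COPYING_ temp file) happens in staging.
        ["hdfs", "dfs", "-put", local_path, staged],
        # The rename makes the finished file appear in the watched directory.
        ["hdfs", "dfs", "-mv", staged, target],
    ]


def staged_upload(local_path, staging_dir, watched_dir):
    """Run the commands in order and stop at the first failure."""
    for cmd in staged_upload_commands(local_path, staging_dir, watched_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, for inspection.
    for cmd in staged_upload_commands(
        "/tmp/W3C_10000K.txt", "/Stream/tmp", "/Stream/ip"
    ):
        print(" ".join(cmd))
```

With this layout the ._COPYING_ file only ever exists under the staging directory, so the streaming job should never list it.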

 

[1] http://spark.apache.org/docs/1.6.0/streaming-programming-guide.html#basic-sources