Spark Streaming doesn't detect a file with the same name again


I run Spark Streaming to fetch all files from a directory in batches.

If a file with the same name as one that was already processed is copied into the directory that Spark monitors, Spark ignores it, even though its modification date falls within the window of the streaming batch.

Any other new file with a different name in the directory, whose modification date falls within the batch window, is fetched normally.
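To make the behaviour I am seeing concrete, here is a toy model in plain Python. It is not Spark code and not Spark's implementation. The class name, the `seen_names` set, and the window check are my own assumptions about what could produce this result: a file is selected only if its modification time is inside the window and its name has not been selected before.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDirectoryMonitor:
    """Toy model of the observed behaviour; NOT Spark's implementation."""
    window_seconds: float
    # Hypothetical: names that were already picked up in an earlier batch.
    seen_names: set = field(default_factory=set)

    def select_new_files(self, files, now):
        """Return the names selected for this batch.

        files: dict mapping file name -> modification time (seconds).
        A file is selected when its modification time is inside the
        window AND its name has not been selected before.
        """
        selected = []
        for name, mtime in sorted(files.items()):
            in_window = now - self.window_seconds < mtime <= now
            if in_window and name not in self.seen_names:
                selected.append(name)
        self.seen_names.update(selected)
        return selected


monitor = ToyDirectoryMonitor(window_seconds=10)

# Batch 1: a.txt arrives and is picked up.
batch1 = monitor.select_new_files({"a.txt": 95.0}, now=100.0)

# Batch 2: a.txt is copied again (fresh modification time), next to a new b.txt.
batch2 = monitor.select_new_files({"a.txt": 108.0, "b.txt": 107.0}, now=110.0)

print(batch1)
print(batch2)
```

In this model the second copy of `a.txt` is skipped in batch 2 even though its modification time is inside the window, while `b.txt` is selected. That matches what I observe with the real job.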

This contradicts the documentation, where Spark states that it looks at the modification date only.

Spark Streaming - textFileStream

 

Does anyone have an explanation?

 

 
