S3A HDFS Sink losing data

I've got an HDFS sink pointed at S3 through the s3a filesystem,
using Flume 1.5. It mostly works, but occasionally I see a
FileNotFoundException when it tries to open the temporary s3a output
file. Looking further back in the logs, I see several
HostNotFoundExceptions, which look like they are part of a retry
loop of some sort.

One curious thing is that, before this, I also see an
"IOException:  Callable timed out...", which happens on the
close of the BucketWriter. Reading into it a bit, I noticed that the
tmp file appears to be deleted in a finally block in
S3AOutputStream, which would mean the FileNotFoundException above
is somewhat expected. Obviously I can increase the timeout, but
ultimately I lose data in this scenario, which makes me think I'm
doing something wrong or there's a bug somewhere.
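A minimal Java sketch of the sequence described above, under stated assumptions: a stream that buffers writes in a local temp file, uploads only in close(), and deletes the temp file in a finally block, with close() run on a worker thread under a per-call timeout. TempFileUploadStream, callWithTimeout and simulate are hypothetical names; this is not the actual S3AOutputStream or Flume BucketWriter code. The "upload" is just a sleep, and cancelling the worker with an interrupt on timeout is an assumption made to keep the demo deterministic.

```java
import java.io.*;
import java.nio.file.*;
import java.util.concurrent.*;

public class S3aCloseRace {

    // Hypothetical model of an s3a-style stream: writes go to a local temp
    // file and the upload only happens in close().
    static class TempFileUploadStream extends OutputStream {
        final Path tmp;
        final OutputStream out;
        final long uploadMs;              // simulated upload duration
        volatile boolean uploaded = false;

        TempFileUploadStream(Path tmp, long uploadMs) throws IOException {
            this.tmp = tmp;
            this.out = Files.newOutputStream(tmp);
            this.uploadMs = uploadMs;
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
        }

        @Override
        public void close() throws IOException {
            out.close();
            try {
                Thread.sleep(uploadMs);   // stand-in for the S3 upload
                uploaded = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("upload interrupted");
            } finally {
                // The temp file is deleted whether or not the upload finished,
                // so if it did not finish, the buffered bytes are gone.
                Files.deleteIfExists(tmp);
            }
        }
    }

    // Hypothetical helper: runs op on a worker thread, gives up after timeoutMs.
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> op, long timeoutMs)
            throws IOException, InterruptedException {
        Future<T> f = pool.submit(op);
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);               // interrupts the worker mid-upload
            throw new IOException("Callable timed out after " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        }
    }

    // Returns "timedOut,uploaded,reopenFailed" for one write-then-close cycle.
    static String simulate(long uploadMs, long timeoutMs) throws Exception {
        Path tmp = Files.createTempFile("s3a-sketch", ".tmp");
        TempFileUploadStream stream = new TempFileUploadStream(tmp, uploadMs);
        stream.write("event data".getBytes());

        ExecutorService pool = Executors.newSingleThreadExecutor();
        boolean timedOut = false;
        try {
            callWithTimeout(pool, () -> { stream.close(); return null; }, timeoutMs);
        } catch (IOException e) {
            timedOut = true;
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);  // let the worker's finally run

        // A retry that tries to reopen the temp file hits FileNotFoundException.
        boolean reopenFailed = false;
        try (FileInputStream in = new FileInputStream(tmp.toFile())) {
            reopenFailed = false;
        } catch (FileNotFoundException e) {
            reopenFailed = true;
        }
        return timedOut + "," + stream.uploaded + "," + reopenFailed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(simulate(500, 100));   // upload slower than the timeout
        System.out.println(simulate(50, 1000));   // upload finishes within the timeout
    }
}
```

In the first call the timeout fires before the simulated upload completes, so the caller sees the timeout, the data is never uploaded, and the temp file is already gone when it is reopened. In the second call the upload completes first, and the temp file is deleted only after its contents are safe. Raising the timeout only shrinks the window; it does not change the fact that a failed or interrupted close discards the buffered data.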

Has anyone else noticed this, or does anyone have some insight into it?

Thanks,
--tim