In my cluster, I use Hive Replication to S3 to back up databases daily.
I checked the documentation but couldn't find whether a database, table, or file in a user's personal directory is also deleted from S3 when it is dropped from the cluster.
From the documentation,
Can anyone please confirm whether the above point applies when replicating to S3?
What approach is recommended to keep the cluster and the backup on S3 in sync?
Thank you for the reply.
I understand that dropping a table or database in the cluster is not replicated to the cloud backup.
We have a use case where, each month, our internal customers create some tables and databases, work on them for a few days, and then drop them once they are done. As a result, the S3 bucket holds many abandoned databases, and their number grows day by day.
It would be really helpful if you could advise a way to keep the S3 bucket in sync with Hive.