I need to know what pattern you follow in your Hadoop data lake when records are hard-deleted from the source itself.
The problem I am facing right now: the source hard-deletes or archives some of the data in its systems.
If we keep ingesting from that source and do not manage the hard deletes, reports built on the data lake differ noticeably from reports built directly against the source. The business does not want this ambiguity.
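For context, here is a minimal sketch of the divergence I mean. It assumes a full-snapshot extract and a hypothetical primary key; the key values are made up for illustration. The lake keeps every key ever ingested, while the source has dropped some, so a simple set difference exposes the records that were hard-deleted upstream:

```python
# Minimal sketch (hypothetical keys): compare keys already in the lake
# against keys still present in the latest full source extract.

def find_hard_deleted_keys(lake_keys, source_keys):
    """Return keys that exist in the lake but no longer in the source,
    i.e. records that were hard-deleted or archived upstream."""
    return set(lake_keys) - set(source_keys)

lake_keys = {101, 102, 103, 104}   # everything ever ingested into the lake
source_keys = {101, 103}           # source has since deleted 102 and 104

deleted = find_hard_deleted_keys(lake_keys, source_keys)
print(sorted(deleted))  # → [102, 104]
```

Any report that aggregates over `lake_keys` will count those two extra records, which is exactly the mismatch the business is complaining about.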
Please advise on the best practice/approach to manage this in the data lake.