I am designing and implementing a multi-tenant Hadoop solution with multiple storage tiers and a different retention policy per tenant. Is there an existing reference architecture or set of best practices for this? For example, suppose we want a 6-month retention policy on hot storage for tenant 1 and a 12-month retention policy on hot storage for tenant 2. Is there an existing tool for this, or do I have to implement it myself with an Oozie job that moves data from one tier to another based on each tenant's retention policy?
Apache Falcon supports retention policies for HDFS and Hive. It purges the data once the retention period has elapsed, and internally it uses Oozie/MapReduce to do this.
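If the data has to be moved to another tier rather than purged, the Oozie-job route from the question might look roughly like the sketch below. It is only an illustration under stated assumptions: the tenant names, the `/data/<tenant>/<yyyy-mm-dd>` directory layout, and the helper functions are all hypothetical, and it assumes the cold tier is modelled with the built-in HDFS `COLD` storage policy. The code only decides which partitions are past a tenant's hot retention and builds the `hdfs storagepolicies` and `hdfs mover` commands; it does not run them.

```python
import calendar
from datetime import date

# Hypothetical per-tenant retention on hot storage, in months (values from the question).
HOT_RETENTION_MONTHS = {"tenant1": 6, "tenant2": 12}


def months_before(day, months):
    """Return the date `months` calendar months before `day`, clamping the day of month."""
    total = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def expired_partitions(tenant, partition_dates, today):
    """Return the partition dates that are older than the tenant's hot retention."""
    cutoff = months_before(today, HOT_RETENTION_MONTHS[tenant])
    return sorted(d for d in partition_dates if d < cutoff)


def tiering_commands(tenant, expired, root="/data"):
    """Build (but do not run) the HDFS commands that re-tier each expired partition.

    Assumes a hypothetical layout of <root>/<tenant>/<yyyy-mm-dd>.
    """
    commands = []
    for d in expired:
        path = f"{root}/{tenant}/{d.isoformat()}"
        # Mark the directory as COLD, then let the mover migrate its blocks.
        commands.append(["hdfs", "storagepolicies", "-setStoragePolicy",
                         "-path", path, "-policy", "COLD"])
        commands.append(["hdfs", "mover", "-p", path])
    return commands
```

A scheduled job (for example an Oozie coordinator running a shell action) could list each tenant's partition directories, call `expired_partitions`, and execute the resulting commands. The `COLD` policy only has an effect if the DataNodes have volumes tagged as `ARCHIVE` storage.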