We tagged Hive tables in Atlas. Every night we use Sqoop to drop some Hive tables and re-create them with updated data; the table metadata stays the same. Atlas deletes the metadata when the tables are dropped, but the problem is that the tags are lost too, even though the Atlas hook adds the same metadata again from the re-created tables. Is there any way to keep the tagged metadata even when we drop the tables?
Which version of Atlas are you using?
Atlas does not remove the deleted hive tables. It just marks them as deleted in the metadata. You can see the same in Atlas UI, all deleted entities are displayed in red.
If an entity with the same name already exists in Atlas and is marked as deleted, Atlas creates a new entity with the same name. The new entity has its own properties, so it does not pick up the tags attached to the deleted one.
Hi @Ayub Khan, thanks for your reply. I am using the latest version, so I can see the deleted entities marked in red as historical entities. But there are a lot of duplicate entities, because the tables get deleted and re-added in Atlas every night. My question is: is there any way to keep the metadata in Atlas even if we delete the tables from Hive? Or to recognize the deleted entity and not create a new one for the same table? In other words, how can we avoid the duplication?
The way to approach this isn't through Atlas, but rather through the way you ingest the data into Hive. Rather than dropping and re-creating the Hive table, you can use either TRUNCATE (followed by a fresh load) or INSERT OVERWRITE to replace only the data. This way, your metadata and tags stay intact and only your data is refreshed.
*Keep in mind that if you have ACID enabled, you will not be able to use INSERT OVERWRITE, and your only option is TRUNCATE.
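As a rough illustration of the suggestion above, here is a small sketch that builds the HiveQL for a nightly refresh that never drops the table. The table names sales.orders and sales.orders_staging are hypothetical, and it assumes Sqoop lands the new data in a separate staging table first. The helper only produces the statement text; running it (through beeline or whatever client you use) is left out.

```python
def overwrite_refresh_sql(target: str, staging: str) -> str:
    """Return HiveQL that replaces the rows of `target` with the rows of `staging`.

    The target table itself is never dropped, so its definition stays in
    the metastore. Assumes both tables have the same column layout and
    that `target` is not an ACID table.
    """
    return f"INSERT OVERWRITE TABLE {target} SELECT * FROM {staging}"


# Hypothetical table names, for illustration only.
statement = overwrite_refresh_sql("sales.orders", "sales.orders_staging")
print(statement)
```

If Sqoop imports straight into Hive, its --hive-overwrite option (used together with --hive-import) may be worth checking as another way to replace the data in an existing table without a drop.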
As of Hive 0.14, if a table has an OutputFormat that implements AcidOutputFormat and the system is configured to use a transaction manager that implements ACID, then INSERT OVERWRITE will be disabled for that table. This is to avoid users unintentionally overwriting transaction history. The same functionality can be achieved by using TRUNCATE TABLE (for non-partitioned tables) or DROP PARTITION followed by INSERT INTO.
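For ACID tables, the two alternatives described in that note could look roughly like the sketch below. Again the table names, the column list and the dt partition key are hypothetical, the new data is assumed to sit in a staging table, and the helper only returns the statements in the order they would be run.

```python
from typing import List, Optional


def acid_refresh_sql(target: str, staging: str, columns: str = "*",
                     partition: Optional[str] = None) -> List[str]:
    """Return HiveQL statements that refresh `target` without INSERT OVERWRITE.

    partition: a single-key partition spec such as "dt='2024-01-01'"
    (an assumption of this sketch), or None for a non-partitioned table.
    columns: select list; for a partitioned target it must leave out the
    partition column, because the value comes from the PARTITION clause.
    """
    if partition is None:
        # Non-partitioned table: empty it, then load the new rows.
        return [
            f"TRUNCATE TABLE {target}",
            f"INSERT INTO TABLE {target} SELECT {columns} FROM {staging}",
        ]
    # Partitioned table: drop only the affected partition, then reload it.
    return [
        f"ALTER TABLE {target} DROP IF EXISTS PARTITION ({partition})",
        f"INSERT INTO TABLE {target} PARTITION ({partition}) "
        f"SELECT {columns} FROM {staging} WHERE {partition}",
    ]


# Hypothetical names, for illustration only.
full_refresh = acid_refresh_sql("sales.orders", "sales.orders_staging")
one_partition = acid_refresh_sql(
    "sales.orders", "sales.orders_staging",
    columns="id, amount", partition="dt='2024-01-01'",
)
for stmt in full_refresh + one_partition:
    print(stmt)
```

In both cases the table is never dropped in Hive, so the Atlas hook has no table deletion to record and the existing entity should keep its tags.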