Created 03-27-2017 06:25 AM
Is there any way to take a backup of Atlas and restore it?
Created 05-04-2018 01:14 PM
To back up Atlas, you can back up its HBase tables. Follow the steps below:
1. Create a folder in HDFS owned by the hbase user.
2. Run the commands below as the hbase user (with a Kerberos TGT, if required) to export the HBase tables into the newly created HDFS folder.
# hbase org.apache.hadoop.hbase.mapreduce.Export "atlas_titan" "/<folder>/atlas_titan"
# hbase org.apache.hadoop.hbase.mapreduce.Export "ATLAS_ENTITY_AUDIT_EVENTS" "/<folder>/ATLAS_ENTITY_AUDIT_EVENTS"
The commands above back up the data from the HBase tables into HDFS.
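For reference, here is a minimal end-to-end sketch of the backup sequence; the /backups/atlas path and the keytab location are placeholders, so adjust them for your cluster (kinit is only needed on a Kerberized cluster):
# kinit -kt /etc/security/keytabs/hbase.headless.keytab hbase
# hdfs dfs -mkdir -p /backups/atlas
# hdfs dfs -chown hbase:hbase /backups/atlas
# hbase org.apache.hadoop.hbase.mapreduce.Export "atlas_titan" "/backups/atlas/atlas_titan"
# hbase org.apache.hadoop.hbase.mapreduce.Export "ATLAS_ENTITY_AUDIT_EVENTS" "/backups/atlas/ATLAS_ENTITY_AUDIT_EVENTS"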
Please note that an HBase snapshot only records the state of a table so that the original table can be restored to that point in time; a snapshot does not replicate the data, it just checkpoints it.
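For completeness, the snapshot alternative from the HBase shell looks like this; the snapshot names here are arbitrary examples, and restore_snapshot requires the table to be disabled first:
snapshot 'atlas_titan', 'atlas_titan_snap'
snapshot 'ATLAS_ENTITY_AUDIT_EVENTS', 'atlas_audit_snap'
disable 'atlas_titan'
restore_snapshot 'atlas_titan_snap'
enable 'atlas_titan'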
That said, at import/restore time the tables must already exist with the correct schema. You can create them either by restarting Atlas or by using the manual HBase shell commands below, and then restore the table data:
1. Run the commands below as the hbase user (with a TGT, if required) to import the tables from the HDFS folder back into HBase:
# hbase org.apache.hadoop.hbase.mapreduce.Import 'atlas_titan' '/<folder>/atlas_titan'
# hbase org.apache.hadoop.hbase.mapreduce.Import 'ATLAS_ENTITY_AUDIT_EVENTS' '/<folder>/ATLAS_ENTITY_AUDIT_EVENTS'
Restart Atlas once the import is done.
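If you prefer the command line over Ambari, the restart can be done with the Atlas server scripts; the path below assumes an HDP layout:
# /usr/hdp/current/atlas-server/bin/atlas_stop.py
# /usr/hdp/current/atlas-server/bin/atlas_start.py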
Manual commands to create the HBase table schema for Atlas:
create 'atlas_titan',
{NAME => 'e', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
{NAME => 'g', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
{NAME => 'i', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
{NAME => 'l', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
{NAME => 'm', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
{NAME => 's', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
create 'ATLAS_ENTITY_AUDIT_EVENTS',
{NAME => 'dt', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
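After the import, a quick sanity check is to compare row counts between the source and the restored tables; the RowCounter MapReduce job ships with HBase:
# hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'atlas_titan'
# hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'ATLAS_ENTITY_AUDIT_EVENTS'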
Created 03-27-2017 07:48 AM
Which HDP or Apache Atlas version are you using?
Created 03-27-2017 07:49 AM
I am using Atlas version 0.8.
Created 03-27-2017 06:01 PM
The changes are present in 0.8-incubating. Hope I am looking at the right branch: https://github.com/apache/incubator-atlas/commits/0.8-incubating
Created 03-27-2017 12:03 PM
@Karthik K Atlas's backend store is actually HBase, so taking an HBase table snapshot is equivalent to taking a backup of Atlas metadata. Please note that I have not tried this myself. In theory it should work, so I would recommend trying it in a sandbox environment first to verify that everything is restored after the HBase snapshot is imported.
Created 03-31-2017 07:18 AM
In addition to the HBase tables, Atlas data is stored in three Solr collections: vertex_index, edge_index, and fulltext_index. These need to be backed up as well.
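As a sketch of what that could look like: with SolrCloud (e.g. Ambari Infra Solr) on Solr 6.1 or later, the Collections API BACKUP action can be used. The host, port, and backup location below are assumptions, and the location must be a shared path visible to all Solr nodes:
curl "http://solrhost:8886/solr/admin/collections?action=BACKUP&name=vertex_index_bkp&collection=vertex_index&location=/backups/solr"
curl "http://solrhost:8886/solr/admin/collections?action=BACKUP&name=edge_index_bkp&collection=edge_index&location=/backups/solr"
curl "http://solrhost:8886/solr/admin/collections?action=BACKUP&name=fulltext_index_bkp&collection=fulltext_index&location=/backups/solr"
The matching RESTORE action restores a collection from such a backup.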
Created 11-23-2018 10:36 PM
@Madhan Neethiraj What's the best practice for building an Atlas DR site?
Created 03-27-2017 04:59 PM
@Karthik K: Atlas now has export and import REST APIs. We are hoping to update documentation by end of this week.
The current implementation requires the user to be an admin to use these APIs. The implementation can be found in AdminResource.
Here are curl calls showing export and import of the DB generated from QuickStart_v1:
Export
curl -X POST -u admin:admin -H "Cache-Control: no-cache" \
  -d '{ "itemsToExport": [ { "typeName": "DB_v1", "uniqueAttributes": { "name": "Sales" } } ], "options": { "fetchType": "full" } }' \
  "http://localhost:21000/api/atlas/admin/export" > Sales_v1-Full.zip
Import
curl -X POST -H "Content-type: application/octet-stream" -u admin:admin -H "Cache-Control: no-cache" --data-binary @../docs/Sales_v1-Full.zip "http://localhost:21000/api/atlas/admin/import"
Similar calls are possible with well-known types like hive_db, hdfs_path, etc.
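For example, here is a hedged sketch of the same export call for a hive_db; Atlas's qualifiedName convention for hive_db is <dbName>@<clusterName>, and "default@cl1" below is just an assumed value:
curl -X POST -u admin:admin -H "Content-Type: application/json" \
  -d '{ "itemsToExport": [ { "typeName": "hive_db", "uniqueAttributes": { "qualifiedName": "default@cl1" } } ], "options": { "fetchType": "full" } }' \
  "http://localhost:21000/api/atlas/admin/export" > hive_db-default-full.zip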
Created 03-31-2017 07:31 AM
The export/import feature helps copy Atlas data from one instance to another; however, it does not replace the need for a backup.