Support Questions
Find answers, ask questions, and share your expertise

How to take backup of Apache Atlas and restore it.


New Contributor

Is there any way to take a backup of Atlas and restore it?

9 REPLIES

Re: How to take backup of Apache Atlas and restore it.

Expert Contributor

@Karthik K,

Which HDP or Apache Atlas version are you using?

Re: How to take backup of Apache Atlas and restore it.

New Contributor

I am using Atlas version 0.8.

Re: How to take backup of Apache Atlas and restore it.

Expert Contributor

The changes are present in the 0.8-incubating branch. Hope I am looking at the right branch: https://github.com/apache/incubator-atlas/commits/0.8-incubating


Re: How to take backup of Apache Atlas and restore it.

@Karthik K Atlas's backend store is actually HBase, so taking an HBase table snapshot is equivalent to taking a backup of the Atlas metadata. Please note that I have not tried this yet. In theory it should work, so I would recommend trying it in a sandbox environment to see whether everything is restored after the HBase snapshot is imported.
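For what it's worth, a snapshot-based backup might look like the sketch below. The table names `atlas_titan` and `ATLAS_ENTITY_AUDIT_EVENTS` are the HDP defaults (check atlas.graph.storage.hbase.table in atlas-application.properties), and the snapshot names and backup destination are made-up placeholders. As noted above, this is untested:

```shell
# Sketch: take point-in-time snapshots of the Atlas HBase tables.
hbase shell <<'EOF'
snapshot 'atlas_titan', 'atlas_titan_snap'
snapshot 'ATLAS_ENTITY_AUDIT_EVENTS', 'atlas_audit_snap'
EOF

# Optionally copy a snapshot off-cluster for safekeeping
# (destination HDFS URL is a placeholder):
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot atlas_titan_snap \
  -copy-to hdfs://backup-cluster/hbase \
  -mappers 4
```

Snapshots are cheap to take because they only record references to the underlying store files; exporting them is what actually copies data.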

Re: How to take backup of Apache Atlas and restore it.

Contributor

In addition to the HBase tables, Atlas data is stored in three Solr collections: vertex_index, edge_index, and fulltext_index. These need to be backed up as well.
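A sketch of backing up those three collections, assuming a SolrCloud deployment recent enough to have the Collections API BACKUP action (Solr 6.1+); the Solr URL and the shared backup location are placeholders for your environment:

```shell
# Sketch: back up the three Atlas Solr collections via the Collections API.
# SOLR_URL and the backup location must be adjusted for your cluster;
# "location" must be a path visible to all Solr nodes.
SOLR_URL="http://localhost:8983/solr"
for c in vertex_index edge_index fulltext_index; do
  curl "${SOLR_URL}/admin/collections?action=BACKUP&name=${c}_backup&collection=${c}&location=/backups/solr"
done
```

On older Solr versions the per-core replication handler (`/replication?command=backup`) is the rough equivalent, but core names there are deployment-specific.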

Re: How to take backup of Apache Atlas and restore it.

New Contributor

@Madhan Neethiraj What is the best practice for building an Atlas DR site?

Re: How to take backup of Apache Atlas and restore it.

Expert Contributor

@Karthik K: Atlas now has export and import REST APIs. We are hoping to update the documentation by the end of this week.

The current implementation requires the user to be an admin in order to use these APIs. The implementation can be found in AdminResource.

Here are curl calls showing export and import of the DB generated by QuickStart_v1:

Export

curl -X POST -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" -d '{
  "itemsToExport": [
      {
          "typeName": "DB_v1",
          "uniqueAttributes": {
              "name": "Sales"
          }
    }
  ],
  "options": { "fetchType": "full"   }
}
' "http://localhost:21000/api/atlas/admin/export" > Sales_v1-Full.zip

Import

curl -X POST -H "Content-type: application/octet-stream" -u admin:admin -H "Cache-Control: no-cache" --data-binary @../docs/Sales_v1-Full.zip  "http://localhost:21000/api/atlas/admin/import"

Similar calls are possible with well-known types like hive_db, hdfs_path, etc.
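For example, exporting a Hive database entity might look like the following. The qualifiedName "default@cl1" (database name @ cluster name) is a hypothetical value; substitute one from your own instance:

```shell
# Sketch: export a hive_db entity instead of the QuickStart DB_v1.
# "default@cl1" is a placeholder qualifiedName (dbName@clusterName).
curl -X POST -u admin:admin -H "Content-Type: application/json" -d '{
  "itemsToExport": [
    {
      "typeName": "hive_db",
      "uniqueAttributes": { "qualifiedName": "default@cl1" }
    }
  ],
  "options": { "fetchType": "full" }
}' "http://localhost:21000/api/atlas/admin/export" > hive_default-Full.zip
```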

Re: How to take backup of Apache Atlas and restore it.

Contributor

The export/import feature helps copy Atlas data from one instance to another. However, it does not replace the need for backups.

Re: How to take backup of Apache Atlas and restore it.

Cloudera Employee

To back up Atlas you can back up its HBase tables. Follow the steps below:

1. Create a folder in HDFS owned by the hbase user.

2. Run the commands below as the hbase user (with a TGT, if Kerberos is enabled) to export the HBase tables into the newly created HDFS folder:

# hbase org.apache.hadoop.hbase.mapreduce.Export "atlas_titan" "/<folder>/atlas_titan" 
# hbase org.apache.hadoop.hbase.mapreduce.Export "ATLAS_ENTITY_AUDIT_EVENTS" "/<folder>/ATLAS_ENTITY_AUDIT_EVENTS" 

The commands above will back up the data from the HBase tables into HDFS.

Please note that a snapshot only creates a point-in-time view of an HBase table, so that the original table can be restored to that point. A snapshot does not copy the data; it only checkpoints it.
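If you do go the snapshot route, restoring might look like the sketch below. The snapshot name `atlas_titan_snap` is a placeholder for whatever name you gave the snapshot when you took it:

```shell
# Sketch: restore an HBase table from a previously taken snapshot.
# The table must be disabled before restore_snapshot and re-enabled after.
hbase shell <<'EOF'
disable 'atlas_titan'
restore_snapshot 'atlas_titan_snap'
enable 'atlas_titan'
EOF
```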

That being said, at import/restore time the tables must already exist with the correct schema. You can create them either by restarting Atlas (which recreates its tables) or by running manual commands from the HBase shell, and then restore the tables:

1. Run the commands below as the hbase user (with a TGT, if required) to import the HBase tables from the HDFS folder:

# hbase org.apache.hadoop.hbase.mapreduce.Import 'atlas_titan' '/<folder>/atlas_titan' 
# hbase org.apache.hadoop.hbase.mapreduce.Import 'ATLAS_ENTITY_AUDIT_EVENTS' '/<folder>/ATLAS_ENTITY_AUDIT_EVENTS' 

Restart Atlas once the import is done.
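If you manage Atlas with Ambari, the restart can be scripted via the Ambari REST API; a sketch follows, where the Ambari host and the cluster name "c1" are placeholders (restarting from the Ambari UI works just as well):

```shell
# Sketch: stop then start the ATLAS service through the Ambari API.
# Replace ambari-host and the cluster name "c1" with your own values.
AMBARI="http://ambari-host:8080/api/v1/clusters/c1"

# Stop Atlas (state INSTALLED == stopped in Ambari terms)
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Stop Atlas"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  "${AMBARI}/services/ATLAS"

# Start Atlas again
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Start Atlas"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  "${AMBARI}/services/ATLAS"
```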

Manual commands to create the HBase table schema for Atlas:

create 'atlas_titan' , {NAME => 'e', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} ,{NAME => 'g', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} , {NAME => 'i', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} , {NAME => 'l', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} , {NAME => 'm', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} , {NAME => 's', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'} 
create 'ATLAS_ENTITY_AUDIT_EVENTS' , {NAME => 'dt', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}