Member since: 10-03-2016
Posts: 32
Kudos Received: 7
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3635 | 08-27-2018 08:38 AM |
| | 7610 | 05-04-2018 01:14 PM |
| | 2131 | 07-24-2017 12:01 PM |
| | 4818 | 07-09-2017 06:02 PM |
12-04-2018
12:57 PM
@Anpan K, Atlas requires three services to work properly:
1. Kafka
2. Solr
3. HBase
Kafka is required so that changes to the Hive metadata can be captured and a lineage can be created and shown in the Atlas UI. Solr is used to index the Atlas data so that it can be searched from the Atlas UI; it has three collections that make search work (full text index, edge index and vertex index). HBase is used to store the actual data coming into Atlas. In HDP 3 the graph storage backend is JanusGraph, while earlier versions used Titan.
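For reference, these dependencies show up in atlas-application.properties roughly as below. This is only a minimal sketch: the hostnames, ports and ZooKeeper paths are placeholders and the exact values depend on your cluster.
# graph data is stored in HBase (hostnames below are placeholders)
atlas.graph.storage.backend=hbase
atlas.graph.storage.hostname=zk1.example.com,zk2.example.com
# indexing/search is backed by Solr ('solr5' on some older HDP releases)
atlas.graph.index.search.backend=solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=zk1.example.com:2181/infra-solr
# notifications flow through the external Kafka brokers
atlas.notification.embedded=false
atlas.kafka.bootstrap.servers=kafka1.example.com:6667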
08-28-2018
09:33 AM
@Cibi Chakaravarthi, thanks for the update.
08-27-2018
10:13 AM
@Naveen Nain As per the description, it seems that you are trying to configure the NiFi-Atlas integration and have set the username to "Admin" in the NiFi configs, but that user does not seem to exist in Atlas. Could you please check whether you are able to log in to the Atlas UI using the username and password mentioned in the NiFi configs? For more info you may refer to https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_installing-hdf-and-hdp/content/nifi-atlas.html
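A quick way to check the credentials outside the UI is to call an Atlas REST endpoint with them. In the sketch below, atlas-host and admin:admin are placeholders for the host and credentials from your NiFi configs, and 21000 is the default non-SSL Atlas port:
curl -u admin:admin http://atlas-host:21000/api/atlas/admin/version
A JSON response with the Atlas version means the login works; an HTTP 401 means Atlas does not accept that user/password.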
08-27-2018
08:38 AM
1 Kudo
@Cibi Chakaravarthi, could you please set atlas.kafka.security.protocol to PLAINTEXTSASL in the Atlas configs and see if it helps.
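For reference, the property ends up in atlas-application.properties (via the Atlas configs in Ambari) as the single line below; PLAINTEXTSASL is the protocol used for a Kerberized Kafka:
atlas.kafka.security.protocol=PLAINTEXTSASL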
06-01-2018
01:38 PM
If the Hive metastore is running on PostgreSQL: to get tables.txt from PostgreSQL, run the command below:
psql -d hive -c "SELECT \"NAME\", \"TBL_NAME\" FROM \"DBS\" as a, \"TBLS\" as b where a.\"DB_ID\"=b.\"DB_ID\";" > /tmp/tables1.txt
Then, to make tables1.txt compatible with the python script, run:
awk '{print $1" " $3}' tables1.txt >> tables.txt
Now open tables.txt and delete the first line, which should be something like "----". Then press the Escape key and type the command below (in vi) to get the file ready to be used by the findmissingtablesinatlas.txt file. Please note that to get the ^I character you need to press the Tab key:
:%s/ /^I/g
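Alternatively, psql can produce the tab-separated file in one step, which skips the awk and vi editing above. This is just a sketch using the same query; -A gives unaligned output, -t drops the header and row-count footer, and -F sets the field separator to a literal tab:
psql -d hive -At -F $'\t' -c "SELECT \"NAME\", \"TBL_NAME\" FROM \"DBS\" as a, \"TBLS\" as b where a.\"DB_ID\"=b.\"DB_ID\";" > tables.txt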
05-04-2018
01:14 PM
2 Kudos
To back up Atlas you can back up its HBase tables; follow the steps below:
1. Create a folder in HDFS that is owned by the hbase user.
2. Run the commands below as the hbase user (with a TGT, if required) to export the HBase tables into the newly created HDFS folder:
# hbase org.apache.hadoop.hbase.mapreduce.Export "atlas_titan" "/<folder>/atlas_titan"
# hbase org.apache.hadoop.hbase.mapreduce.Export "ATLAS_ENTITY_AUDIT_EVENTS" "/<folder>/ATLAS_ENTITY_AUDIT_EVENTS"
The commands above back up the data from the HBase tables into HDFS. Please note that an HBase snapshot only creates a point-in-time reference to a table so that the original table can be restored to the snapshot point; the snapshot does not replicate the data, it just checkpoints it. With that being said, at the time of import/restore the tables should already exist with the correct schema, which can be done either by restarting Atlas or by using the manual HBase shell commands at the end of this post to create the tables, and then restoring them:
1. Run the commands below as the hbase user (with a TGT, if required) to import the HBase tables from the HDFS folder:
# hbase org.apache.hadoop.hbase.mapreduce.Import 'atlas_titan' '/<folder>/atlas_titan'
# hbase org.apache.hadoop.hbase.mapreduce.Import 'ATLAS_ENTITY_AUDIT_EVENTS' '/<folder>/ATLAS_ENTITY_AUDIT_EVENTS'
Restart Atlas once the import is done. Manual commands to create the HBase table schema for Atlas:
create 'atlas_titan',
  {NAME => 'e', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
  {NAME => 'g', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
  {NAME => 'i', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
  {NAME => 'l', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
  {NAME => 'm', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
  {NAME => 's', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
create 'ATLAS_ENTITY_AUDIT_EVENTS',
  {NAME => 'dt', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
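As a sanity check (not part of the original procedure), you can compare row counts before the export and after the import using the RowCounter MapReduce job that ships with HBase:
# hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'atlas_titan'
# hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'ATLAS_ENTITY_AUDIT_EVENTS'
If the counts match on both sides, the export/import round trip captured every row.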
03-16-2018
03:38 AM
@Laura Ngo - the API shared by @Madhan Neethiraj is the correct one to use. However, in addition to it, if you have more than 10000 results and want to page through them, you can use the offset option as in the query below:
curl -k -u admin:admin -H "Content-type:application/json" -X GET "https://url:port/api/atlas/v2/search/dsl?limit=10000&offset=20000&query=hive_column%20where%20__state%3D%27ACTIVE%27%20and%20qualifiedName%20like%20%27prod_%2A_data_lake%2A%27%20select%20qualifiedName%2Cname%2C__guid" | python -m json.tool > hive_column_prod_data_lake.json
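For readability, the URL-encoded DSL query above decodes to the following (the prod_*_data_lake* pattern is just the example from this thread):
hive_column where __state='ACTIVE' and qualifiedName like 'prod_*_data_lake*' select qualifiedName, name, __guid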
02-01-2018
12:11 PM
3 Kudos
Please note that the 'application.properties' file is present in the '/etc/atlas/conf/' folder, so you need to rename it to 'atlas-application.properties' with permissions of 744, owner atlas and group hadoop.
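In shell terms, the rename and ownership/permission changes described above look roughly like this (run as root or with sudo):
mv /etc/atlas/conf/application.properties /etc/atlas/conf/atlas-application.properties
chown atlas:hadoop /etc/atlas/conf/atlas-application.properties
chmod 744 /etc/atlas/conf/atlas-application.properties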
07-24-2017
12:01 PM
@Arsalan Siddiqi Please try the link below: https://github.com/hortonworks/data-tutorials/tree/archive-hdp-2.5/tutorials/hdp/hdp-2.5/cross-component-lineage-with-apache-atlas-across-apache-sqoop-hive-kafka-storm/assets