Member since: 10-03-2016
Posts: 32
Kudos Received: 7
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3635 | 08-27-2018 08:38 AM |
| | 7610 | 05-04-2018 01:14 PM |
| | 2131 | 07-24-2017 12:01 PM |
| | 4818 | 07-09-2017 06:02 PM |
12-04-2018
12:57 PM
@Anpan K, Atlas requires three services to work properly:
1. Kafka
2. Solr
3. HBase
Kafka is required so that changes to the Hive metadata can be captured and a lineage can be created and shown in the Atlas UI. Solr is used to index the Atlas data so that it can be searched from the Atlas UI; it has three collections that make search work (full text index, edge index and vertex index). HBase is used to store the actual data coming into Atlas. In HDP 3 the graph storage backend is JanusGraph, while earlier versions used Titan.
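For reference, these dependencies show up in atlas-application.properties roughly as below. This is only a minimal sketch: the hostnames, ports and ZooKeeper paths are placeholders and the exact values depend on your cluster.
# graph data is stored in HBase (hostnames below are placeholders)
atlas.graph.storage.backend=hbase
atlas.graph.storage.hostname=zk1.example.com,zk2.example.com
# indexing/search is backed by Solr ('solr5' on some older HDP releases)
atlas.graph.index.search.backend=solr
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=zk1.example.com:2181/infra-solr
# notifications flow through the external Kafka brokers
atlas.notification.embedded=false
atlas.kafka.bootstrap.servers=kafka1.example.com:6667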
08-28-2018
09:33 AM
@Cibi Chakaravarthi, thanks for the update.
08-27-2018
10:13 AM
@Naveen Nain As per the description, it seems that you are trying to configure the NiFi-Atlas integration and have set the username to "Admin" in the NiFi configs, but that user does not seem to exist in Atlas. Could you please check whether you are able to log in to the Atlas UI using the username and password mentioned in the NiFi configs? For more info you may refer to https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.2/bk_installing-hdf-and-hdp/content/nifi-atlas.html
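A quick way to check the credentials outside the UI is to call an Atlas REST endpoint with them. In the sketch below, atlas-host and admin:admin are placeholders for the host and credentials from your NiFi configs, and 21000 is the default non-SSL Atlas port:
curl -u admin:admin http://atlas-host:21000/api/atlas/admin/version
A JSON response with the Atlas version means the login works; an HTTP 401 means Atlas does not accept that user/password.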
08-27-2018
08:38 AM
1 Kudo
@Cibi Chakaravarthi, could you please set atlas.kafka.security.protocol to PLAINTEXTSASL in the Atlas configs and see if it helps.
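For reference, the property ends up in atlas-application.properties (via the Atlas configs in Ambari) as the single line below; PLAINTEXTSASL is the protocol used for a Kerberized Kafka:
atlas.kafka.security.protocol=PLAINTEXTSASL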
06-01-2018
01:38 PM
If the Hive metastore is running on PostgreSQL: to get tables.txt from PostgreSQL, run the command below:
psql -d hive -c "SELECT \"NAME\", \"TBL_NAME\" FROM \"DBS\" as a, \"TBLS\" as b where a.\"DB_ID\"=b.\"DB_ID\";" > /tmp/tables1.txt
Then, to make tables1.txt compatible with the python script, run:
awk '{print $1" " $3}' tables1.txt >> tables.txt
Now open tables.txt and delete the first line, which should be something like "----". Then press the Escape key and type the command below (in vi) to get the file ready to be used by the findmissingtablesinatlas.txt file. Please note that to get the ^I character you need to press the Tab key:
:%s/ /^I/g
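Alternatively, psql can produce the tab-separated file in one step, which skips the awk and vi editing above. This is just a sketch using the same query; -A gives unaligned output, -t drops the header and row-count footer, and -F sets the field separator to a literal tab:
psql -d hive -At -F $'\t' -c "SELECT \"NAME\", \"TBL_NAME\" FROM \"DBS\" as a, \"TBLS\" as b where a.\"DB_ID\"=b.\"DB_ID\";" > tables.txt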
05-04-2018
01:14 PM
2 Kudos
To back up Atlas you can back up its HBase tables; follow the steps below:
1. Create a folder in HDFS that is owned by the hbase user.
2. Run the commands below as the hbase user (with a TGT, if required) to export the HBase tables into the newly created HDFS folder:
# hbase org.apache.hadoop.hbase.mapreduce.Export "atlas_titan" "/<folder>/atlas_titan"
# hbase org.apache.hadoop.hbase.mapreduce.Export "ATLAS_ENTITY_AUDIT_EVENTS" "/<folder>/ATLAS_ENTITY_AUDIT_EVENTS"
The commands above back up the data from the HBase tables into HDFS. Please note that an HBase snapshot only creates a point-in-time reference to a table so that the original table can be restored to the snapshot point; the snapshot does not replicate the data, it just checkpoints it. With that being said, at the time of import/restore the tables should already exist with the correct schema, which can be done either by restarting Atlas or by using the manual HBase shell commands at the end of this post to create the tables, and then restoring them:
1. Run the commands below as the hbase user (with a TGT, if required) to import the HBase tables from the HDFS folder:
# hbase org.apache.hadoop.hbase.mapreduce.Import 'atlas_titan' '/<folder>/atlas_titan'
# hbase org.apache.hadoop.hbase.mapreduce.Import 'ATLAS_ENTITY_AUDIT_EVENTS' '/<folder>/ATLAS_ENTITY_AUDIT_EVENTS'
Restart Atlas once the import is done. Manual commands to create the HBase table schema for Atlas:
create 'atlas_titan',
  {NAME => 'e', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
  {NAME => 'g', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
  {NAME => 'i', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
  {NAME => 'l', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
  {NAME => 'm', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'},
  {NAME => 's', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
create 'ATLAS_ENTITY_AUDIT_EVENTS',
  {NAME => 'dt', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => '2592000', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
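As a sanity check (not part of the original procedure), you can compare row counts before the export and after the import using the RowCounter MapReduce job that ships with HBase:
# hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'atlas_titan'
# hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'ATLAS_ENTITY_AUDIT_EVENTS'
If the counts match on both sides, the export/import round trip captured every row.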
03-16-2018
03:38 AM
@Laura Ngo - the API shared by @Madhan Neethiraj is the correct one to use. However, in addition to it, if you have more than 10000 results and want to page through them, you can use the offset option as in the query below:
curl -k -u admin:admin -H "Content-type:application/json" -X GET "https://url:port/api/atlas/v2/search/dsl?limit=10000&offset=20000&query=hive_column%20where%20__state%3D%27ACTIVE%27%20and%20qualifiedName%20like%20%27prod_%2A_data_lake%2A%27%20select%20qualifiedName%2Cname%2C__guid" | python -m json.tool > hive_column_prod_data_lake.json
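For readability, the URL-encoded DSL query above decodes to the following (the prod_*_data_lake* pattern is just the example from this thread):
hive_column where __state='ACTIVE' and qualifiedName like 'prod_*_data_lake*' select qualifiedName, name, __guid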
02-01-2018
12:11 PM
3 Kudos
Please note that the 'application.properties' file is present in the '/etc/atlas/conf/' folder, so you need to rename it to 'atlas-application.properties' with permissions of 744, owner atlas and group hadoop.
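In shell terms, the rename and ownership/permission changes described above look roughly like this (run as root or with sudo):
mv /etc/atlas/conf/application.properties /etc/atlas/conf/atlas-application.properties
chown atlas:hadoop /etc/atlas/conf/atlas-application.properties
chmod 744 /etc/atlas/conf/atlas-application.properties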
07-24-2017
12:01 PM
@Arsalan Siddiqi Please try the link below: https://github.com/hortonworks/data-tutorials/tree/archive-hdp-2.5/tutorials/hdp/hdp-2.5/cross-component-lineage-with-apache-atlas-across-apache-sqoop-hive-kafka-storm/assets