When an object gets created in Hive metastore, we expect to see the entry in Atlas almost instantaneously. Often times, might be because of disruption in availability of some of the core components on which Atlas is dependent on or the methodology that was used sometimes to create metastore objects(ex: Directly use metastore API to create Hive objects) where Atlas hook doesn't get invoked, we some times see missing tables in Atlas.
This article aims at identifying the missing table entries in Atlas. Once the missing table entries are identified, we can post them to Atlas. The easiest method of doing that will be explained in a later article(Will update the reference link in this article when I am done writing the second article).
Step#1: Identify the Hive tables from the metastore.
There are multiple ways of gathering this information. The easiest way is to write an SQL over "hive" database to get all the information. In my example, the metastore is based on MySQL. From command line on Hive metastore
mysql -u root -e "use hive;SELECT NAME, TBL_NAME FROM DBS as a, TBLS as b where a.DB_ID=b.DB_ID;" > tables.txt
sed -i '1d' tables.txt
(The above command will get rid of the heading)
Step#2: Download the attached findMissingTablesInAtlas script to your host into your home directory or any directory of your choice. Make sure the tables.txt file with the table names is present in the same directory as well.
Step#2.1: Make sure you have the comma separated user id and password in you ~/.cred file. The user id should have nesessary previliges to post to Atlas. if you are missing those, you can grant them using Ranger policy.
$ cat ~/.cred
<my user id>,<my password>
Step#3: Edit the script and make changes as per your environment. Change the atlasServer and clusterName values. The script will by default read the tablex.txt from the same location as in python script.
Step#4: Thats it! Run the script using the below command:
Step#5: After the script execution was successful, you should see the below two files in the same directory:
In the next posting, instructions for posting these tables to Atlas will be provided.