Atlas provides powerful Tagging capabilities which allow Data Analysts to identify all data sets containing specific types of data. The Atlas UI itself provides a powerful Tag based search capability which requires no REST API interaction. However, for those of you who need to integrate Tag based search with your data discovery and governance activities, this posting is for you. It contains instructions on how you can use the Atlas REST API to retrieve entity data based on a Tag name.
Before getting too deep into the Atlas Tag search examples, it is important to recognize that Atlas Tags are basically a form of Atlas type. If you invoke the REST API command “/api/atlas/types”, the summary output will include the current set of user defined Atlas Tags (CUSTOMER & SALES), interspersed among standard Atlas types such as ‘hive_table’, ‘jms_topic’, etc., as shown below:
In the rest of the article we will expand on the Atlas types API to explore how we can perform two different types of Tag based searches. Before going too far, it is important to note that the source code for the following examples is available through this repo.
Tag Search Example #1: Simple REST Based Tag Search
In our first Tag search example, our objective is to return a list of Atlas Data Entities which have the queried Tag name assigned. In this example, we are going to search our Atlas instance (‘server1’ on port 21000) for all Atlas entities with a tag named CUSTOMER. You will want to replace CUSTOMER with an existing tag on your system.
Our Atlas DSL query to find the CUSTOMER tag using the ‘curl’ command is shown below:
The example above returns a list of the guids of the entities which have the Atlas Tag ‘CUSTOMER’ assigned on the Atlas host ‘server1’ on port 21000. To run this query on your own cluster or on a sandbox, just substitute the Atlas Server Host URL, Atlas Server Port number, login information and your Tag name, and then invoke it as shown above with curl (or with SimpleAtlasTagSearch.py, the Python example in the Repo referenced at the end of this article).
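The same tag query can also be issued from Python. The following is only a minimal sketch and is not the SimpleAtlasTagSearch.py script from the repo. It assumes that the DSL search is exposed at the legacy path /api/atlas/discovery/search/dsl, that the server answers over plain http, and that it accepts HTTP basic authentication. The function names and the credentials in the usage comment are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_tag_search_url(host, port, tag):
    """Build the DSL search URL for a tag name.

    Assumption: the legacy DSL search endpoint path shown here is the one
    your Atlas version exposes; adjust it if your server differs.
    """
    query = urllib.parse.urlencode({"query": tag})
    return f"http://{host}:{port}/api/atlas/discovery/search/dsl?{query}"


def search_by_tag(host, port, tag, user, password):
    """Run the DSL query with HTTP basic auth and return the parsed JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        build_tag_search_url(host, port, tag),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage; replace host, port, credentials and tag with your own:
# reply = search_by_tag("server1", 21000, "CUSTOMER", "admin", "admin")
# print(json.dumps(reply, indent=2))
```

Keeping the URL construction in its own function makes it easy to check the query string before sending anything to the server.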
The output from this REST API query on my cluster is shown below:
The results from this query can be thought of as having 3 sections:
results header where you can find the results
Results (list of entity guids)
For our purposes we are really only interested in the list of entities, so all you need to do is focus on extracting the important information from the .results jsonpath object in the returned json object. Looking at the results section, we observe that only one entity has the CUSTOMER tag assigned. The entity located by the search has the guid ‘4138c963-b20d-4d10-b338-2c334202af43’ and, as we can see, is an active entity (not deleted). We can now use the entity search capabilities to retrieve the actual entity, as described in the next example within this article.
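Pulling the guids out of the results section can be sketched in a few lines of Python. The exact layout of each result entry depends on the Atlas version, so this sketch makes an assumption: each entry carries a "guid" key and an optional "state" key, either at its top level or inside one nested object (named "instanceInfo" in the sample data, which is hypothetical). The helper names are illustrative as well.

```python
def find_id_block(result):
    """Return the dict inside one result entry that holds a 'guid' key, or None."""
    if not isinstance(result, dict):
        return None
    if "guid" in result:
        return result
    # Assumption: the id block may be nested one level down.
    for value in result.values():
        if isinstance(value, dict) and "guid" in value:
            return value
    return None


def extract_guids(response, active_only=True):
    """Collect entity guids from the 'results' list of a search reply.

    With active_only=True, entries whose state is present and is not
    'ACTIVE' (for example deleted entities) are skipped.
    """
    guids = []
    for result in response.get("results", []):
        block = find_id_block(result)
        if block is None:
            continue
        if active_only and block.get("state", "ACTIVE") != "ACTIVE":
            continue
        guids.append(block["guid"])
    return guids


# Hypothetical sample shaped like the assumption described above:
sample = {
    "count": 1,
    "results": [
        {"instanceInfo": {"guid": "4138c963-b20d-4d10-b338-2c334202af43",
                          "state": "ACTIVE"}}
    ],
}
print(extract_guids(sample))
```

If your server returns a different layout, only find_id_block needs to change.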
Example #2: Returning details on all entities based on Tag assignment
The beauty of Example #1 is that we can build an entity list using a single REST API call. However, in the real world we will want access to details about the tagged entities. To accomplish this, we will need a programming interface such as Python, Java, Scala, bash, or whatever your favorite tool is, to pull the GUIDs and then perform entity searches.
For the purposes of this posting, we will use Python to illustrate how to perform more powerful Atlas Tag searches. The example below performs two Atlas REST API queries to build a json object containing the details, and not just the guids, of the entities with our Tag assigned.
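A minimal sketch of that two-step flow follows. It is not the code from the repo and rests on several assumptions: the legacy endpoint paths /api/atlas/discovery/search/dsl and /api/atlas/entities/{guid}, HTTP basic authentication, and a search reply whose result entries hold a "guid" key either directly or one level down. All function names, and the credentials in the usage comment, are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def make_fetcher(user, password):
    """Return a function that GETs a URL with basic auth and parses the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    def fetch_json(url):
        request = urllib.request.Request(
            url, headers={"Authorization": f"Basic {token}"}
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))

    return fetch_json


def guids_from_search(search_reply):
    """Yield guids from the 'results' list (the entry layout is an assumption)."""
    for result in search_reply.get("results", []):
        if not isinstance(result, dict):
            continue
        if "guid" in result:
            yield result["guid"]
            continue
        for value in result.values():
            if isinstance(value, dict) and "guid" in value:
                yield value["guid"]
                break


def entities_for_tag(base_url, tag, fetch_json):
    """Query 1: find guids by tag. Query 2: fetch each entity by guid.

    Returns a dict mapping guid -> the parsed entity reply.
    """
    query = urllib.parse.urlencode({"query": tag})
    search_reply = fetch_json(f"{base_url}/api/atlas/discovery/search/dsl?{query}")
    details = {}
    for guid in guids_from_search(search_reply):
        details[guid] = fetch_json(f"{base_url}/api/atlas/entities/{guid}")
    return details


# Hypothetical usage; replace the URL, credentials and tag with your own:
# fetch = make_fetcher("admin", "admin")
# tagged = entities_for_tag("http://server1:21000", "CUSTOMER", fetch)
# print(json.dumps(tagged, indent=2))
```

Passing the fetch function in as a parameter keeps the two-query logic separate from the HTTP details, so it can be exercised with canned replies before pointing it at a real cluster.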