Why should you care about Taxonomy?
Taxonomy is the missing link in information management projects.
Organizing corporate data under a taxonomy makes that data considerably easier to find and use.
In this tutorial we will explore how Apache Atlas enables you to apply a Taxonomy to the data in your HDP data lake.
Log into Apache Atlas.
If you don't see a screen that looks like this (with Taxonomy available), you need to enable Taxonomy.
To enable Taxonomy, go to Ambari and click on Atlas in the Services menu.
Then select Configs and Advanced:
Scroll to the bottom and, under Custom application-properties, add the property atlas.feature.taxonomy.enable and set it to true.
Now restart Atlas.
You should now see Taxonomy in the Atlas UI.
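If Taxonomy still does not show up after the restart, it can help to confirm that the property actually reached the Atlas configuration. The sketch below is one way to check, not an official tool: `taxonomy_enabled` is a hypothetical helper name, and the file location mentioned in the comment is an assumption about a typical HDP install that may differ on your cluster.

```python
PROPERTY = "atlas.feature.taxonomy.enable"


def taxonomy_enabled(properties_text: str) -> bool:
    """Return True if the taxonomy flag is set to true in the given properties text."""
    enabled = False
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == PROPERTY:
            # If the key appears more than once, the last occurrence wins.
            enabled = value.strip().lower() == "true"
    return enabled


if __name__ == "__main__":
    # Assumption: on your cluster you would read the real file instead, for
    # example /etc/atlas/conf/atlas-application.properties (path not confirmed
    # by this tutorial).
    sample = "atlas.feature.taxonomy.enable=true\n"
    print(taxonomy_enabled(sample))
```

The helper takes the file contents as a string so it can be tried on a pasted snippet before pointing it at a real file.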
In the Taxonomy tab, click on the "..." (ellipsis) next to Catalog and select "Create Sub-Term".
I will create a term named "Twitter" for tagging the Twitter-related assets used in the previous tutorials.
You are free to follow along or to create your own terms as you see fit.
Click the Catalog drop-down and select the new term you created.
Then click the "..." again and select "Create Sub-Term" once more. I will name this one "Data".
Now that terms are available, go to the Search tab and search for "tweets".
This brings up all the tables used in the previous tutorial on tags for Hive.
Click "Add Term" on the table "tweets" and select "Catalog.Twitter.Data" from the drop-down.
Do this for all the Twitter data sets/tables in the search results.
Now the terms should be displayed next to the assets.
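The term name picked from the drop-down is simply the path through the hierarchy joined with dots: Catalog, then Twitter, then Data. As a small illustration of that naming scheme, here is a sketch with two hypothetical helpers, `qualified_term` and `ancestors`. They are not part of Atlas, and rejecting dots inside a single term name is an assumption made to keep the example unambiguous.

```python
from typing import List


def qualified_term(*parts: str) -> str:
    """Join hierarchy levels into a dotted term name, e.g. Catalog.Twitter.Data."""
    cleaned = [p.strip() for p in parts]
    # Assumption for this sketch: every level needs a non-empty name without dots.
    if not cleaned or any(not p or "." in p for p in cleaned):
        raise ValueError("each level needs a non-empty name without dots")
    return ".".join(cleaned)


def ancestors(term: str) -> List[str]:
    """List every level of a dotted term name, from the root down to the term itself."""
    pieces = term.split(".")
    return [".".join(pieces[:i]) for i in range(1, len(pieces) + 1)]


if __name__ == "__main__":
    term = qualified_term("Catalog", "Twitter", "Data")
    print(term)             # Catalog.Twitter.Data
    print(ancestors(term))  # ['Catalog', 'Catalog.Twitter', 'Catalog.Twitter.Data']
```

Reading the name from left to right mirrors the drill-down in the Taxonomy tab: each prefix is a level you click through to reach the term.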
Go back to the Taxonomy tab, drill down to Catalog.Twitter.Data, click the "..." and this time select "Search Assets".
You will see all the assets tagged with this term.
If you click on a table name, you will see its terms displayed on the summary page.
Nice step by step, will save a bunch of time for people
Keep in mind that the Taxonomy feature is still in Tech Preview (i.e., not recommended for production use) and will not be supported. Taxonomy will be production ready (GA) in HDP 3.0.