Why should you care about Taxonomy?

Taxonomy is the missing link in information management projects.

Organizing all corporate data considerably increases the usability of knowledge!

In this tutorial we will explore how Apache Atlas enables you to apply a Taxonomy to the data in your HDP data lake.

Log into Apache Atlas.

If you don't see a screen like this one (with Taxonomy available), you need to enable Taxonomy.

8120-screen-shot-2016-09-28-at-35025-pm.png

To enable Taxonomy, go to Ambari and click on Atlas in the Services menu.

Then select Configs and Advanced:

8121-screen-shot-2016-09-28-at-35232-pm.png

Scroll to the bottom and, under Custom application-properties, add the property atlas.feature.taxonomy.enable and set it to true.

8122-screen-shot-2016-09-28-at-35438-pm.png
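
If you want to double-check the setting outside of Ambari, the following is a minimal sketch that scans properties-style text for the flag. It assumes you can get at the rendered Atlas application properties as plain text; the function name taxonomy_enabled and the sample content are illustrative only, and the parser handles just simple key=value lines rather than the full Java properties syntax.

```python
def taxonomy_enabled(properties_text: str) -> bool:
    """Return True if atlas.feature.taxonomy.enable is set to true.

    Hypothetical helper: only simple `key=value` lines are understood.
    """
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "atlas.feature.taxonomy.enable":
            return value.strip().lower() == "true"
    # Property not present at all.
    return False


# Illustrative content, not copied from a real cluster.
sample = """
# Custom application-properties
atlas.feature.taxonomy.enable=true
"""
print(taxonomy_enabled(sample))  # True
```

If the function returns False for your configuration, the Taxonomy tab will most likely still be hidden after the restart.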

Now restart Atlas.

You should now see Taxonomy in the Atlas UI.

In the Taxonomy tab, click the "..." (ellipsis) next to Catalog and select "Create Sub-Term".

8123-screen-shot-2016-09-28-at-35705-pm.png

I will create a term named "Twitter" for tagging the Twitter-related assets used in the previous tutorials.

You are free to create your own terms as you see fit, or follow along.

Click the Catalog drop-down and select the new term you created.

Then click the "..." again and select "Create Sub-Term". I will name this one "Data".

8125-screen-shot-2016-09-28-at-40000-pm.png
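
Nested terms are referred to by a dotted name such as "Catalog.Twitter.Data": the taxonomy, then each term on the way down. As a rough sketch of that naming convention (the helper qualified_term is hypothetical and not part of Atlas), the name can be assembled like this:

```python
def qualified_term(*parts: str) -> str:
    """Join a taxonomy name and its nested terms into a dotted name.

    Hypothetical helper that only illustrates the naming pattern.
    """
    cleaned = [part.strip() for part in parts]
    # Reject empty segments and segments that already contain a dot,
    # since either would make the dotted name ambiguous.
    if not cleaned or any(not part or "." in part for part in cleaned):
        raise ValueError("each part must be a non-empty name without dots")
    return ".".join(cleaned)


print(qualified_term("Catalog", "Twitter", "Data"))  # Catalog.Twitter.Data
```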

Now that terms are available, go to the Search tab and search for "tweets".

This brings up all the tables used in the previous tutorial on tags for Hive.

Click "Add Term" on the table "tweets" and, from the drop-down, select "Catalog.Twitter.Data".

Do this for all Twitter data sets/tables in the search results.

Now the terms should be displayed next to the assets.

8127-screen-shot-2016-09-28-at-40658-pm.png

Go back to the Taxonomy tab, drill down to Catalog.Twitter.Data, click the "..." and this time select "Search Assets".

You will see all the assets with this term.

8128-screen-shot-2016-09-28-at-41100-pm.png
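
Conceptually, attaching a term and then searching by it behaves like a lookup from term name to assets. The toy model below is only meant to illustrate that idea; it is not how Atlas stores anything. The class TermIndex and the table name "tweets_clean" are made up for the example, and the search matches the exact term name only.

```python
from collections import defaultdict


class TermIndex:
    """Toy in-memory mapping from term names to asset names (illustrative)."""

    def __init__(self) -> None:
        self._assets_by_term = defaultdict(set)

    def add_term(self, asset: str, term: str) -> None:
        # Mirrors clicking "Add Term" on an asset in the search results.
        self._assets_by_term[term].add(asset)

    def search_assets(self, term: str) -> list:
        # Mirrors "Search Assets" on a term; exact-name match only.
        return sorted(self._assets_by_term.get(term, set()))

    def terms_for(self, asset: str) -> list:
        # Mirrors the terms shown on an asset's summary page.
        return sorted(
            term
            for term, assets in self._assets_by_term.items()
            if asset in assets
        )


index = TermIndex()
index.add_term("tweets", "Catalog.Twitter.Data")
index.add_term("tweets_clean", "Catalog.Twitter.Data")  # hypothetical table

print(index.search_assets("Catalog.Twitter.Data"))  # ['tweets', 'tweets_clean']
print(index.terms_for("tweets"))  # ['Catalog.Twitter.Data']
```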

If you click on a table name, you will see its terms displayed on the summary page.

8129-screen-shot-2016-09-28-at-41426-pm.png

Comments

Nice step-by-step; this will save people a bunch of time.

Keep in mind that the Taxonomy feature is still in Tech Preview (i.e., not recommended for production use) and will not be supported. Taxonomy will be production ready (GA) in HDP 3.0.

Last update: 08-17-2019 09:30 AM (updated by vnv)