Member since
07-30-2019
93
Posts
96
Kudos Received
2
Solutions
10-04-2016
03:52 PM
@bhagan It looks like an HBase RegionServer is not running. A hanging search is usually an indication that the HBase RegionServer is down. Go to Ambari, restart the RegionServer, and see if that helps.
10-04-2016
03:50 PM
@Floris Smit In the sandbox it should work by default. Can you check the Hive View through Ambari and make sure Hive has data? You may also need to verify that your Ranger Tagsync service and your HBase RegionServer are up and running. In the Sandbox, Maintenance Mode is turned on, so you won't see a red warning when a service is down.
09-29-2016
03:35 PM
@Dennis Connolly Great idea. Here is a link to an article about the taxonomy features. https://community.hortonworks.com/articles/58932/understanding-taxonomy-in-apache-atlas.html
09-29-2016
01:59 AM
7 Kudos
One key feature of Apache Atlas is the ability to visually track data lineage in your Data Lake. This lets you very quickly understand the lifecycle of your data and answer questions about where the data originated and how it relates to other data in the Data Lake. To illustrate this, we will use our own Twitter data to perform sentiment analytics on our tweets in Hive, then see how this is reflected in Apache Atlas.

By now you should have a working Sandbox or HDP environment up and running with Atlas enabled. If not, take a look at the following tutorial to help get you started: Getting Started with Atlas in HDP 2.5.

First we need to gather the data sets we will use in this tutorial:

1. Log into Twitter.
2. Click on your account settings at the top right.
3. Select "Your Twitter Data" from the list on the left side of your screen.
4. Enter your password and, in the "Other Data" section at the bottom, select "Twitter Archive". This may take a little while, but you will soon get a link to download your archive.

While you wait on that data, let's quickly grab the sentiment library we will use in this tutorial. Here is the zip you will need to download: AFINN Data. From it we will need the AFINN-111.txt file.

Now that we have the data, go to the Hive View through Ambari and click the "Upload Table" link, then navigate to the tweets.csv file located in your Twitter archive. (If you need a Twitter dataset to use for this step, I have made mine public here: DataSample.) You will need to click the small gear icon next to the file type to specify that a header row exists. Upload the table, then repeat the steps for the AFINN-111.txt file. Name it sentiment_dictionary to match the rest of this tutorial, and make the column names "word" and "rating".

Now that we have the required data, let's perform some transformations in Hive. Back in Ambari, open a new query window in your Hive View.
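Before moving on, if you want to sanity-check the AFINN file you just uploaded, the dictionary is a plain list of word/rating pairs, one per line, separated by a tab. A minimal Python sketch of parsing it (the three inline sample lines are illustrative entries, not the full AFINN-111.txt):

```python
# Parse AFINN-style sentiment data: one "word<TAB>rating" pair per line.
# The sample below is illustrative, not the real file contents.
sample = "abandon\t-2\ngreat\t3\noutstanding\t5\n"

def parse_afinn(text):
    """Return a dict mapping each word to its integer sentiment rating."""
    ratings = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, rating = line.split("\t")
        ratings[word] = int(rating)
    return ratings

sentiment = parse_afinn(sample)
print(sentiment["great"])  # 3
```

The same two-column shape is what the sentiment_dictionary Hive table will hold after the upload.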
The sentiment analysis has been adapted from the article at https://acadgild.com/blog/sentiment-analysis-on-tweets-with-apache-hive-using-afinn-dictionary/ to fit this tutorial's dataset.

Create a table to store the words in our tweet text as an array:

CREATE TABLE words_array AS SELECT tweet_id AS id, split(text,' ') AS words FROM tweets;

Create a table that explodes the array into individual words:

CREATE TABLE tweet_word AS SELECT id AS id, word FROM words_array LATERAL VIEW explode(words) w AS word;

Now JOIN the sentiment_dictionary to the tweet_word table:

CREATE TABLE word_join AS SELECT tweet_word.id, tweet_word.word, sentiment_dictionary.rating FROM tweet_word LEFT OUTER JOIN sentiment_dictionary ON (tweet_word.word = sentiment_dictionary.word);

Great! Now we have each word rated for sentiment, on a scale from -5 to +5. What you decide to do with this data from here is a topic for a different article; however, now that we have created the word_join table, we can jump back to Atlas to inspect the lineage information associated with our new dataset. In the Atlas UI, search for word_join. Notice the connections to all parent tables and the recording of the actual SQL statements we executed during the transformations.
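To see what the three Hive statements compute, here is a small Python sketch of the same pipeline (split, explode, left outer join) over two made-up tweets; the tweet text and dictionary entries are illustrative, not from the real dataset:

```python
# Mimic the Hive pipeline: split each tweet into words, explode into
# (id, word) rows, then left-outer-join against a sentiment dictionary.
tweets = {1: "what a great day", 2: "terrible traffic again"}
sentiment_dictionary = {"great": 3, "terrible": -3}

# words_array: id -> list of words, like split(text, ' ')
words_array = {tid: text.split(" ") for tid, text in tweets.items()}

# tweet_word: one (id, word) row per word, like LATERAL VIEW explode(words)
tweet_word = [(tid, w) for tid, words in words_array.items() for w in words]

# word_join: LEFT OUTER JOIN keeps every word; unmatched words get None (NULL)
word_join = [(tid, w, sentiment_dictionary.get(w)) for tid, w in tweet_word]

# Example follow-up: total sentiment per tweet, treating NULL ratings as 0
totals = {}
for tid, _, rating in word_join:
    totals[tid] = totals.get(tid, 0) + (rating or 0)
print(totals)  # {1: 3, 2: -3}
```

The LEFT OUTER JOIN matters here: most words have no dictionary entry, and the join keeps them with a NULL rating rather than dropping them.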
09-29-2016
01:59 AM
9 Kudos
Why should you care about taxonomy? Taxonomy is the missing link in information management projects. By organizing all corporate data, the usability of knowledge increases considerably! In this tutorial we will explore how Apache Atlas enables you to apply a taxonomy to the data in your HDP data lake.

Log into Apache Atlas. If you don't see a screen that looks like this (with Taxonomy available), you need to enable taxonomy. To do so, go to Ambari and click on Atlas in the Services menu, then select Configs and Advanced. Scroll to the bottom and, in Custom application-properties, add a property atlas.feature.taxonomy.enable set to true. Now restart Atlas. You should now see Taxonomy in the Atlas UI.

In the Taxonomy tab, click the "..." ellipsis next to Catalog and select "Create Sub-Term". I will create a term for tagging Twitter-related assets used in the previous tutorials; you are free to create your own terms as you see fit, or follow along. Click the Catalog drop-down and select the new term you created, then click the "..." again and select "Create Sub-Term". I will make this one "Data".

Now that there are available terms, go to the Search tab and search for "tweets". This brings up all the tables used in the previous tutorial on tags for Hive. Click "Add Term" on the "tweets" table and select "Catalog.Twitter.Data" from the drop-down. Do this for all Twitter datasets/tables in the search results. The terms should now be displayed next to the assets.

Go back to the Taxonomy tab, drill down to Catalog.Twitter.Data, click the "..." and this time select "Search Assets". You will see ALL the assets with this term. If you click on a table name, you will see the terms displayed on its summary page.
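For reference, the custom property you add through Ambari ends up as a plain key/value entry in Atlas's application properties. A minimal fragment, assuming the property name shown in the Ambari Custom application-properties section:

```
atlas.feature.taxonomy.enable=true
```

Ambari writes this out for you on restart; you only need to touch the file directly on a non-Ambari install.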
09-21-2016
04:29 PM
5 Kudos
Once you have Atlas up and running (see this for getting started), you will want to create your first tag and use it to tag data, to begin exploring what Atlas can do. Let's start by creating a tag called "PII", which we will later use to tag personally identifiable data.

Log into Atlas and select the + icon on the homepage. Enter "PII" for the tag name and click Create. That's it! Now we have a tag to use.

Next, click on the Search tab, select "DSL", choose "hive_table" from the drop-down, hit Enter, and select the "customer" table. You should see the summary details for the customer table in Hive. Select the Schema tab. Here you see all the columns available in the customer table. Let's mark the "account_num" field as "PII": next to "account_num", click the + icon and select "PII" from the drop-down.

The column has now been tagged and is both searchable from Atlas and, via tag sync, available to Ranger for auditing and permissions. Click the "Tags" tab in Atlas and search for "PII" to see your field show up.
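The DSL search used above accepts SQL-like queries, so you are not limited to picking a type from the drop-down. A couple of illustrative queries of the kind you can type into the DSL search box (exact syntax may vary slightly between Atlas versions):

```
hive_table where name = "customer"
hive_column where name = "account_num"
```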
09-21-2016
02:00 PM
Are you on the 2.5 sandbox? If so, when you search in Atlas, are you in the "Search" tab? If you search for hive in the "Tags" tab, it will not find anything.
09-19-2016
11:14 PM
5 Kudos
If you haven't already done so, first download the Sandbox from the Hortonworks website and import it into VirtualBox/VMware. Once you start the virtual machine you are directed to the landing page for your new Sandbox: http://localhost:8888/

Here you have two paths to choose from. If you are completely new to HDP, I encourage you to browse the "New to HDP" section. Click on Advanced HDP to get to the links for the components we will be using in this tutorial. If you look at the Atlas link you see the note: "Off by default. Enable by logging in as raj_ops into Ambari". You will need to start in Ambari as user raj_ops, so go ahead and click the link for Ambari and log in. If you see a page that looks like this, you are doing great!

Atlas has dependencies on other Apache projects. We need to make sure HDFS, HBase, Kafka, and Ambari Infra are all started. Click on each service and then, under the Service Actions drop-down, select Start. After verifying all components are up and running, follow the same process for Atlas. You can now use the Quick Links to navigate to the Atlas home page, and log in using the credentials provided on the Quick Links page.

You can now search for artifacts and begin exploring Atlas. Try running a search for "hive" and selecting the "customer" table result. You get all the facts about that data source, like the schema and lineage seen below. Hope this was helpful. If there are any Atlas features you would like to see articles for, let me know and I will use that feedback to drive future post topics.
08-22-2016
04:47 PM
It looks like flowFile.get() is inside a method in a class, so all that script does is define the class and exit. Move the indentation so the flowFile.get() block is outside the class definition (no indentation at all). That might fix the issue.
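A minimal Python sketch of the bug (illustrative names, not the actual NiFi script): code indented inside a class method only runs if something calls that method, while code at module level runs as soon as the script executes.

```python
# Hypothetical illustration of the indentation bug: "Processor" and
# "handle" are made-up names, standing in for the user's script structure.
executed = []

class Processor:
    def handle(self):
        # Indented inside a method: never runs unless handle() is called,
        # which nothing in the script does.
        executed.append("inside class method")

# Merely defining the class above did not run the method body:
print(executed)  # []

# At module level (no indentation) the statement runs immediately,
# which is where a NiFi ExecuteScript body needs to live.
executed.append("module level")
print(executed)  # ['module level']
```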
08-17-2016
07:28 PM
That is strange. I see that you loop the failures back into the script processor. Can you delete that and route failure and success to the PutFile processor?