Member since: 07-30-2019 · Posts: 93 · Kudos Received: 96 · Solutions: 2
05-17-2017 04:32 PM · 2 Kudos
First you need to have RapidMiner downloaded and installed on your machine: https://my.rapidminer.com/nexus/account/index.html#downloads

Once installed, open RapidMiner and look at the list of operators. There is a link at the bottom left, "Get More Operators". Click the link, search for "Radoop", select both packages, and click Install. After RapidMiner restarts you will see the new operators we downloaded in the Extensions folder.

Now we need to configure the connection. In the toolbar select "Connections", then "Manage Radoop Connections", then "+ New Connection". If you have your Hadoop config files available you can use those to set the properties; otherwise select "Manual". Select the Hadoop version you have (in my case "Hortonworks 2.x") and supply the master URL. If you have multiple masters, select the check box and provide the details. Click "OK", then click ">> Quick Test". If the test succeeds you are all set to read from Hive.

Drag a "Radoop Nest" operator onto the canvas, select it, and on the right-hand side of the IDE choose the connection we created earlier. Now double-click the Radoop Nest operator to enter the nested canvas. Drag a "Retrieve from Hive" operator onto the canvas (located under Radoop --> Data Access), click the operator, and select the table you wish to retrieve. Connect the out port of the operator to the out port on the edge of the canvas by dragging from one to the other. Now click the Play button and wait for it to complete. Click the out port and select "Show sample data".

Hope this was helpful! More to come on RapidMiner + Hortonworks...
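If you want to double-check outside RapidMiner that the same Hive table is reachable, here is a minimal sketch using PyHive. PyHive is not part of this tutorial, and the host, user, and table name below are placeholders for your environment:

# Sketch: confirm the table you plan to use in "Retrieve from Hive" is readable.
# Connection details and table name are placeholders; adjust for your cluster.
from pyhive import hive

conn = hive.Connection(host='sandbox.hortonworks.com', port=10000, username='hive')
cursor = conn.cursor()
cursor.execute('SELECT * FROM my_table LIMIT 10')
for row in cursor.fetchall():
    print(row)
cursor.close()
conn.close()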
02-16-2017 07:26 PM · 3 Kudos
To begin, log into Ambari and from the Views section select Workflow Manager. Select "Create New Workflow", give your workflow a name, and then click on the line that connects the start and end nodes to add a step. Select the Email step to add it to your flow, then select it and click the settings (gear) icon. Fill in all required fields with your custom settings and click Save. Now we have a workflow capable of sending an email. That was easy, and no XML needed to be modified (a big reason many people have never used Oozie). Click Submit and provide a path (one that doesn't already exist) for the workflow to be saved. Now go to the Dashboard and find your submitted workflow. You can click Run from the Dashboard to run the flow, or you can select "Run on Submit" in the step before saving and submitting the flow.
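For reference, this is roughly the kind of workflow definition that Workflow Manager generates behind the scenes, and that you would otherwise have had to write by hand. The recipient, subject, and body are placeholders, and the exact XML your workflow produces may differ:

<workflow-app name="email-demo" xmlns="uri:oozie:workflow:0.5">
    <start to="send-email"/>
    <action name="send-email">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>someone@example.com</to>
            <subject>Workflow notification</subject>
            <body>Hello from the Workflow Manager view.</body>
        </email>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Email action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>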
09-29-2016 01:59 AM · 7 Kudos
One key feature in Apache Atlas is the ability to track data lineage in your Data Lake visually. This allows you to very quickly understand the lifecycle of your data and answer questions about where the data originated and how it relates to other data in the Data Lake. To illustrate this we will use our own Twitter data to perform sentiment analytics on our tweets in Hive and see how this is reflected in Apache Atlas.

By now you should have a working sandbox or HDP environment up and running with Atlas enabled. If not, please take a look at the following tutorial to help get you started: Getting Started with Atlas in HDP 2.5

First we need to gather the data sets we will use in this tutorial. Log into Twitter, click on your account settings at the top right, and select "Your Twitter Data" from the list on the left side of your screen. Now enter your password, and in the "Other Data" section at the bottom select "Twitter Archive". This might take a little time, but you will get a link to download your archive soon.

While you wait on that data, let's quickly grab the sentiment library we will use in this tutorial. Here is the zip you will need to download: AFINN Data. In it we will need the AFINN-111.txt file.

Now that we have the data, go to the Hive View through Ambari and click the "Upload Table" link. Navigate to the tweets.csv file located in your Twitter archive. ***If you need a Twitter dataset to use for this step, I have made mine public here: DataSample. You will need to click the small gear icon next to the file type to specify that the header row exists. Now upload the table and repeat the steps for the AFINN-111.txt file. Name it sentiment_dictionary to work with the rest of this tutorial, and make the column names "word" and "rating".

Now that we have the required data, let's perform some transformations in Hive. Back in Ambari, open a new query window in your Hive View. The sentiment analysis has been adapted from the article here: https://acadgild.com/blog/sentiment-analysis-on-tweets-with-apache-hive-using-afinn-dictionary/ to fit this tutorial's dataset.

Create a table to store the words in our tweet text as an array:

CREATE TABLE words_array AS SELECT tweet_id AS id, split(text,' ') AS words FROM tweets;

Create a table that explodes the array into individual words:

CREATE TABLE tweet_word AS SELECT id AS id, word FROM words_array LATERAL VIEW explode(words) w AS word;

Now JOIN the sentiment_dictionary to the tweet_word table:

CREATE TABLE word_join AS SELECT tweet_word.id, tweet_word.word, sentiment_dictionary.rating FROM tweet_word LEFT OUTER JOIN sentiment_dictionary ON (tweet_word.word = sentiment_dictionary.word);

Great! Now we have each word rated for sentiment, on a scale from -5 to +5. What you decide to do with this data from here is a topic for a different article (though there is a small sketch at the end of this post); however, now that we have created the word_join table we can jump back to Atlas to inspect the lineage information associated with our new dataset.

In the Atlas UI, search for word_join. Notice the connections to all parent tables and the recording of the actual SQL statements we executed during the transformations.
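As a footnote to the "what to do with this data" question above, here is one quick direction: score each tweet by summing its word ratings. The SELECT can be run directly in the Hive View; the Python wrapper below is just a sketch, assumes PyHive is installed, and uses placeholder connection details:

# Sketch: per-tweet sentiment score from the word_join table built above.
# Host and username are placeholders; the CAST guards against rating being a string column.
from pyhive import hive

conn = hive.Connection(host='sandbox.hortonworks.com', port=10000, username='hive')
cursor = conn.cursor()
cursor.execute("""
    SELECT id, SUM(COALESCE(CAST(rating AS INT), 0)) AS sentiment_score
    FROM word_join
    GROUP BY id
    ORDER BY sentiment_score DESC
    LIMIT 10
""")
for tweet_id, score in cursor.fetchall():
    print(tweet_id, score)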
04-24-2018 09:04 PM
Keep in mind, the Taxonomy feature is still in Tech Preview (i.e., not recommended for production use) and is not supported. Taxonomy will be production ready (GA) in HDP 3.0.
09-21-2016 04:29 PM · 5 Kudos
Once you have Atlas up and running (see this for getting started), you will want to create your first tag and use it to tag data to begin exploring what Atlas can do. Let's start by creating a tag called "PII", which we will later use to tag Personally Identifiable data.

Log into Atlas and select the + icon on the homepage. Enter "PII" for the tag name and click Create. That's it! Now we have a tag to use.

Now click on the Search tab, select "DSL", choose "hive_table" from the drop-down, hit Enter, and select the "customer" table. You should see the summary details for the customer table in Hive. Select the Schema tab. Here you see all the columns available in the customer data table. Let's mark the "account_num" field as "PII": next to "account_num" click the + icon and select "PII" from the drop-down. Now the column has been tagged; it is searchable from Atlas as well as configured to be administered through Hive for auditing and permissions. Click the "Tags" tab in Atlas and search for "PII" to see your field show up.
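If you prefer to script the same lookup instead of using the UI, the sketch below runs the DSL search over Atlas's REST API. The discovery endpoint path and the admin credentials are assumptions based on the Atlas version shipped with HDP 2.5, so verify them against your own install:

# Sketch: DSL search for the customer table via the Atlas REST API.
# Endpoint path and credentials are assumptions -- check your Atlas version.
import requests

ATLAS_URL = 'http://sandbox.hortonworks.com:21000'
resp = requests.get(
    ATLAS_URL + '/api/atlas/discovery/search/dsl',
    params={'query': 'hive_table where name = "customer"'},
    auth=('admin', 'admin'),
)
resp.raise_for_status()
print(resp.json())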
10-11-2016 02:06 PM
@Vasilis Vagias When I log into Ambari as holger_gov, I have access to Hive data through the Hive View. I also have the Ranger Tagsync service and the HBase Region Server running. Still it does not work...
01-15-2019 05:22 AM
Hi Vasilis, does the method you have outlined only work when R is installed on an edge node of the HDP cluster (i.e. R and HDFS are colocated)? I'm exploring how R (say, installed on a workstation) can connect to HDFS running on a separate/remote server, in which case I'm unsure how to define the connection details to Hadoop. Are you able to assist?
05-09-2016 05:11 PM · 2 Kudos
A question encountered by most organizations when taking on new Big Data initiatives, or adapting current operations to keep up with the pace of innovation, is what approach to take in architecting their streaming workflows and applications.

Because Twitter data is free and easily accessible, I chose to examine Twitter events in this article; however, Twitter events are simply JSON messages and can be thought of as machine-generated JSON packets in a more general streaming architecture.

Let's move forward with an evaluation of NiFi vs. Python for ingesting streaming events from Twitter. When approaching a problem there are often several tools and methods that could ultimately yield similar results. In this article I will look at the problem of ingesting streaming Twitter events and show how HDF and Python can each be used to create an application that achieves the goal. I will first lay out a solution on both platforms, then point out different aspects and considerations.
To ingest data from Twitter, regardless of which solution you choose, API tokens are required. Get those here: Twitter Developer Console

First let's look at Python. You can get the source for the basic ingestion application here: Twitter Python Example
Tweepy is used to connect to Twitter and begin receiving status objects:

import tweepy
import threading, logging, time
import string
# kafka-python's (pre-1.0) simple producer API, used below to publish tweets
from kafka import SimpleProducer, KafkaClient

######################################################################
# Authentication details. To obtain these visit dev.twitter.com
######################################################################
consumer_key = '<insert your consumer key>'
consumer_secret = '<insert your consumer secret>'
access_token = '<insert your access token>'
access_token_secret = '<insert your access secret>'

Now we initiate the connection in the main loop:

if __name__ == '__main__':
    listener = StdOutListener()  # stream listener defined in the full source linked above
    # sign OAuth cert
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    # uncomment to use the API in the stream for data send/retrieve algorithms
    # api = tweepy.API(auth)
    stream = tweepy.Stream(auth, listener)
    ######################################################################
    # sample() delivers a stream of 1% (random selection) of all tweets
    ######################################################################
    client = KafkaClient("localhost:9092")
    producer = SimpleProducer(client)
    stream.sample()
To begin parsing the messages we must do so programmatically, and any changes must be made in the source code.
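For example, a minimal listener that pushes each raw tweet to Kafka might look like the sketch below. This is not the exact listener from the linked source; it is an illustration, and it assumes kafka-python's pre-1.0 SimpleProducer API, a local broker, and a hypothetical topic name:

# Sketch only: forward each raw status JSON from the Twitter stream to Kafka.
import tweepy
from kafka import SimpleProducer, KafkaClient

class KafkaForwardingListener(tweepy.StreamListener):
    def __init__(self):
        super(KafkaForwardingListener, self).__init__()
        client = KafkaClient("localhost:9092")      # assumes a local broker
        self.producer = SimpleProducer(client)

    def on_data(self, data):
        # 'data' is the raw JSON string for one tweet; publish it unchanged
        self.producer.send_messages("twitter_stream", data.encode("utf-8"))
        return True

    def on_error(self, status_code):
        print("Stream error: %s" % status_code)
        return True     # keep the stream alive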
Now let's examine the process in NiFi. To avoid redundancy, take a look at this thorough tutorial, which illustrates the use of NiFi to ingest streaming Twitter data: Twitter NiFi Example

To create the connection we simply enter the API credentials into the GetTwitter processor. Then we can write the JSON objects to HDFS using the PutHDFS processor, or we can parse out any information of interest while the message is in flight using the EvaluateJsonPath processor.

Looking at these two approaches, you can probably already see the benefits of choosing NiFi, which is part of Hortonworks HDF, the "Data in Motion" platform. With NiFi our architecture is not limited and can be easily maintained and extended. Note that to write to Kafka we can simply branch the stream into a PutKafka processor and leave our original workflow intact. Also note the degree of programming knowledge required to manage and extend the Python application, whereas NiFi can be managed visually with no need for code.

The key takeaways are the following. When making decisions about the technology to use for streaming analytics platforms in a Big Data initiative, you should put a great deal of thought into:

1. Ease of implementation
2. Ease of extensibility
3. Openness of the technology

The third point can't be emphasized enough. One of the biggest obstacles organizations face when attempting to implement or extend their existing analytics applications is the lack of openness and the lock-in caused by past technology decisions. Choosing platforms like Hortonworks HDF and HDP provides a solution that complements the architectures and technology already present in the organization, leaves the architecture open to future innovations, and allows the organization to keep up with the speed of innovation.
05-02-2016 08:56 PM
Love this!! Already sent it to some close sales reps for a good laugh 🙂 Great job Dan!
02-15-2017 11:11 AM
Thank you @Ali Bajwa for the good tutorial. I am trying this example with a difference: my NiFi is local and I am trying to put tweets into a remote Solr. Solr is in a VM that contains the Hortonworks sandbox. Unfortunately I am getting this error on the PutSolrContentStream processor:

PutSolrContentStream[id=f6327477-fb7d-4af0-ec32-afcdb184e545] Failed to send StandardFlowFileRecord[uuid=9bc39142-c02c-4fa2-a911-9a9572e885d0,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1487148463852-14, container=default, section=14], offset=696096, length=2589],offset=0,name=103056151325300.json,size=2589] to Solr due to org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://172.17.0.2:8983/solr/tweets_shard1_replica1; routing to connection_failure: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://172.17.0.2:8983/solr/tweets_shard1_replica1;

Could you help me? Thanks, Shanghoosh