Member since
07-30-2019
93
Posts
96
Kudos Received
2
Solutions
10-04-2016
03:52 PM
@bhagan It looks like an HBase RegionServer is not running. A hanging search is usually an indication that the HBase RegionServer is down. Go to Ambari, restart the RegionServer, and see if that helps.
10-04-2016
03:50 PM
@Floris Smit In the sandbox it should work by default. Can you check the Hive View through Ambari and make sure Hive has data? You may also need to verify that your Ranger Tagsync service and your HBase RegionServer are up and running. In the Sandbox, Maintenance Mode is turned on, so you won't see a red warning when a service is down.
09-29-2016
03:35 PM
@Dennis Connolly Great idea. Here is a link to an article about the taxonomy features. https://community.hortonworks.com/articles/58932/understanding-taxonomy-in-apache-atlas.html
09-29-2016
01:59 AM
7 Kudos
One key feature of Apache Atlas is the ability to visually track data lineage in your Data Lake. This lets you very quickly understand the lifecycle of your data and answer questions about where the data originated and how it relates to other data in the Data Lake. To illustrate this, we will use our own Twitter data to perform sentiment analytics on our tweets in Hive, then see how this is reflected in Apache Atlas.

By now you should have a working Sandbox or HDP environment up and running with Atlas enabled. If not, take a look at the following tutorial to help get you started: Getting Started with Atlas in HDP 2.5.

First we need to gather the data sets we will use in this tutorial:

1. Log into Twitter.
2. Click on your account settings at the top right.
3. Select "Your Twitter Data" from the list on the left side of your screen.
4. Enter your password and, in the "Other Data" section at the bottom, select "Twitter Archive". This may take a little while, but you will soon get a link to download your archive.

While you wait on that data, let's quickly grab the sentiment library we will use in this tutorial. Here is the zip you will need to download: AFINN Data. From it we will need the AFINN-111.txt file.

Now that we have the data, go to the Hive View through Ambari and click the "Upload Table" link, then navigate to the tweets.csv file located in your Twitter archive. (If you need a Twitter dataset to use for this step, I have made mine public here: DataSample.) You will need to click the small gear icon next to the file type to specify that a header row exists. Upload the table, then repeat the steps for the AFINN-111.txt file. Name it sentiment_dictionary to match the rest of this tutorial, and make the column names "word" and "rating".

Now that we have the required data, let's perform some transformations in Hive. Back in Ambari, open a new query window in your Hive View.
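Before moving on, if you want to sanity-check the AFINN file you just uploaded, the dictionary is a plain list of word/rating pairs, one per line, separated by a tab. A minimal Python sketch of parsing it (the three inline sample lines are illustrative entries, not the full AFINN-111.txt):

```python
# Parse AFINN-style sentiment data: one "word<TAB>rating" pair per line.
# The sample below is illustrative, not the real file contents.
sample = "abandon\t-2\ngreat\t3\noutstanding\t5\n"

def parse_afinn(text):
    """Return a dict mapping each word to its integer sentiment rating."""
    ratings = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, rating = line.split("\t")
        ratings[word] = int(rating)
    return ratings

sentiment = parse_afinn(sample)
print(sentiment["great"])  # 3
```

The same two-column shape is what the sentiment_dictionary Hive table will hold after the upload.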
The sentiment analysis has been adapted from the article at https://acadgild.com/blog/sentiment-analysis-on-tweets-with-apache-hive-using-afinn-dictionary/ to fit this tutorial's dataset.

Create a table to store the words in our tweet text as an array:

CREATE TABLE words_array AS SELECT tweet_id AS id, split(text,' ') AS words FROM tweets;

Create a table that explodes the array into individual words:

CREATE TABLE tweet_word AS SELECT id AS id, word FROM words_array LATERAL VIEW explode(words) w AS word;

Now JOIN the sentiment_dictionary to the tweet_word table:

CREATE TABLE word_join AS SELECT tweet_word.id, tweet_word.word, sentiment_dictionary.rating FROM tweet_word LEFT OUTER JOIN sentiment_dictionary ON (tweet_word.word = sentiment_dictionary.word);

Great! Now we have each word rated for sentiment, on a scale from -5 to +5. What you decide to do with this data from here is a topic for a different article; however, now that we have created the word_join table, we can jump back to Atlas to inspect the lineage information associated with our new dataset. In the Atlas UI, search for word_join. Notice the connections to all parent tables and the recording of the actual SQL statements we executed during the transformations.
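To see what the three Hive statements compute, here is a small Python sketch of the same pipeline (split, explode, left outer join) over two made-up tweets; the tweet text and dictionary entries are illustrative, not from the real dataset:

```python
# Mimic the Hive pipeline: split each tweet into words, explode into
# (id, word) rows, then left-outer-join against a sentiment dictionary.
tweets = {1: "what a great day", 2: "terrible traffic again"}
sentiment_dictionary = {"great": 3, "terrible": -3}

# words_array: id -> list of words, like split(text, ' ')
words_array = {tid: text.split(" ") for tid, text in tweets.items()}

# tweet_word: one (id, word) row per word, like LATERAL VIEW explode(words)
tweet_word = [(tid, w) for tid, words in words_array.items() for w in words]

# word_join: LEFT OUTER JOIN keeps every word; unmatched words get None (NULL)
word_join = [(tid, w, sentiment_dictionary.get(w)) for tid, w in tweet_word]

# Example follow-up: total sentiment per tweet, treating NULL ratings as 0
totals = {}
for tid, _, rating in word_join:
    totals[tid] = totals.get(tid, 0) + (rating or 0)
print(totals)  # {1: 3, 2: -3}
```

The LEFT OUTER JOIN matters here: most words have no dictionary entry, and the join keeps them with a NULL rating rather than dropping them.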
09-29-2016
01:59 AM
9 Kudos
Why should you care about taxonomy? Taxonomy is the missing link in information management projects. By organizing all corporate data, the usability of knowledge increases considerably! In this tutorial we will explore how Apache Atlas enables you to apply a taxonomy to the data in your HDP data lake.

Log into Apache Atlas. If you don't see a screen that looks like this (with Taxonomy available), you need to enable taxonomy. To do so, go to Ambari and click on Atlas in the Services menu, then select Configs and Advanced. Scroll to the bottom and, in Custom application-properties, add a property atlas.feature.taxonomy.enable set to true. Now restart Atlas. You should now see Taxonomy in the Atlas UI.

In the Taxonomy tab, click the "..." ellipsis next to Catalog and select "Create Sub-Term". I will create a term for tagging Twitter-related assets used in the previous tutorials; you are free to create your own terms as you see fit, or follow along. Click the Catalog drop-down and select the new term you created, then click the "..." again and select "Create Sub-Term". I will make this one "Data".

Now that there are available terms, go to the Search tab and search for "tweets". This brings up all the tables used in the previous tutorial on tags for Hive. Click "Add Term" on the "tweets" table and select "Catalog.Twitter.Data" from the drop-down. Do this for all Twitter datasets/tables in the search results. The terms should now be displayed next to the assets.

Go back to the Taxonomy tab, drill down to Catalog.Twitter.Data, click the "..." and this time select "Search Assets". You will see ALL the assets with this term. If you click on a table name, you will see the terms displayed on its summary page.
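For reference, the custom property you add through Ambari ends up as a plain key/value entry in Atlas's application properties. A minimal fragment, assuming the property name shown in the Ambari Custom application-properties section:

```
atlas.feature.taxonomy.enable=true
```

Ambari writes this out for you on restart; you only need to touch the file directly on a non-Ambari install.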
09-21-2016
04:29 PM
5 Kudos
Once you have Atlas up and running (see this for getting started), you will want to create your first tag and use it to tag data, to begin exploring what Atlas can do. Let's start by creating a tag called "PII", which we will later use to tag personally identifiable data.

Log into Atlas and select the + icon on the homepage. Enter "PII" for the tag name and click Create. That's it! Now we have a tag to use.

Next, click on the Search tab, select "DSL", choose "hive_table" from the drop-down, hit Enter, and select the "customer" table. You should see the summary details for the customer table in Hive. Select the Schema tab. Here you see all the columns available in the customer table. Let's mark the "account_num" field as "PII": next to "account_num", click the + icon and select "PII" from the drop-down.

The column has now been tagged and is both searchable from Atlas and, via tag sync, available to Ranger for auditing and permissions. Click the "Tags" tab in Atlas and search for "PII" to see your field show up.
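The DSL search used above accepts SQL-like queries, so you are not limited to picking a type from the drop-down. A couple of illustrative queries of the kind you can type into the DSL search box (exact syntax may vary slightly between Atlas versions):

```
hive_table where name = "customer"
hive_column where name = "account_num"
```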
09-21-2016
02:00 PM
Are you on the 2.5 sandbox? If so, when you search in Atlas, are you in the "Search" tab? If you search for hive in the "Tags" tab, it will not find anything.
09-19-2016
11:14 PM
5 Kudos
If you haven't already done so, first download the Sandbox from the Hortonworks website and import it into VirtualBox/VMware. Once you start the virtual machine you are directed to the landing page for your new Sandbox: http://localhost:8888/

Here you have two paths to choose from. If you are completely new to HDP, I encourage you to browse the "New to HDP" section. Click on Advanced HDP to get to the links for the components we will be using in this tutorial. If you look at the Atlas link you see the note: "Off by default. Enable by logging in as raj_ops into Ambari". You will need to start in Ambari as user raj_ops, so go ahead and click the link for Ambari and log in. If you see a page that looks like this, you are doing great!

Atlas has dependencies on other Apache projects. We need to make sure HDFS, HBase, Kafka, and Ambari Infra are all started. Click on each service and then, under the Service Actions drop-down, select Start. After verifying all components are up and running, follow the same process for Atlas. You can now use the Quick Links to navigate to the Atlas home page, and log in using the credentials provided on the Quick Links page.

You can now search for artifacts and begin exploring Atlas. Try running a search for "hive" and selecting the "customer" table result. You get all the facts about that data source, like the schema and lineage seen below. Hope this was helpful. If there are any Atlas features you would like to see articles for, let me know and I will use that feedback to drive future post topics.
08-22-2016
04:47 PM
It looks like flowFile.get() is inside a method in a class, so all that script does is define the class and exit. Move the indentation so the flowFile.get() block is outside the class definition (no indentation at all). That might fix the issue.
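A minimal Python sketch of the bug (illustrative names, not the actual NiFi script): code indented inside a class method only runs if something calls that method, while code at module level runs as soon as the script executes.

```python
# Hypothetical illustration of the indentation bug: "Processor" and
# "handle" are made-up names, standing in for the user's script structure.
executed = []

class Processor:
    def handle(self):
        # Indented inside a method: never runs unless handle() is called,
        # which nothing in the script does.
        executed.append("inside class method")

# Merely defining the class above did not run the method body:
print(executed)  # []

# At module level (no indentation) the statement runs immediately,
# which is where a NiFi ExecuteScript body needs to live.
executed.append("module level")
print(executed)  # ['module level']
```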
08-17-2016
07:28 PM
That is strange. I see that you loop the failures back into the script processor. Can you delete that and route failure and success to the PutFile processor?