Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

ETL WEBSITE CONTENT IN HADOOP SANDBOX

I am very new to the Hadoop Sandbox. I installed the HDP Sandbox on Oracle VirtualBox, along with PuTTY, last week, and I am working through this tutorial: https://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/ Can anyone point me to a tutorial, or suggest how I can get website content (or Facebook content) step by step, extract it, and then analyze it (ETL)? Thanks!

1 ACCEPTED SOLUTION

Expert Contributor

@voca voca

An example is the tutorial below:

https://hortonworks.com/hadoop-tutorial/loading-data-into-the-hortonworks-sandbox/

A bit more adventurous would be to ingest Twitter data using NiFi, visualize it via Solr/Banana, and then do some query processing using Hive:

https://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-sentiment-data/

Full list of tutorials:

https://hortonworks.com/tutorials/
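
For plain website content (as opposed to the Twitter flow above), the extract and transform steps can be sketched in a few lines of Python before anything is loaded into the sandbox. This is only a minimal illustration and is not taken from any of the tutorials: the names `TextExtractor`, `extract_text`, `fetch`, and `to_tsv`, and the three-column layout (URL, position, text), are assumptions made for the example. It uses only the Python standard library.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and drop empty fragments.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def fetch(url):
    """Extract step: download a page (needs network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html):
    """Transform step 1: turn raw HTML into a list of text chunks."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def to_tsv(url, chunks):
    """Transform step 2: one tab-separated row per chunk (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for position, chunk in enumerate(chunks):
        writer.writerow([url, position, chunk])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>Hello   world</p></body></html>"
    print(to_tsv("http://example.com", extract_text(sample)), end="")
```

To run it against a live page, pass the result of `fetch(url)` to `extract_text` instead of the sample string. The resulting tab-separated file could then be uploaded to the sandbox by following the loading-data tutorial linked above and analyzed with Hive.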

2 REPLIES

Master Mentor

@voca voca

For social media content such as Facebook, take a look at:

Analyzing Social Media and Customer Sentiment With Apache NiFi and HDP Search: https://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-sentiment-data/