Support Questions

ETL WEBSITE CONTENT IN HADOOP SANDBOX

Contributor

I am very new to the Hadoop Sandbox. I installed the HDP Sandbox on Oracle VirtualBox, along with PuTTY, last week, and I'm working through these tutorials: https://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/ Can anyone point me to a tutorial, or suggest how I can get website content (or Facebook content) step by step, extract it, and then analyze it (ETL)? Thanks!

1 ACCEPTED SOLUTION

Expert Contributor

@voca voca

An example is the tutorial below:

https://hortonworks.com/hadoop-tutorial/loading-data-into-the-hortonworks-sandbox/

A bit more adventurous would be to ingest Twitter data using NiFi, visualize it via Solr/Banana, and then do some query processing using Hive:

https://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-sentiment-data/

Full list of tutorials:

https://hortonworks.com/tutorials/
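
If it helps to see the "extract and transform" part of ETL on a small scale before working through the tutorials, here is a rough sketch in plain Python (standard library only). It is not taken from any of the tutorials above: the function names (`fetch`, `html_to_text`, `word_counts`, `to_tsv`) are made up for illustration, the word-count "analysis" is just a stand-in, and it assumes the page is publicly reachable and UTF-8 encoded. Facebook content generally needs API access, which this does not cover.

```python
import csv
import io
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def fetch(url):
    """Extract step: download a page as text (assumes UTF-8)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def html_to_text(html):
    """Transform step 1: strip markup and return the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def word_counts(text):
    """Transform step 2: a toy analysis, counting lowercase words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def to_tsv(counts):
    """Prepare for load: one 'word<TAB>count' row per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for word, n in counts.most_common():
        writer.writerow([word, n])
    return buf.getvalue()


# Example usage (the URL is a placeholder, not a real data source):
#   tsv = to_tsv(word_counts(html_to_text(fetch("https://example.com/"))))
#   with open("word_counts.tsv", "w", encoding="utf-8") as f:
#       f.write(tsv)
```

The resulting tab-separated file could then be loaded into the sandbox, for example by following the first tutorial above, and queried from Hive. For anything beyond a one-off page, the NiFi-based tutorial is likely the better route.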

2 REPLIES

Master Mentor

@voca voca

For social media content such as Facebook, take a look at:

Analyzing Social Media and Customer Sentiment With Apache NiFi and HDP Search: https://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-sentiment-data/