
How to get historical data from Twitter/Facebook using Flume or any other source?


New Contributor

Is it possible to get historical data from Twitter or Facebook?

For example, if I want past tweets about the iPhone, is it possible to get those?

3 REPLIES

Re: How to get history data of Twitter/ Facebook from Flume or any other sources ?

Mentor

That's a paid service from Twitter (the firehose). Here's a popular thread discussing it: https://twittercommunity.com/t/how-do-i-get-firehose-access/7490/8

Re: How to get history data of Twitter/ Facebook from Flume or any other sources ?

Expert Contributor

Another option besides the firehose is web scraping. It is prohibited by Twitter, but lots of people do it.

With this approach you get a somewhat limited set of tweet properties. It also relies on Twitter Web Search, which returns only "indexed" tweets, so you will never receive all tweets for a particular keyword.

It is also possible to find companies that will sell you the historical tweets for a particular topic. I'm not sure about the price.

As for Facebook, there is no such limitation. You can get any publicly available info through the Facebook API. The only limitation is the availability of Facebook's servers, since some very old posts and comments reside on slow nodes and may be unavailable at times.

Re: How to get history data of Twitter/ Facebook from Flume or any other sources ?

Guru

Rahul,

You can do 2 things here:

1. If the website offers a REST endpoint, as Twitter and Facebook do, you can connect to it directly and get the data.

2. You can write custom web crawler scripts in Python, JavaScript, or Perl that download webpages and parse them for the information you are looking for.

For option 2, also look at jQuery and PyQuery.
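To make option 2 a bit more concrete, here is a minimal sketch of a crawler script in Python that uses only the standard library. Everything specific in it is an assumption for illustration: the `post-text` class name, the `example-crawler/0.1` user agent, and the helper names `fetch_html` and `extract_texts` are hypothetical and do not reflect the markup of any real site. Check the site's terms of service before crawling it.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element that carries a given CSS class."""

    # Void elements have no closing tag, so they must not change the depth.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0     # > 0 while we are inside a matching element
        self.chunks = []   # text pieces of the current match
        self.results = []  # finished matches, whitespace-normalized

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1  # nested tag inside a match
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1   # a new match starts here
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append(" ".join("".join(self.chunks).split()))

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def fetch_html(url, user_agent="example-crawler/0.1"):
    """Download a page and decode it (hypothetical helper, no retries)."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_texts(html, css_class):
    """Return the text of each element whose class list contains css_class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    # Inline sample so the sketch runs without network access.
    sample = (
        '<div><p class="post-text">Hello <b>iPhone</b> fans</p>'
        '<p class="other">skip me</p></div>'
    )
    print(extract_texts(sample, "post-text"))
```

For a real page you would call `extract_texts(fetch_html(url), "some-class")` with whatever class the site actually uses. PyQuery gives you the same result with jQuery-style selectors and much less code, at the cost of an extra dependency.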

--Vedant