Master Guru

Accessing public social data from a company's Facebook page with Apache NiFi is easy.


Find your company's Facebook page (for example, Hortonworks).

Use http://findmyfbid.com/ to look up the Page ID of your page (289994161078999 in this example).

Create a Facebook application by choosing Add a New Application.

In the Graph API Explorer, generate an access token for your application.

Build your Facebook Graph API URL, for example: https://graph.facebook.com/v2.7/289994161078999/tagged?access_token=ACCESSTOKENFROMFACEBOOK&limit=10.... Because the request carries a Facebook app token, it must go over HTTPS/SSL. To fetch an HTTPS URL, GetHTTP needs an SSL Context Service backed by a trust store.
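If you assemble the URL outside NiFi before pasting it into GetHTTP, it helps to let a URL library escape the query string. The minimal Python sketch below is one way to do that; `build_tagged_url` is a hypothetical helper, and the token value is a placeholder. App access tokens can take the form `app_id|app_secret`, and the `|` character is not legal in a URI query, so `urllib.parse.urlencode` percent-encodes it (`%7C`).

```python
from urllib.parse import urlencode

GRAPH_HOST = "https://graph.facebook.com"


def build_tagged_url(page_id: str, access_token: str,
                     limit: int = 10, version: str = "v2.7") -> str:
    """Return a Graph API URL for the /tagged edge of a page.

    urlencode() percent-escapes characters in the token that are not
    allowed in a query string (for example '|' becomes %7C).
    """
    query = urlencode({"access_token": access_token, "limit": limit})
    return f"{GRAPH_HOST}/{version}/{page_id}/tagged?{query}"


if __name__ == "__main__":
    # Hypothetical placeholder token; use the one from Graph API Explorer.
    print(build_tagged_url("289994161078999", "APPID|SECRET"))
```

With the placeholder token this prints `https://graph.facebook.com/v2.7/289994161078999/tagged?access_token=APPID%7CSECRET&limit=10`, which can be pasted into the GetHTTP URL property as-is.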

Facebook Graph API Explorer


Facebook Graph API Explorer Test


Paste the URL into the URL property of the GetHTTP processor.


Create a StandardSSLContextService controller service. On the Sandbox, point it at the Java SSL trust store.
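To locate that Java trust store, the small Python sketch below checks the usual `cacerts` locations under `JAVA_HOME`. It assumes the standard layouts (JDK 8 keeps the file in `jre/lib/security/`, JDK 9 and later in `lib/security/`); `find_cacerts` is a hypothetical helper, not part of NiFi. On JDK 8 the file is a JKS store whose default password is `changeit`; those three values go into the service's truststore filename, type and password properties.

```python
import os
from typing import Optional

# Candidate cacerts locations relative to JAVA_HOME (assumed standard layouts):
# JDK 8 nests the JRE under jre/, JDK 9+ and standalone JREs use lib/ directly.
CANDIDATES = (
    os.path.join("jre", "lib", "security", "cacerts"),
    os.path.join("lib", "security", "cacerts"),
)


def find_cacerts(java_home: Optional[str] = None) -> Optional[str]:
    """Return the path of the JVM's default cacerts file, or None if absent."""
    java_home = java_home or os.environ.get("JAVA_HOME")
    if not java_home:
        return None
    for rel in CANDIDATES:
        path = os.path.join(java_home, rel)
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    found = find_cacerts()
    print(found or "cacerts not found; set JAVA_HOME to the JDK/JRE that runs NiFi")
```

Make sure `JAVA_HOME` refers to the same JDK/JRE that runs NiFi; a trust store from a different Java installation may not contain the same certificates.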

Set the SSL Context Service property of the GetHTTP processor to this service.


Save the results to HDFS with the PutHDFS processor.

Check the downloaded data in HDFS:

[root@sandbox demo]# hdfs dfs -cat /social/facebook1469644415053.json
{
  "data": [
  {
  "message": "Speakers of Crunch Big Data Conference 2016\nCASEY STELLA - Principal Architect of Hortonworks \nTalk: Data Preparation for Data Science: A Field Guide\n\n\"Any data scientist who works with real data will tell you that the hardest part of any data science task is the data preparation. Everything from cleaning dirty data to understanding where your data is missing and how your data is shaped, the care and feeding of your data is a prime task for the working data scientist.\n\nI will describe my experiences in the field and present an open source utility written with Apache Spark to automate some of the necessary but insufficient things that I do every time I'm presented new data. In particular, we'll talk about discovering missing values, values with skewed distributions and discovering likely errors within your data.\"\n\nSee you at Crunch Big Data Conference in 2016!\n#bigdata #dataanalytics #crunchconf #crunch",
  "created_time": "2016-07-22T11:28:03+0000",
  "id": "430175213820486_609858935852112"
  },
  {
  "message": "Get up to date on #Hadoop by checking out Hortonworks top 5 articles on the subject. Then when you need something to monitor your Hadoop, check out Centerity (http://www.centerity.com/big-data-sap-hana/hadoop/)",
  "created_time": "2016-07-12T16:27:00+0000",
  "id": "311930585656230_569713563211263"
  },
  {
  "message": "Hortonworks | Learn how #ApacheMetron detect #bigdata #cybersecurity threat in real-time?  SpringPeople is an Authorized Training Partner of Hortonworks and provides hortonworks certified courses: http://bit.ly/29Ibe7G\n\n#hadoop #DataScience",
  "created_time": "2016-07-11T07:26:43+0000",
  "id": "188518004538277_1136733933050008"
  },
  {
  "message": "Learn how to protect your #data lifecycle w/ Hortonworks Data Flow & WANdisco Fusion http://bit.ly/1WO07On",
  "created_time": "2016-06-30T19:30:00+0000",
  "id": "114198121933673_1176359322384209"
  },
  {
  "message": "Hortonworks announces new MSP and ISV programmes #HadoopSummit http://bit.ly/29sXPiI",
  "created_time": "2016-06-30T13:02:28+0000",
  "id": "179830977794_10153982072807795"
  },
  {
  "message": "Breakfast meeting at the Hadoop Summit in San Jose with Vishal Dhanuka of Hortonworks. It's going to be a great day discussing with conference attendees how we can work together to harness the power of big data in healthcare. #HS16SJ",
  "created_time": "2016-06-29T15:09:43+0000",
  "id": "1442034199422403_1596597077299447"
  },
  {
  "message": "#Data lakes need control & safety against failure. That's where we come in http://bit.ly/1WO07On Hortonworks",
  "created_time": "2016-06-29T15:00:01+0000",
  "id": "114198121933673_1176301025723372"
  },
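The listing above is cut off, but each saved file is a Graph API response whose `data` array holds one object per post. As a minimal sketch, assuming a complete file, the Python below turns such a response into (created_time, id, first line of message) rows. `summarize_posts` and the script name are hypothetical; posts without text may lack a `message` field, so the sketch falls back to an empty string.

```python
import json
import sys
from typing import List, Tuple


def summarize_posts(raw: str, width: int = 60) -> List[Tuple[str, str, str]]:
    """Return (created_time, id, message preview) for each post in a
    Graph API response body such as the file written by PutHDFS."""
    doc = json.loads(raw)
    rows = []
    for post in doc.get("data", []):
        # Keep only the first line of the message, truncated to `width` chars.
        first_line = post.get("message", "").split("\n", 1)[0]
        rows.append((post["created_time"], post["id"], first_line[:width]))
    return rows


if __name__ == "__main__":
    # Read a complete response from stdin and print one tab-separated row per post.
    for created, post_id, preview in summarize_posts(sys.stdin.read()):
        print(f"{created}\t{post_id}\t{preview}")
```

For example: `hdfs dfs -cat /social/facebook1469644415053.json | python3 summarize_posts.py`.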

The data flow is available for download from GitHub.
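This flow fetches one page of results per request. Graph API list responses such as `/tagged` also carry a `paging` object, and when more results exist, `paging.next` holds the full URL of the next page. The Python sketch below shows that loop outside NiFi, under those assumptions; `iter_posts` and `http_get_json` are hypothetical helpers, and `fetch` is injectable so the loop can be exercised without network access. Inside NiFi, one possible (untested here) equivalent is to extract `paging.next` with EvaluateJsonPath and feed it back into an HTTP processor such as InvokeHTTP.

```python
import json
import sys
from typing import Callable, Iterator, Optional
from urllib.request import urlopen


def http_get_json(url: str) -> dict:
    """Fetch a URL over HTTP(S) and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_posts(first_url: str,
               fetch: Callable[[str], dict] = http_get_json,
               max_pages: Optional[int] = None) -> Iterator[dict]:
    """Yield posts from each page, following paging.next until it is absent
    or max_pages pages have been fetched."""
    url: Optional[str] = first_url
    pages = 0
    while url and (max_pages is None or pages < max_pages):
        page = fetch(url)
        pages += 1
        yield from page.get("data", [])
        # The last page has no "next" link, which ends the loop.
        url = page.get("paging", {}).get("next")


if __name__ == "__main__":
    # Pass a complete Graph API URL (including access_token) as the argument.
    for post in iter_posts(sys.argv[1], max_pages=3):
        print(post.get("created_time"), post.get("id"))
```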

Comments
Explorer

I've been able to follow all of the steps, but I'm still confused about the SSL context. I understand why I need it, but I'm not sure how to create one on my local machine so I can access HTTPS URLs. Is there a guide you'd recommend for creating my keystore/truststore?

Master Guru

It's the trust store of the JDK/JRE used to run NiFi, such as /opt/jdk1.8.0_91/jre/lib/security/cacerts

https://docs.oracle.com/cd/E19957-01/817-3331/6miuccqo3/index.html

The default password is changeit, and the store type is JKS.

Any Java application needs a trust store to use SSL; the browser does this for you automagically.


Hi @Timothy Spann, I get the following error when I start the flow: failed to process session due to java.lang.IllegalArgumentException: Illegal character in query at index 85: https://graph.facebook.com/v2.11/246057448851197/tagged?access_token= followed by my access token. Could you please help me with this? I have been at it for a while now. Thank you.

Contributor

I'm collecting Facebook data using NiFi. Which processors (and configurations) should I use, and how do I modify the query to fetch the next pages of the Graph API response? I get the first 100 posts and then a link to the next 100. How can I build a dynamic process that pulls data from Facebook continuously?