Created on 07-28-2016 02:57 PM - edited 08-17-2019 11:03 AM
Accessing public social data from Facebook for a company's page is easy.
Find your Facebook page, say Hortonworks.
Go to http://findmyfbid.com/ and look up the Page ID (289994161078999) for your Page.
Create a Facebook Application (choose Add a New Application).
Create a Facebook Access Token in Graph API Explorer using your Application.
Create your Facebook Graph API URL: https://graph.facebook.com/v2.7/289994161078999/tagged?access_token=ACCESSTOKENFROMFACEBOOK&limit=10.... Because we are using a Facebook App Token, the request must go over HTTPS/SSL. To access an SSL site from GetHTTP, we need an SSL Context Service with a Trust Store.
Facebook Graph API Explorer
Facebook Graph API Explorer Test
Add the URL to the GetHTTP Processor.
Create a Standard SSL Context Service (a Controller Service). For the Sandbox, use the Java SSL Trust Store.
Add the SSL Context Service to the GetHTTP Processor.
Save to HDFS (PutHDFS).
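As a rough illustration of how the Graph API URL from the steps above is put together, here is a minimal Python sketch. The function name build_graph_url and its parameter names are made up for this example; the page ID, API version, edge, and limit come from the URL shown earlier. Using urlencode also percent-encodes any special characters in the token, which may matter when the URL is pasted into GetHTTP.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"


def build_graph_url(page_id, access_token, edge="tagged", version="v2.7", limit=10):
    """Assemble a Graph API URL (hypothetical helper, not part of NiFi or Facebook's SDK)."""
    # urlencode percent-encodes reserved characters in the parameter values
    query = urlencode({"access_token": access_token, "limit": limit})
    return f"{GRAPH_BASE}/{version}/{page_id}/{edge}?{query}"


# ACCESSTOKENFROMFACEBOOK is the same placeholder used in the article
url = build_graph_url("289994161078999", "ACCESSTOKENFROMFACEBOOK")
print(url)
```

The resulting string is what goes into the URL property of the GetHTTP processor.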
Download
[root@sandbox demo]# hdfs dfs -cat /social/facebook1469644415053.json
{
  "data": [
    {
      "message": "Speakers of Crunch Big Data Conference 2016\nCASEY STELLA - Principal Architect of Hortonworks \nTalk: Data Preparation for Data Science: A Field Guide\n\n\"Any data scientist who works with real data will tell you that the hardest part of any data science task is the data preparation. Everything from cleaning dirty data to understanding where your data is missing and how your data is shaped, the care and feeding of your data is a prime task for the working data scientist.\n\nI will describe my experiences in the field and present an open source utility written with Apache Spark to automate some of the necessary but insufficient things that I do every time I'm presented new data. In particular, we'll talk about discovering missing values, values with skewed distributions and discovering likely errors within your data.\"\n\nSee you at Crunch Big Data Conference in 2016!\n#bigdata #dataanalytics #crunchconf #crunch",
      "created_time": "2016-07-22T11:28:03+0000",
      "id": "430175213820486_609858935852112"
    },
    {
      "message": "Get up to date on #Hadoop by checking out Hortonworks top 5 articles on the subject. Then when you need something to monitor your Hadoop, check out Centerity (http://www.centerity.com/big-data-sap-hana/hadoop/)",
      "created_time": "2016-07-12T16:27:00+0000",
      "id": "311930585656230_569713563211263"
    },
    {
      "message": "Hortonworks | Learn how #ApacheMetron detect #bigdata #cybersecurity threat in real-time? \nSpringPeople is an Authorized Training Partner of Hortonworks and provides hortonworks certified courses: http://bit.ly/29Ibe7G\n\n#hadoop #DataScience",
      "created_time": "2016-07-11T07:26:43+0000",
      "id": "188518004538277_1136733933050008"
    },
    {
      "message": "Learn how to protect your #data lifecycle w/ Hortonworks Data Flow & WANdisco Fusion http://bit.ly/1WO07On",
      "created_time": "2016-06-30T19:30:00+0000",
      "id": "114198121933673_1176359322384209"
    },
    {
      "message": "Hortonworks announces new MSP and ISV programmes #HadoopSummit http://bit.ly/29sXPiI",
      "created_time": "2016-06-30T13:02:28+0000",
      "id": "179830977794_10153982072807795"
    },
    {
      "message": "Breakfast meeting at the Hadoop Summit in San Jose with Vishal Dhanuka of Hortonworks. It's going to be a great day discussing with conference attendees how we can work together to harness the power of big data in healthcare. #HS16SJ",
      "created_time": "2016-06-29T15:09:43+0000",
      "id": "1442034199422403_1596597077299447"
    },
    {
      "message": "#Data lakes need control & safety against failure. That's where we come in http://bit.ly/1WO07On Hortonworks",
      "created_time": "2016-06-29T15:00:01+0000",
      "id": "114198121933673_1176301025723372"
    },
The DataFlow is available for download from GitHub.
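To get a feel for the saved payload, the sketch below parses a response of the shape shown in the HDFS output (a top-level "data" array whose items carry "message", "created_time", and "id"). It is only an illustration: summarize_posts is a made-up helper, and it assumes you pass it the complete JSON text of one saved file, not the truncated excerpt printed above.

```python
import json
import re
from datetime import datetime

# Matches hashtags such as #HadoopSummit inside a post message
HASHTAG = re.compile(r"#(\w+)")


def summarize_posts(raw_json):
    """Return id, parsed timestamp, and hashtags for each post (hypothetical helper)."""
    doc = json.loads(raw_json)
    rows = []
    for post in doc.get("data", []):
        message = post.get("message", "")
        rows.append({
            "id": post["id"],
            # created_time looks like 2016-06-30T13:02:28+0000
            "created": datetime.strptime(post["created_time"], "%Y-%m-%dT%H:%M:%S%z"),
            "hashtags": HASHTAG.findall(message),
        })
    return rows
```

Posts without a "message" field simply come back with an empty hashtag list.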
Created on 07-29-2016 06:33 PM
I've been able to follow all of the steps, but I am still confused about the SSL context. I understand why I need it, but I am not sure how to create one for my local machine so I can reach the HTTPS endpoint. Is there a guide you would recommend for creating my keystore/truststore?
Created on 08-24-2016 09:40 PM
The trust store is the one that ships with the JDK/JRE used to run NiFi,
such as /opt/jdk1.8.0_91/jre/lib/security/cacerts
https://docs.oracle.com/cd/E19957-01/817-3331/6miuccqo3/index.html
The default password is changeit.
The store type is JKS.
Any Java application needs a trust store like this to make SSL connections. A browser handles it for you automatically.
Created on 11-14-2017 02:24 PM
Hi @Timothy Spann, I am getting the following error when I start the flow: "failed to process session due to java.lang.IllegalArgumentException: Illegal character in query at index 85: https://graph.facebook.com/v2.11/246057448851197/tagged?access_token=" followed by my access token. Could you please help me out with this? I have been at it for a while now. Thank you.
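A possible cause, offered as a guess and not a confirmed diagnosis: the URL up to and including access_token= is 69 characters long, so index 85 falls 16 characters into the token. Facebook app access tokens can take the form app-id|app-secret, and Java's URI parser rejects a raw | in a query string. If that is what is happening here, percent-encoding the token before putting it in the processor's URL may get past the error. The Python sketch below does the arithmetic and the encoding; the token in it is a made-up placeholder, and the helper names are hypothetical.

```python
from urllib.parse import quote

# URL prefix copied from the error message above
PREFIX = "https://graph.facebook.com/v2.11/246057448851197/tagged?access_token="


def token_offset(error_index, prefix=PREFIX):
    """Translate the index from the Java error into a position inside the token."""
    return error_index - len(prefix)


def encode_token(token):
    """Percent-encode every reserved character in the token, including '|'."""
    return quote(token, safe="")


# Made-up placeholder token in the app-id|app-secret shape (an assumption)
example_token = "1234567890123456|abcdef"

offset = token_offset(85)
print(offset, repr(example_token[offset]))
print(PREFIX + encode_token(example_token))
```

If the character at that offset in your own token is not a | or another reserved character, this explanation does not apply.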
Created on 05-22-2018 12:55 PM
I'm collecting Facebook data using NiFi. Which processors (and configurations) should I use, and how do I modify the query to fetch the next pages of the feed from the Graph API response? I get the first 100 posts, followed by a link to the next 100 posts. How can I build a dynamic flow that continually pulls data from Facebook?
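This is not a NiFi flow, but the paging logic itself can be sketched in a few lines of Python, which may help when deciding how to wire the processors. It assumes each response carries the link to the next page under paging.next, matching the "link to the next 100 posts" described above. The names iter_posts, fetch_json, and max_pages are made up for this example.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download one page of results and decode it as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_posts(first_url, fetch=fetch_json, max_pages=10):
    """Yield posts page by page, following paging.next until it runs out.

    max_pages is a safety cap (an assumption) so the loop cannot run forever.
    """
    url = first_url
    pages = 0
    while url and pages < max_pages:
        page = fetch(url)
        yield from page.get("data", [])
        # The next-page link is absent on the last page
        url = page.get("paging", {}).get("next")
        pages += 1
```

The same idea in a flow would be to extract the next link from each response and feed it back in as the URL for the following request.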