
Accessing Facebook Page Data from Apache NiFi

Timothy Spann created · Jul 28 at 02:57 PM · edited · Jul 28 at 12:06 PM

Short Description:

Apache NiFi / HDF access to Facebook Graph Data (Public)

Article

Accessing public social data from Facebook for a company's page is easy.

Find your Facebook page, say Hortonworks.

Go to http://findmyfbid.com/ and get the Page ID (289994161078999) for your Page.

Create a Facebook Application: choose Add a New Application.

Create a Facebook Access Token in Graph API Explorer using your Application.

Create your Facebook Graph API URL: https://graph.facebook.com/v2.7/289994161078999/tagged?access_token=ACCESSTOKENFROMFACEBOOK&limit=100 (replace ACCESSTOKENFROMFACEBOOK with your own token). Because the request carries a Facebook App Token, it must go over HTTPS/SSL. To access an SSL site from GetHTTP, we need an SSL Context Service with a trust store.
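The Graph API URL above can also be assembled programmatically, which helps when the Page ID or token changes. The following is a minimal Python sketch, not part of the original flow; the function name build_tagged_url and the constants are illustrative assumptions, while the host, version, path, and query parameters are taken from the URL in this article.

```python
from urllib.parse import urlencode

GRAPH_HOST = "https://graph.facebook.com"
API_VERSION = "v2.7"  # version used in the article's URL


def build_tagged_url(page_id: str, access_token: str, limit: int = 100) -> str:
    """Return the Graph API URL for the posts a page is tagged in.

    build_tagged_url is a hypothetical helper; it only formats the URL
    and does not contact Facebook.
    """
    # urlencode escapes the values and keeps the dict's insertion order.
    query = urlencode({"access_token": access_token, "limit": limit})
    return f"{GRAPH_HOST}/{API_VERSION}/{page_id}/tagged?{query}"


# ACCESSTOKENFROMFACEBOOK is the article's placeholder, not a real token.
url = build_tagged_url("289994161078999", "ACCESSTOKENFROMFACEBOOK")
print(url)
```

The printed string is what goes into the URL property of the GetHTTP Processor.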

Facebook Graph API Explorer

Facebook Graph API Explorer Test

Add the URL to the GetHTTP Processor.

Create a Standard SSL Context Service (Controller Service). For the Sandbox, use the Java SSL Trust Store.

Add the SSL Context Service to the GetHTTP Processor.

Save to HDFS (PutHDFS).
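For readers who want to see what the trust store does outside NiFi, here is a rough Python analogue of GetHTTP plus an SSL Context Service. It is a sketch under assumptions: the helper names are made up for illustration, and Python reads PEM certificate bundles rather than a Java trust store, so the default system CA bundle stands in for it here.

```python
import ssl
import urllib.request


def make_verifying_context(cafile=None):
    """Build a TLS context that verifies the server certificate.

    With cafile=None the platform's default CA bundle is used; pass a
    PEM file path to trust a specific set of certificate authorities.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch(url, context, timeout=30):
    """Fetch a URL over HTTPS using the given TLS context (hypothetical helper)."""
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read()


context = make_verifying_context()
# fetch("https://graph.facebook.com/...", context) would perform the request;
# it is left uncalled here because it needs network access and a real token.
```

As with the trust store in NiFi, the context holds the certificate authorities used to verify the server; with an empty or wrong set, the HTTPS handshake fails before any data is returned.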


[root@sandbox demo]# hdfs dfs -cat /social/facebook1469644415053.json
{
  "data": [
    {
      "message": "Speakers of Crunch Big Data Conference 2016\nCASEY STELLA - Principal Architect of Hortonworks \nTalk: Data Preparation for Data Science: A Field Guide\n\n\"Any data scientist who works with real data will tell you that the hardest part of any data science task is the data preparation. Everything from cleaning dirty data to understanding where your data is missing and how your data is shaped, the care and feeding of your data is a prime task for the working data scientist.\n\nI will describe my experiences in the field and present an open source utility written with Apache Spark to automate some of the necessary but insufficient things that I do every time I'm presented new data. In particular, we'll talk about discovering missing values, values with skewed distributions and discovering likely errors within your data.\"\n\nSee you at Crunch Big Data Conference in 2016!\n#bigdata #dataanalytics #crunchconf #crunch",
      "created_time": "2016-07-22T11:28:03+0000",
      "id": "430175213820486_609858935852112"
    },
    {
      "message": "Get up to date on #Hadoop by checking out Hortonworks top 5 articles on the subject. Then when you need something to monitor your Hadoop, check out Centerity (http://www.centerity.com/big-data-sap-hana/hadoop/)",
      "created_time": "2016-07-12T16:27:00+0000",
      "id": "311930585656230_569713563211263"
    },
    {
      "message": "Hortonworks | Learn how #ApacheMetron detect #bigdata #cybersecurity threat in real-time? SpringPeople is an Authorized Training Partner of Hortonworks and provides hortonworks certified courses: http://bit.ly/29Ibe7G\n\n#hadoop #DataScience",
      "created_time": "2016-07-11T07:26:43+0000",
      "id": "188518004538277_1136733933050008"
    },
    {
      "message": "Learn how to protect your #data lifecycle w/ Hortonworks Data Flow & WANdisco Fusion http://bit.ly/1WO07On",
      "created_time": "2016-06-30T19:30:00+0000",
      "id": "114198121933673_1176359322384209"
    },
    {
      "message": "Hortonworks announces new MSP and ISV programmes #HadoopSummit http://bit.ly/29sXPiI",
      "created_time": "2016-06-30T13:02:28+0000",
      "id": "179830977794_10153982072807795"
    },
    {
      "message": "Breakfast meeting at the Hadoop Summit in San Jose with Vishal Dhanuka of Hortonworks. It's going to be a great day discussing with conference attendees how we can work together to harness the power of big data in healthcare. #HS16SJ",
      "created_time": "2016-06-29T15:09:43+0000",
      "id": "1442034199422403_1596597077299447"
    },
    {
      "message": "#Data lakes need control & safety against failure. That's where we come in http://bit.ly/1WO07On Hortonworks",
      "created_time": "2016-06-29T15:00:01+0000",
      "id": "114198121933673_1176301025723372"
    },
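Once the file is in HDFS, each entry in the "data" array can be read as a record with "message", "created_time", and "id" fields. The Python sketch below shows one way to parse such a response; it is an illustration rather than part of the NiFi flow, the function name parse_posts is an assumption, and the sample is a single complete post copied from the output above.

```python
import json
from datetime import datetime

# One well-formed post taken from the listing above.
sample = """
{
  "data": [
    {
      "message": "Hortonworks announces new MSP and ISV programmes #HadoopSummit http://bit.ly/29sXPiI",
      "created_time": "2016-06-30T13:02:28+0000",
      "id": "179830977794_10153982072807795"
    }
  ]
}
"""


def parse_posts(raw):
    """Turn a Graph API response body into a list of simple dicts."""
    posts = []
    for item in json.loads(raw).get("data", []):
        posts.append({
            "id": item["id"],
            # created_time uses an ISO-8601 layout with a +0000 offset.
            "created": datetime.strptime(item["created_time"],
                                         "%Y-%m-%dT%H:%M:%S%z"),
            # Treat "message" as optional and default to an empty string.
            "message": item.get("message", ""),
        })
    return posts


posts = parse_posts(sample)
for post in posts:
    print(post["id"], post["created"].isoformat(), post["message"][:40])
```

Parsing "created_time" into a timezone-aware datetime makes it straightforward to sort or filter posts by date later on.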
The DataFlow is available for download from GitHub.
