Trouble extracting data from Facebook

Hello folks,

I'm new to the Hadoop world and I'm having some trouble with it.

My goal is to extract data from a Facebook page every hour (I'm using the RestFB API).

I have 5 sources:

  • FacebookPageFansCity
  • FacebookPageFansGenderAge
  • FacebookPageFans
  • FacebookPagePosts
  • FacebookPageViews

The first four work perfectly and extract the data every hour with no problems, but for some reason "FacebookPageViews" does not.

I don't know why, but "FacebookPageViews" sometimes extracts the data every hour and sometimes for only a few hours of the day.

This is an image of the data extracted using "FacebookPagePosts": Screenshot_1.png

As you can see, it extracted the data for every hour of 19/03/2015.

This is the data extracted using "FacebookPageViews": Screenshot_2.png

It extracted only the first four hours of 19/03/2015, and instead of starting 20/03 at 12:00 AM, it started at 01:00 PM.

Then, when my coordinator tries to execute my workflow, this happens:

Screenshot_3.png

*This message appears for the missing hours of 19/03 and on every day that I have this problem.

This is my Flume config:

FacebookAgent.sources = FacebookPageFansCity FacebookPageFansGenderAge FacebookPageFans FacebookPagePosts FacebookPageViews
FacebookAgent.channels = MemoryChannelFacebookPageFansCity MemoryChannelFacebookPageFansGenderAge MemoryChannelFacebookPageFans MemoryChannelFacebookPagePosts MemoryChannelFacebookPageViews
FacebookAgent.sinks = HDFSFacebookPageFansCity HDFSFacebookPageFansGenderAge HDFSFacebookPageFans HDFSFacebookPagePosts HDFSFacebookPageViews

# FacebookPageFansCity

FacebookAgent.sources.FacebookPageFansCity.type = br.com.tsystems.hadoop.flume.source.restfb.FacebookPageFansCitySource
FacebookAgent.sources.FacebookPageFansCity.channels = MemoryChannelFacebookPageFansCity
FacebookAgent.sources.FacebookPageFansCity.appId = null
FacebookAgent.sources.FacebookPageFansCity.appSecret = null
FacebookAgent.sources.FacebookPageFansCity.accessToken = *confidential*
FacebookAgent.sources.FacebookPageFansCity.pageId = *confidential*
FacebookAgent.sources.FacebookPageFansCity.proxyEnabled = false
FacebookAgent.sources.FacebookPageFansCity.proxyHost = null
FacebookAgent.sources.FacebookPageFansCity.proxyPort = -1
FacebookAgent.sources.FacebookPageFansCity.refreshInterval = 3600

FacebookAgent.sinks.HDFSFacebookPageFansCity.channel = MemoryChannelFacebookPageFansCity
FacebookAgent.sinks.HDFSFacebookPageFansCity.type = hdfs
FacebookAgent.sinks.HDFSFacebookPageFansCity.hdfs.path = hdfs://hdoop01:8020/user/flume/pocfacebook/pagefanscity/%Y%m%d%H
FacebookAgent.sinks.HDFSFacebookPageFansCity.hdfs.fileType = DataStream
FacebookAgent.sinks.HDFSFacebookPageFansCity.hdfs.writeFormat = Text
FacebookAgent.sinks.HDFSFacebookPageFansCity.hdfs.batchSize = 1000
FacebookAgent.sinks.HDFSFacebookPageFansCity.hdfs.rollSize = 0
FacebookAgent.sinks.HDFSFacebookPageFansCity.hdfs.rollCount = 10000

FacebookAgent.channels.MemoryChannelFacebookPageFansCity.type = memory
FacebookAgent.channels.MemoryChannelFacebookPageFansCity.capacity = 10000
FacebookAgent.channels.MemoryChannelFacebookPageFansCity.transactionCapacity = 1000

# FacebookPageFansGenderAge

FacebookAgent.sources.FacebookPageFansGenderAge.type = br.com.tsystems.hadoop.flume.source.restfb.FacebookPageFansGenderAgeSource
FacebookAgent.sources.FacebookPageFansGenderAge.channels = MemoryChannelFacebookPageFansGenderAge
FacebookAgent.sources.FacebookPageFansGenderAge.appId = null
FacebookAgent.sources.FacebookPageFansGenderAge.appSecret = null
FacebookAgent.sources.FacebookPageFansGenderAge.accessToken = *confidential*
FacebookAgent.sources.FacebookPageFansGenderAge.pageId = *confidential*
FacebookAgent.sources.FacebookPageFansGenderAge.proxyEnabled = false
FacebookAgent.sources.FacebookPageFansGenderAge.proxyHost = null
FacebookAgent.sources.FacebookPageFansGenderAge.proxyPort = -1
FacebookAgent.sources.FacebookPageFansGenderAge.refreshInterval = 3600

FacebookAgent.sinks.HDFSFacebookPageFansGenderAge.channel = MemoryChannelFacebookPageFansGenderAge
FacebookAgent.sinks.HDFSFacebookPageFansGenderAge.type = hdfs
FacebookAgent.sinks.HDFSFacebookPageFansGenderAge.hdfs.path = hdfs://hdoop01:8020/user/flume/pocfacebook/pagefansgenderage/%Y%m%d%H
FacebookAgent.sinks.HDFSFacebookPageFansGenderAge.hdfs.fileType = DataStream
FacebookAgent.sinks.HDFSFacebookPageFansGenderAge.hdfs.writeFormat = Text
FacebookAgent.sinks.HDFSFacebookPageFansGenderAge.hdfs.batchSize = 1000
FacebookAgent.sinks.HDFSFacebookPageFansGenderAge.hdfs.rollSize = 0
FacebookAgent.sinks.HDFSFacebookPageFansGenderAge.hdfs.rollCount = 10000

FacebookAgent.channels.MemoryChannelFacebookPageFansGenderAge.type = memory
FacebookAgent.channels.MemoryChannelFacebookPageFansGenderAge.capacity = 10000
FacebookAgent.channels.MemoryChannelFacebookPageFansGenderAge.transactionCapacity = 1000

# FacebookPageFans

FacebookAgent.sources.FacebookPageFans.type = br.com.tsystems.hadoop.flume.source.restfb.FacebookPageFansSource
FacebookAgent.sources.FacebookPageFans.channels = MemoryChannelFacebookPageFans
FacebookAgent.sources.FacebookPageFans.appId = null
FacebookAgent.sources.FacebookPageFans.appSecret = null
FacebookAgent.sources.FacebookPageFans.accessToken = *confidential*
FacebookAgent.sources.FacebookPageFans.pageId = *confidential*
FacebookAgent.sources.FacebookPageFans.proxyEnabled = false
FacebookAgent.sources.FacebookPageFans.proxyHost = null
FacebookAgent.sources.FacebookPageFans.proxyPort = -1
FacebookAgent.sources.FacebookPageFans.refreshInterval = 3600

FacebookAgent.sinks.HDFSFacebookPageFans.channel = MemoryChannelFacebookPageFans
FacebookAgent.sinks.HDFSFacebookPageFans.type = hdfs
FacebookAgent.sinks.HDFSFacebookPageFans.hdfs.path = hdfs://hdoop01:8020/user/flume/pocfacebook/pagefans/%Y%m%d%H
FacebookAgent.sinks.HDFSFacebookPageFans.hdfs.fileType = DataStream
FacebookAgent.sinks.HDFSFacebookPageFans.hdfs.writeFormat = Text
FacebookAgent.sinks.HDFSFacebookPageFans.hdfs.batchSize = 1000
FacebookAgent.sinks.HDFSFacebookPageFans.hdfs.rollSize = 0
FacebookAgent.sinks.HDFSFacebookPageFans.hdfs.rollCount = 10000

FacebookAgent.channels.MemoryChannelFacebookPageFans.type = memory
FacebookAgent.channels.MemoryChannelFacebookPageFans.capacity = 10000
FacebookAgent.channels.MemoryChannelFacebookPageFans.transactionCapacity = 1000

# FacebookPagePosts

FacebookAgent.sources.FacebookPagePosts.type = br.com.tsystems.hadoop.flume.source.restfb.FacebookPagePostsSource
FacebookAgent.sources.FacebookPagePosts.channels = MemoryChannelFacebookPagePosts
FacebookAgent.sources.FacebookPagePosts.appId = null
FacebookAgent.sources.FacebookPagePosts.appSecret = null
FacebookAgent.sources.FacebookPagePosts.accessToken = *confidential*
FacebookAgent.sources.FacebookPagePosts.pageId = *confidential*
FacebookAgent.sources.FacebookPagePosts.proxyEnabled = false
FacebookAgent.sources.FacebookPagePosts.proxyHost = null
FacebookAgent.sources.FacebookPagePosts.proxyPort = -1
FacebookAgent.sources.FacebookPagePosts.refreshInterval = 3600

FacebookAgent.sinks.HDFSFacebookPagePosts.channel = MemoryChannelFacebookPagePosts
FacebookAgent.sinks.HDFSFacebookPagePosts.type = hdfs
FacebookAgent.sinks.HDFSFacebookPagePosts.hdfs.path = hdfs://hdoop01:8020/user/flume/pocfacebook/pageposts/%Y%m%d%H
FacebookAgent.sinks.HDFSFacebookPagePosts.hdfs.fileType = DataStream
FacebookAgent.sinks.HDFSFacebookPagePosts.hdfs.writeFormat = Text
FacebookAgent.sinks.HDFSFacebookPagePosts.hdfs.batchSize = 1000
FacebookAgent.sinks.HDFSFacebookPagePosts.hdfs.rollSize = 0
FacebookAgent.sinks.HDFSFacebookPagePosts.hdfs.rollCount = 10000

FacebookAgent.channels.MemoryChannelFacebookPagePosts.type = memory
FacebookAgent.channels.MemoryChannelFacebookPagePosts.capacity = 10000
FacebookAgent.channels.MemoryChannelFacebookPagePosts.transactionCapacity = 5000

# FacebookPageViews

FacebookAgent.sources.FacebookPageViews.type = br.com.tsystems.hadoop.flume.source.restfb.FacebookPageViewsSource
FacebookAgent.sources.FacebookPageViews.channels = MemoryChannelFacebookPageViews
FacebookAgent.sources.FacebookPageViews.appId = null
FacebookAgent.sources.FacebookPageViews.appSecret = null
FacebookAgent.sources.FacebookPageViews.accessToken = *confidential*
FacebookAgent.sources.FacebookPageViews.pageId = *confidential*
FacebookAgent.sources.FacebookPageViews.proxyEnabled = false
FacebookAgent.sources.FacebookPageViews.proxyHost = null
FacebookAgent.sources.FacebookPageViews.proxyPort = -1
FacebookAgent.sources.FacebookPageViews.refreshInterval = 3600

FacebookAgent.sinks.HDFSFacebookPageViews.channel = MemoryChannelFacebookPageViews
FacebookAgent.sinks.HDFSFacebookPageViews.type = hdfs
FacebookAgent.sinks.HDFSFacebookPageViews.hdfs.path = hdfs://hdoop01:8020/user/flume/pocfacebook/pageviews/%Y%m%d%H
FacebookAgent.sinks.HDFSFacebookPageViews.hdfs.fileType = DataStream
FacebookAgent.sinks.HDFSFacebookPageViews.hdfs.writeFormat = Text
FacebookAgent.sinks.HDFSFacebookPageViews.hdfs.batchSize = 1000
FacebookAgent.sinks.HDFSFacebookPageViews.hdfs.rollSize = 0
FacebookAgent.sinks.HDFSFacebookPageViews.hdfs.rollCount = 10000

FacebookAgent.channels.MemoryChannelFacebookPageViews.type = memory
FacebookAgent.channels.MemoryChannelFacebookPageViews.capacity = 10000
FacebookAgent.channels.MemoryChannelFacebookPageViews.transactionCapacity = 1000
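
A note on the sink configuration above, offered as a hedged sketch rather than a confirmed diagnosis: the Flume HDFS sink resolves time escapes such as %Y%m%d%H in hdfs.path from each event's "timestamp" header unless hdfs.useLocalTimeStamp is set to true. If the custom FacebookPageViewsSource stamps events with a time taken from the Facebook data rather than the collection time (an assumption, since the source code is not shown here), events could land in hour directories other than the ones the coordinator is waiting for. The fragment below shows one setting that could be tried to test that theory. It reuses the sink name from the config above and changes nothing else.

```properties
# Assumption: the hourly directory should reflect when the Flume agent handled
# the event, not a "timestamp" header set by the custom source.
# hdfs.useLocalTimeStamp is a standard HDFS sink property (default: false).
FacebookAgent.sinks.HDFSFacebookPageViews.hdfs.useLocalTimeStamp = true
```

If the missing hours still do not appear with this set, the gap is more likely upstream, in how often the source actually receives data from the API, and the agent log for that source would be the next place to look.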

 

Does anybody have any clue why this is happening?

2 REPLIES

Re: Trouble extracting data from Facebook

Is this solved?

Re: Trouble extracting data from Facebook

I am new to Hadoop. I am streaming Twitter data, which works fine, but I want to stream Facebook and LinkedIn data as well. Is there a Flume agent available that can handle this? I would really appreciate your response.