Not able to stream twitter data in to hdfs with flume

Contributor

I'm trying to stream data from Twitter to HDFS with Flume, using the Cloudera Quickstart VM 5.13. I don't get any errors, but the destination directory stays empty.

This is my flume.conf file:

TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = Sp0ti7peTvFPDJSWMGk2ChMZM
TwitterAgent.sources.Twitter.consumerSecret = Cncmq5b6rKxWPb6qNSPkqpzIR7L3EcQ8WUCeG0gX4L9sPIzflN
TwitterAgent.sources.Twitter.accessToken = 1370386818609377287-IsLuhCt54wK4T2Ua9Cb0TC14rrs1c5
TwitterAgent.sources.Twitter.accessTokenSecret = AL7oYsVUQXz5KXtQSj0tu36R85MyvAsBjcgktdZD63Ou6
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = text 
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transitionCapacity = 100

I'm invoking this command to stream:

flume-ng agent --conf ./conf/ -f /home/cloudera/flume.conf -n TwitterAgent

Could someone tell me which part I'm getting wrong? Any suggestion is much appreciated.

Thanks in advance.

16 REPLIES

Re: Not able to stream twitter data in to hdfs with flume

Mentor

@emeric 

Can you copy and paste the new flume.conf below? For clarity, I have split it into its different parts.

[Flow diagram: twitter.JPG]

Configuring the flume.conf

# Naming the components on the current agent.
# Added: the sources declaration was missing from the original file.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Configuring the source 
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken> 
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = <keyword>

# Configuring the sink 
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600

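# Configuring the channel
# Note: the property is transactionCapacity (not transitionCapacity); it should be at least the sink's hdfs.batchSize (1000).
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000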

# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel 

$ bin/flume-ng agent --conf ./conf/ -f /home/cloudera/flume.conf -n TwitterAgent
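
Before starting the agent, the file can be checked for wiring mistakes. The following is only a minimal sketch, assuming Python is available in the VM; `parse_properties` and `check_agent` are hypothetical helpers, not part of Flume. It looks for undeclared components, missing `type` entries, a `#` inside a value, and an HDFS sink whose `hdfs.batchSize` is larger than its channel's `transactionCapacity` (both default to 100 according to the Flume user guide). The sample is a shortened form of the file from the question.

```python
HDFS_BATCH_DEFAULT = 100    # documented default of hdfs.batchSize
TXN_CAPACITY_DEFAULT = 100  # documented default of the memory channel's transactionCapacity


def parse_properties(text):
    """Parse properties-style text into a dict, skipping blank and comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of likely problems in the configuration of one agent."""
    problems = []
    prefix = agent + "."
    # A '#' after a value is not a comment in a properties file; it stays in the value.
    for key, value in props.items():
        if key.startswith(prefix) and "#" in value:
            problems.append(f"{key}: value contains '#' (trailing comment?)")
    declared = {}
    for kind in ("sources", "channels", "sinks"):
        names = props.get(f"{prefix}{kind}", "").split("#")[0].split()
        declared[kind] = names
        if not names:
            problems.append(f"{prefix}{kind} is not declared")
        for name in names:
            if f"{prefix}{kind}.{name}.type" not in props:
                problems.append(f"{prefix}{kind}.{name}.type is missing")
    # An HDFS sink takes up to hdfs.batchSize events in one channel transaction.
    for sink in declared["sinks"]:
        base = f"{prefix}sinks.{sink}."
        if props.get(base + "type") != "hdfs":
            continue
        channel = props.get(base + "channel", "")
        if channel not in declared["channels"]:
            problems.append(f"{base}channel points to undeclared channel '{channel}'")
            continue
        batch = int(props.get(base + "hdfs.batchSize", HDFS_BATCH_DEFAULT))
        txn = int(props.get(f"{prefix}channels.{channel}.transactionCapacity",
                            TXN_CAPACITY_DEFAULT))
        if batch > txn:
            problems.append(
                f"{base}hdfs.batchSize ({batch}) exceeds {channel} transactionCapacity ({txn})")
    return problems


# Shortened form of the flume.conf from the question (credentials omitted).
SAMPLE = """
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.transitionCapacity = 100
"""

for problem in check_agent(parse_properties(SAMPLE), "TwitterAgent"):
    print(problem)
```

The misspelled `transitionCapacity` key is simply never read by the check, so the channel falls back to the default of 100 and the batch-size comparison reports it. To check a real file, pass `open("/home/cloudera/flume.conf").read()` instead of `SAMPLE`.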

 

Please let me know if it runs successfully.

Re: Not able to stream twitter data in to hdfs with flume

Contributor

@Shelton 

Thank you for your reply, but still nothing. I have this warning in the console:

WARN node.AbstractConfigurationProvider: No configuration found for this host:TwitterAgent
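
My understanding is that Flume logs this warning when the file given with `-f` yields no usable configuration for the agent name given with `-n`, so one thing worth ruling out is a mismatch between that name and the prefix used in the file. The sketch below is an assumption-laden helper, not a Flume tool: `agent_names` and `diagnose` are hypothetical names, and the check only compares key prefixes.

```python
def agent_names(text):
    """Collect the first dotted component of every property key in the file."""
    names = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        names.add(key.split(".", 1)[0])
    return names


def diagnose(text, requested):
    """Describe whether the requested agent name appears in the file."""
    found = agent_names(text)
    if requested in found:
        return f"'{requested}' is defined in the file"
    close = sorted(n for n in found if n.lower() == requested.lower())
    if close:
        return f"'{requested}' not found, but {close} differs only by case"
    return f"'{requested}' not found; file defines {sorted(found)}"


# Example with a two-line excerpt; replace the string with the real file contents.
print(diagnose("TwitterAgent.channels = MemChannel\nTwitterAgent.sinks = HDFS\n",
               "TwitterAgent"))
```

If the name is present, the next thing to verify is that the path passed to `-f` is the file that was actually edited.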

Re: Not able to stream twitter data in to hdfs with flume

Mentor

@emeric 

That looks like a hostname issue. This looks like the offending line:

TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/flume/tweets/

 

Can you replace quickstart.cloudera:8020/user/flume/tweets/ with <Sandbox-IP>:8020/user/flume/tweets/?

Please let me know.

Re: Not able to stream twitter data in to hdfs with flume

Contributor

@Shelton Even with this change it does not work. I double-checked, and my hostname is quickstart.cloudera.

Re: Not able to stream twitter data in to hdfs with flume

Mentor

@emeric 
What is the output of the command below when run from the Quickstart sandbox CLI?

$ ifconfig

I think we are on the right path. If you don't succeed, I will download a sandbox tomorrow and try to reproduce your situation.

Happy hadooping

Re: Not able to stream twitter data in to hdfs with flume

Contributor

@Shelton This is the output of ifconfig:

eth0 Link encap:Ethernet HWaddr 08:00:27:B2:38:58
inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:18087 errors:0 dropped:0 overruns:0 frame:0
TX packets:13470 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:15917703 (15.1 MiB) TX bytes:1478883 (1.4 MiB)

eth1 Link encap:Ethernet HWaddr 08:00:27:43:DC:82
inet addr:192.168.56.101 Bcast:192.168.56.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2307 errors:0 dropped:0 overruns:0 frame:0
TX packets:443 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:253418 (247.4 KiB) TX bytes:167435 (163.5 KiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:1617427 errors:0 dropped:0 overruns:0 frame:0
TX packets:1617427 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1110478487 (1.0 GiB) TX bytes:1110478487 (1.0 GiB)

By the way, I'm working on Cloudera Quickstart VM 5.13.0.

Re: Not able to stream twitter data in to hdfs with flume

Mentor

@emeric 

Could you try replacing the current hdfs.path value in flume.conf with each of the values below?

hdfs://10.0.2.15:8020/user/flume/tweets/
hdfs://127.0.0.1:8020/user/flume/tweets/
hdfs://192.168.56.101:8020/user/flume/tweets/
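
Before editing flume.conf for each address, it may be quicker to check which of them accepts a TCP connection on the NameNode port at all. This is a minimal sketch, assuming Python is available inside the VM; `can_connect` and `report` are hypothetical helper names, and port 8020 is taken from the paths above. A successful connection only shows that something is listening there, not that HDFS writes will succeed.

```python
import socket

# Hostname from the original flume.conf plus the addresses reported by ifconfig.
CANDIDATE_HOSTS = ["quickstart.cloudera", "10.0.2.15", "127.0.0.1", "192.168.56.101"]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def report(hosts=CANDIDATE_HOSTS, port=8020):
    """Map each candidate host to whether its port accepted a connection."""
    return {host: can_connect(host, port) for host in hosts}
```

Calling `report()` from a Python shell inside the VM returns a dictionary of host to True/False. If every entry is True, the hostname in hdfs.path is unlikely to be the cause of the empty directory.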

Let me know.

Re: Not able to stream twitter data in to hdfs with flume

Contributor

@Shelton I had already tried all of this, without any results. Can you try it on your end to see if it works for you?

Re: Not able to stream twitter data in to hdfs with flume

Mentor

@emeric 
Twitter has made it difficult to register an app, so I am waiting for approval. Honestly, I couldn't stand writing the 200-word essay explaining what I intend to do, whether I am part of a government, and so on.

I just copied and pasted some text from a website, and I hope I pass the review.

By Friday I should be good to go.