Created 03-15-2021 03:37 AM
I'm trying to stream data from Twitter to HDFS with Flume on the Cloudera Quickstart VM 5.13. I don't get any errors, but the destination directory stays empty.
This is my flume.conf file:
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = Sp0ti7peTvFPDJSWMGk2ChMZM
TwitterAgent.sources.Twitter.consumerSecret = Cncmq5b6rKxWPb6qNSPkqpzIR7L3EcQ8WUCeG0gX4L9sPIzflN
TwitterAgent.sources.Twitter.accessToken = 1370386818609377287-IsLuhCt54wK4T2Ua9Cb0TC14rrs1c5
TwitterAgent.sources.Twitter.accessTokenSecret = AL7oYsVUQXz5KXtQSj0tu36R85MyvAsBjcgktdZD63Ou6
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transitionCapacity = 100
I'm invoking this command to stream:
flume-ng agent --conf ./conf/ -f /home/cloudera/flume.conf -n TwitterAgent
Please, I'd like to know which part I'm getting wrong. Any suggestions are much appreciated.
Thanks in advance.
Created 03-15-2021 12:59 PM
Can you copy and paste the new flume.conf below? For clarity, I have split it into its different parts.
Flow Diagram
Configuring the flume.conf
# Naming the components on the current agent.
# Added: declares the source
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
# Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>
TwitterAgent.sources.Twitter.keywords = <keyword>
# Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
# Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
$ bin/flume-ng agent --conf ./conf/ -f /home/cloudera/flume.conf -n TwitterAgent
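Before restarting the agent, it may help to confirm that the file declares every component and that each source and sink is bound to a declared channel. The following is only a rough sketch in Python, not part of Flume: the helper names `parse_properties` and `check_agent` are made up for illustration, and the parser handles only plain `key = value` lines with `#` comment lines.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of wiring problems found for the given agent name."""
    problems = []
    declared = {}
    # Each component list is a whitespace-separated set of names.
    for kind in ("sources", "channels", "sinks"):
        names = props.get(f"{agent}.{kind}", "").split()
        declared[kind] = names
        if not names:
            problems.append(f"{agent}.{kind} is not declared")
    channels = set(declared["channels"])
    # A source lists one or more channels under '.channels'.
    for src in declared["sources"]:
        bound = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not bound:
            problems.append(f"source {src} has no channels binding")
        for ch in bound:
            if ch not in channels:
                problems.append(f"source {src} is bound to undeclared channel {ch}")
    # A sink names exactly one channel under '.channel'.
    for sink in declared["sinks"]:
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if not ch:
            problems.append(f"sink {sink} has no channel binding")
        elif ch not in channels:
            problems.append(f"sink {sink} is bound to undeclared channel {ch}")
    return problems


# Trimmed-down stand-in for the original flume.conf (no sources declaration).
SAMPLE = """
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
"""

for problem in check_agent(parse_properties(SAMPLE), "TwitterAgent"):
    print(problem)
```

With the trimmed sample, the only problem reported is the missing `TwitterAgent.sources` declaration, which is the line added in the configuration above.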
Please let me know if it runs successfully
Created on 03-16-2021 01:47 AM - edited 03-16-2021 01:49 AM
Thank you for your reply, but still nothing. I have this warning in the console:
WARN node.AbstractConfigurationProvider: No configuration found for this host:TwitterAgent
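One thing that may be worth ruling out: as far as I understand it, the value printed after "host:" in this warning is the agent name passed with -n, so the message can also appear when Flume ends up with no usable configuration under that name (for example, when the file given to -f is not the one being read, or the name differs in case). The sketch below is a small Python check, not a Flume tool. The function names are hypothetical, and the case-sensitive comparison is an assumption on my part.

```python
def agent_names(text):
    """Collect the first dotted segment of every property key in a flume.conf."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comment lines, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        names.add(key.split(".", 1)[0])
    return names


def agent_is_defined(text, agent):
    """True if at least one property key starts with the given agent name.

    The comparison is case-sensitive (an assumption about how Flume
    matches the -n value against the file).
    """
    return agent in agent_names(text)
```

Reading /home/cloudera/flume.conf into a string and calling `agent_is_defined(text, "TwitterAgent")` should return True if the file passed with -f really contains that agent.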
Created on 03-16-2021 04:48 AM - edited 03-16-2021 07:30 AM
That looks like a hostname issue. This looks like the offending line:
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://quickstart.cloudera:8020/user/flume/tweets/
Can you replace quickstart.cloudera:8020/user/flume/tweets/ with <Sandbox-IP>:8020/user/flume/tweets/?
Please let me know
Created 03-16-2021 10:04 AM
@Shelton Even with this change it does not work. I double-checked, and my hostname is quickstart.cloudera.
Created 03-16-2021 01:26 PM
@emeric
What is the output of the command below when run from the Quickstart sandbox CLI?
$ ifconfig
I think we are on the right path. If you don't succeed, I will download a sandbox tomorrow and try to reproduce your situation.
Happy hadooping
Created 03-16-2021 07:08 PM
@Shelton This is the output of ifconfig:
eth0 Link encap:Ethernet HWaddr 08:00:27:B2:38:58
inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:18087 errors:0 dropped:0 overruns:0 frame:0
TX packets:13470 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:15917703 (15.1 MiB) TX bytes:1478883 (1.4 MiB)
eth1 Link encap:Ethernet HWaddr 08:00:27:43:DC:82
inet addr:192.168.56.101 Bcast:192.168.56.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2307 errors:0 dropped:0 overruns:0 frame:0
TX packets:443 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:253418 (247.4 KiB) TX bytes:167435 (163.5 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:1617427 errors:0 dropped:0 overruns:0 frame:0
TX packets:1617427 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1110478487 (1.0 GiB) TX bytes:1110478487 (1.0 GiB).
By the way, I'm working on Cloudera Quickstart VM 5.13.0.
Created 03-17-2021 12:01 AM
Could you try replacing the current hdfs.path value in flume.conf with each of the values below?
hdfs://10.0.2.15:8020/user/flume/tweets/
hdfs://127.0.0.1:8020/user/flume/tweets/
hdfs://192.168.56.101:8020/user/flume/tweets/
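To narrow down which of these addresses is worth trying, a quick TCP check against port 8020 can be run from inside the VM. This is only a sketch using Python's standard socket module. The helper names are made up for illustration, and a successful connection shows only that something is listening on that port, not that HDFS will accept writes there.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def reachable_hosts(hosts, port, timeout=3.0):
    """Return the subset of hosts that accept a TCP connection on the port."""
    return [host for host in hosts if can_connect(host, port, timeout)]
```

For example, `reachable_hosts(["quickstart.cloudera", "10.0.2.15", "127.0.0.1", "192.168.56.101"], 8020)` lists the addresses on which port 8020 is reachable.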
Let me know
Created 03-17-2021 06:42 AM
@Shelton I had already tried all of this, without any results. Can you try it on your end to see if it works for you?
Created 03-17-2021 02:22 PM
@emeric
Twitter has made it difficult to register any app, so I am waiting for approval. Honestly, I couldn't stand writing the 200-word essay explaining what I intend to do, whether I am part of a government, and so on.
I just copied and pasted some text from a website, so I hope I pass the review 🙂
By Friday I should be good to go.