Created on 05-31-2017 11:24 PM - edited 08-17-2019 11:58 PM
Hi All,
I have a 3-node Cloudera 5.9 cluster.
I am trying to use Flume to ingest data from Twitter using a keyword. However, I am facing 2 issues:
1. The generated file has no information related to the keywords used.
[hdfs@XXXX ~]$ hadoop fs -cat /user/flume/twitter_data/FlumeData.1496272139910 | grep "rosario"
[hdfs@XXXX ~]$
2. The file has non-printable or gibberish characters.
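To put a number on the second issue, here is a minimal sketch in Python. It assumes the file has first been copied out of HDFS (for example with hadoop fs -get) and that its first few kilobytes are read as raw bytes. The helper names are hypothetical. The Avro check is only an assumption about one possible cause of binary output; it tests for the 4-byte magic header of an Avro object container file and proves nothing about this particular file.

```python
import string

# Bytes treated as printable: ASCII letters, digits, punctuation, whitespace.
# Note: non-ASCII UTF-8 text (common in tweets) is counted as non-printable,
# so this is a rough indicator only.
PRINTABLE = set(string.printable.encode("ascii"))

# Magic bytes that open an Avro object container file ("Obj" followed by 0x01).
AVRO_MAGIC = b"Obj\x01"


def non_printable_ratio(data: bytes) -> float:
    """Return the fraction of bytes in `data` outside printable ASCII."""
    if not data:
        return 0.0
    bad = sum(1 for b in data if b not in PRINTABLE)
    return bad / len(data)


def looks_like_avro(data: bytes) -> bool:
    """Return True if `data` starts with the Avro container magic bytes."""
    return data.startswith(AVRO_MAGIC)
```

For example, after reading the first 4096 bytes of the local copy into a variable named head, non_printable_ratio(head) gives the share of binary bytes and looks_like_avro(head) reports whether the header matches.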
My Flume.conf is as follows:
# Naming the components on the current agent.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Describing/Configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = XXXX
TwitterAgent.sources.Twitter.consumerSecret = XXXX
TwitterAgent.sources.Twitter.accessToken = XXXX
TwitterAgent.sources.Twitter.accessTokenSecret = XXXX
TwitterAgent.sources.Twitter.keywords = rosario brindis

# Describing/Configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://X.X.X.X:8020/user/hdfs/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 180000

# Describing/Configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 100000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000

# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
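As a sanity check on the wiring in this configuration, the following is a rough Python sketch that parses Flume-style properties text (one key = value per line) and reports any source or sink bound to a channel that the agent does not declare. The function names parse_properties and check_wiring are hypothetical helpers, not part of Flume, and the parser handles only simple key = value lines and # comments.

```python
def parse_properties(text: str) -> dict:
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props: dict, agent: str) -> list:
    """Return a list of problems with source/sink channel bindings."""
    channels = set(props.get(f"{agent}.channels", "").split())
    problems = []
    # A source may list several channels under the plural key 'channels'.
    for src in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not bound or not set(bound) <= channels:
            problems.append(f"source {src} is bound to {bound}")
    # A sink reads from exactly one channel under the singular key 'channel'.
    for sink in props.get(f"{agent}.sinks", "").split():
        bound = props.get(f"{agent}.sinks.{sink}.channel", "")
        if bound not in channels:
            problems.append(f"sink {sink} is bound to {bound!r}")
    return problems
```

Calling check_wiring(parse_properties(text), "TwitterAgent") on the configuration text returns an empty list when every binding names a declared channel. It checks the wiring only and says nothing about whether the source honours the keywords setting.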
Please help as I am not sure what is going wrong.
Thanks,
Shilpa