Flume file generated by Twitter agent contains non-printable or gibberish characters, and nothing related to my keywords

Hi All,

I have a 3-node Cloudera 5.9 cluster.

I am trying to use Flume to ingest data from Twitter using keywords. However, I am facing two issues:

1. The generated file contains nothing related to the keywords I used:

[hdfs@XXXX ~]$ hadoop fs -cat /user/flume/twitter_data/FlumeData.1496272139910|grep "rosario"
[hdfs@XXXX ~]$


2. The file contains non-printable or gibberish characters:

[Attached screenshot: 15887-twitter-file.png]
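One way to narrow down the second issue is to check whether the file is binary rather than plain text. The following is only a minimal sketch, assuming the output might be an Avro container file (such files begin with the bytes `Obj` followed by 0x01). The function name `is_avro_container` is hypothetical and not part of Hadoop or Flume.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop/Flume command): succeeds when the data
# on stdin starts with the Avro object-container magic bytes,
# i.e. "Obj" followed by the byte 0x01.
is_avro_container() {
    # Take the first 4 bytes and render them as one contiguous hex string.
    magic=$(head -c 4 | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "4f626a01" ]
}
```

It could be used against the file from the first issue, for example `hadoop fs -cat /user/flume/twitter_data/FlumeData.1496272139910 2>/dev/null | is_avro_container && echo "looks like Avro"`. If the check succeeds, the gibberish is probably binary encoding rather than corrupted text, and a plain `grep` on the raw file would not be a reliable test.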

My flume.conf is as follows:

# Naming the components on the current agent
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Describing/configuring the source
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = XXXX
TwitterAgent.sources.Twitter.consumerSecret = XXXX
TwitterAgent.sources.Twitter.accessToken = XXXX
TwitterAgent.sources.Twitter.accessTokenSecret = XXXX
TwitterAgent.sources.Twitter.keywords = rosario brindis

# Describing/configuring the sink
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://X.X.X.X:8020/user/hdfs/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 180000

# Describing/configuring the channel
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 100000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000

# Binding the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel

Please help as I am not sure what is going wrong.

Thanks,

Shilpa
