
Flume-file generated for twitter have non-printable characters

Expert Contributor

Hi All,

I have a 3-node Cloudera 5.9 cluster.

I am trying to use Flume to ingest data from Twitter using a keyword. However, I am facing two issues:

1. The generated file contains nothing related to the keywords used.

 

[hdfs@XXXX ~]$ hadoop fs -cat /user/flume/twitter_data/FlumeData.1496272139910|grep "rosario"
[hdfs@XXXX ~]$ 

2. The file has non-printable or gibberish characters.

 

My Flume.conf is as follows:

 

# Naming the components on the current agent. 
TwitterAgent.sources = Twitter 
TwitterAgent.channels = MemChannel 
TwitterAgent.sinks = HDFS
  
# Describing/Configuring the source 
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = xxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxx
TwitterAgent.sources.Twitter.accessToken = xxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxx
TwitterAgent.sources.Twitter.keywords = rosario brindis
  
# Describing/Configuring the sink 

TwitterAgent.sinks.HDFS.type = hdfs 
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://X.X.X.X:8020/user/flume/twitter_data
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0 
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 180000
 
# Describing/Configuring the channel 
TwitterAgent.channels.MemChannel.type = memory 
TwitterAgent.channels.MemChannel.capacity = 100000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
  
# Binding the source and sink to the channel 
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel 

 

Please help, as I am not sure what is going wrong.

 

Thanks,

Shilpa

3 REPLIES

Re: Flume-file generated for twitter have non-printable characters

Expert Contributor

No one helped me with this issue. In the end I moved from Hadoop to the JavaScript API for Twitter, which is working fine.

Re: Flume-file generated for twitter have non-printable characters

Super Collaborator
The TwitterSource is an experimental source and has problems generating proper Avro output for writing to HDFS (it writes a full Avro schema with each record, which causes issues). It should not be considered viable for production use, so if you were able to switch to a workaround, that is the recommended approach.

-pd
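
For anyone who wants to confirm that the "gibberish" is Avro binary rather than corrupted text, here is a minimal sketch in Python. It relies on the fact that an Avro object container file starts with the four magic bytes `Obj` followed by `0x01`. The function names and the idea of counting repeated headers are illustrative assumptions, not part of Flume or of any Avro library, and the count is only a rough heuristic because the same four bytes could occur by chance inside record data.

```python
# Magic bytes that open every Avro object container file: 'O', 'b', 'j', 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(first_bytes: bytes) -> bool:
    """Return True if the data begins with the Avro container magic bytes."""
    return first_bytes.startswith(AVRO_MAGIC)


def count_embedded_headers(data: bytes) -> int:
    """Count occurrences of the magic bytes anywhere in the data.

    A count far above 1 hints (heuristically) that a header and schema
    were written repeatedly instead of once per file.
    """
    return data.count(AVRO_MAGIC)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the path of a local copy of the Flume file.
    with open(sys.argv[1], "rb") as handle:
        content = handle.read()
    print("Avro container:", looks_like_avro_container(content))
    print("Magic byte occurrences:", count_embedded_headers(content))
```

To try it, copy one of the files out of HDFS first (for example with `hadoop fs -get`) and pass the local path to the script. If the first check is true, a plain `grep` on the raw file is not a reliable way to search the tweets, since the records are binary-encoded.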

Re: Flume-file generated for twitter have non-printable characters

Expert Contributor

OK, thanks for your reply, @pdvorak.
