Contributor
Posts: 54
Registered: ‎11-17-2016

Flume-file generated for twitter have non-printable characters

Hi All,

I have a 3-node Cloudera 5.9 cluster.

I am trying to use Flume to ingest data from Twitter using a keyword. However, I am facing two issues:

1. The generated file contains nothing related to the keywords used:

[hdfs@XXXX ~]$ hadoop fs -cat /user/flume/twitter_data/FlumeData.1496272139910|grep "rosario"
[hdfs@XXXX ~]$ 

2. The file contains non-printable or gibberish characters.

My flume.conf is as follows:

# Naming the components on the current agent. 
TwitterAgent.sources = Twitter 
TwitterAgent.channels = MemChannel 
TwitterAgent.sinks = HDFS
  
# Describing/Configuring the source 
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = xxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxx
TwitterAgent.sources.Twitter.accessToken = xxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxx
TwitterAgent.sources.Twitter.keywords = rosario brindis
  
# Describing/Configuring the sink 

TwitterAgent.sinks.HDFS.type = hdfs 
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://X.X.X.X:8020/user/flume/twitter_data
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0 
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 180000
 
# Describing/Configuring the channel 
TwitterAgent.channels.MemChannel.type = memory 
TwitterAgent.channels.MemChannel.capacity = 100000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
  
# Binding the source and sink to the channel 
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel 

 

Please help, as I am not sure what is going wrong.

Thanks,

Shilpa

Contributor
Posts: 54
Registered: ‎11-17-2016

Re: Flume-file generated for twitter have non-printable characters

No one helped me with this issue. In the end I moved from Hadoop to the JavaScript API for Twitter, which is working fine.

Cloudera Employee
Posts: 184
Registered: ‎01-09-2014

Re: Flume-file generated for twitter have non-printable characters

The TwitterSource is an experimental source and has problems generating proper Avro output for writing to HDFS: it creates a full Avro schema for each record, which causes issues. It should not be considered viable for production use, so if you were able to switch to a workaround, that is the recommended approach.
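
If you want to confirm that the "gibberish" is Avro binary rather than corrupted text, a rough check is to look for the Avro container magic bytes (`Obj` followed by byte 0x01) in a local copy of the file. The sketch below is only an illustration: the function names are made up for this example, and it assumes each embedded schema header starts with that magic sequence. The four bytes can also occur by chance inside record data, so treat the count as a hint, not proof.

```python
import sys

# First four bytes of every Avro object container file.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(data: bytes) -> bool:
    """Return True if the data starts with the Avro container magic."""
    return data.startswith(AVRO_MAGIC)


def count_avro_headers(data: bytes) -> int:
    """Count occurrences of the Avro magic anywhere in the data.

    More than one occurrence suggests that several Avro containers
    (each with its own schema) were written back to back.
    """
    return data.count(AVRO_MAGIC)


# Hypothetical command-line use: python check_avro.py <local-file>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        contents = f.read()
    print("starts with Avro magic:", looks_like_avro(contents))
    print("Avro headers found:", count_avro_headers(contents))
```

To try it, copy the file out of HDFS first (for example with `hadoop fs -get /user/flume/twitter_data/FlumeData.1496272139910 .`) and pass the local path to the script. The script name `check_avro.py` is just a placeholder.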

-pd
Contributor
Posts: 54
Registered: ‎11-17-2016

Re: Flume-file generated for twitter have non-printable characters

OK, thanks for your reply, @pdvorak.
