
Multiple tweets with the same id in Twitter streaming



I collect tweets with the help of this pipeline. I used some scripts of my own to analyse the collected tweets.

I found that I get multiple tweets with the same id.

I looked in hdfs://user/flume/tweets and saw that these duplicate tweets are already present in the stored files.
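
For anyone who wants to reproduce the check, here is a minimal sketch of how the duplicates could be counted. It assumes the files have been copied out of HDFS to local disk, that each line holds one tweet as a JSON object, and that the tweet id is in the top-level `id` field. The function name `find_duplicate_ids` is made up for this example and is not part of the pipeline.

```python
import json
import sys
from collections import defaultdict


def find_duplicate_ids(lines):
    """Group raw JSON tweet lines by `id` and report ids seen more than once.

    Returns a dict mapping id -> (count, identical), where `identical` is
    True when every stored copy of that tweet is exactly the same line.
    """
    seen = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed records
        if not isinstance(tweet, dict):
            continue  # assumption: a tweet is a JSON object
        tweet_id = tweet.get("id")
        if tweet_id is not None:
            seen[tweet_id].append(line)
    return {
        tid: (len(copies), len(set(copies)) == 1)
        for tid, copies in seen.items()
        if len(copies) > 1
    }


if __name__ == "__main__":
    # Usage (paths are local copies of the Flume output files):
    #   python find_dupes.py FlumeData.1 FlumeData.2 ...
    all_lines = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            all_lines.extend(handle)
    for tid, (count, identical) in find_duplicate_ids(all_lines).items():
        print(tid, count, "identical" if identical else "different")
```

If the copies turn out to be identical line for line, that would be consistent with the same event being written more than once somewhere in the pipeline. Copies that differ would suggest the records were already distinct when they arrived.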

So it isn't a Hive or Oozie problem.
Could it be a Flume problem? I made some edits to the Flume configuration parameters:


# value in the GitHub version: 1000
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
# value in the GitHub version: 10000
TwitterAgent.sinks.HDFS.hdfs.rollCount = 100000

TwitterAgent.channels.MemChannel.type = memory
# value in the GitHub version: 10000
TwitterAgent.channels.MemChannel.capacity = 100000
# value in the GitHub version: 100
TwitterAgent.channels.MemChannel.transactionCapacity = 10000

Or does Twitter itself send these duplicate tweets, so that it isn't a Hadoop problem at all?

