Issues streaming Twitter data from Flume to Spark for analysis

I am using the official Flume + Spark configuration as described in the documentation, but after registering the host and port number, Flume is never able to send events successfully. On the Spark side, the task (TID) never receives anything; it is as if the events are missed.


Below is my configuration: 


TwitterAgent1.sources = PublicStream2
TwitterAgent1.channels = fileCh2
TwitterAgent1.sinks = avrosink2

TwitterAgent1.sources.PublicStream2.type = com.cloudsigma.flume.twitter.TwitterSource
TwitterAgent1.sources.PublicStream2.channels = fileCh2
TwitterAgent1.sources.PublicStream2.consumerKey =
TwitterAgent1.sources.PublicStream2.consumerSecret =
TwitterAgent1.sources.PublicStream2.accessToken =
TwitterAgent1.sources.PublicStream2.accessTokenSecret =
TwitterAgent1.sources.PublicStream2.keywords = some keywords

#TwitterAgent1.sources.PublicStream2.locations = -,-
TwitterAgent1.sources.PublicStream2.language = en
TwitterAgent1.sources.PublicStream2.follow =,

TwitterAgent1.sinks.avrosink2.type = avro
TwitterAgent1.sinks.avrosink2.batch-size = 1
# IP of the host, as I am on a cluster
TwitterAgent1.sinks.avrosink2.hostname = 1x5.3x.3.1x2
TwitterAgent1.sinks.avrosink2.port = 9988
TwitterAgent1.sinks.avrosink2.channel = fileCh2

TwitterAgent1.channels.fileCh2.type = file
TwitterAgent1.channels.fileCh2.capacity = 10000
TwitterAgent1.channels.fileCh2.transactionCapacity = 10000
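
In the push-based Flume + Spark setup, the Avro sink has to point at the same host and port that the Spark receiver listens on. The sink above is configured with port 9988, while the PySpark code and the Flume error log both mention port 41414, so this may be worth checking. A minimal sketch of such a check follows; the helper names `parse_properties` and `avro_sink_endpoint` are hypothetical, and `127.0.0.1` is a placeholder for the masked host IP.

```python
def parse_properties(text):
    """Parse Java-style 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def avro_sink_endpoint(props, agent, sink):
    """Return the (hostname, port) configured for the given Avro sink."""
    prefix = f"{agent}.sinks.{sink}."
    return props[prefix + "hostname"], int(props[prefix + "port"])


# Sample mirroring the sink section above; 127.0.0.1 is a placeholder host.
sample = """
TwitterAgent1.sinks.avrosink2.type = avro
TwitterAgent1.sinks.avrosink2.hostname = 127.0.0.1
TwitterAgent1.sinks.avrosink2.port = 9988
"""

sink_host, sink_port = avro_sink_endpoint(
    parse_properties(sample), "TwitterAgent1", "avrosink2"
)
receiver_port = 41414  # port passed to FlumeUtils.createStream in the question
ports_match = sink_port == receiver_port
print(sink_host, sink_port, ports_match)
```

If `ports_match` is false, the file that was checked would be sending to a different port than the one the receiver opens.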


PySpark code:


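import warnings

import pyspark as ps
from pyspark import SparkConf
from pyspark.sql import SQLContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

try:
    # create SparkContext on all CPUs available: in my case I have 4 CPUs on my laptop
    conf = SparkConf().setAppName("tweeterAnalysis")
    sc = ps.SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    print("Just created a SparkContext")
except ValueError:
    warnings.warn("SparkContext already exists in this scope")

# 10-second batch interval
ssc = StreamingContext(sc, 10)
# Push-based receiver: listens on this host and port for events from the Flume Avro sink
flumeStream = FlumeUtils.createStream(ssc, "", 41414)

# Each record is a (headers, body) pair; keep only the body
lines = flumeStream.map(lambda x: x[1])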






Error in the Flume agent log:

Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
	at org.apache.flume.sink.AbstractRpcSink.process(
	at org.apache.flume.sink.DefaultSinkProcessor.process(
	at org.apache.flume.SinkRunner$
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host:, port: 41414 }: Failed to send batch
	at org.apache.flume.api.NettyAvroRpcClient.appendBatch(
	at org.apache.flume.sink.AbstractRpcSink.process(
	... 3 more



Warning on the Spark side:

WARN scheduler.TaskSetManager: Lost task 0.0 in stage 17093.0 (TID 32941,, executor 24): Failed to bind to:
	at org.jboss.netty.bootstrap.ServerBootstrap.bind(
	at org.apache.avro.ipc.NettyServer.<init>(
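
The "Failed to bind to" warning suggests the receiver could not open its listening socket. Typical causes are a port that is already in use, or an address that does not belong to the machine where the receiver task was scheduled. As a rough sketch (the helper `can_bind` is hypothetical and not part of Spark or Flume), the following can be run on the node that is expected to host the receiver to see whether the address and port can be bound there at all:

```python
import socket


def can_bind(host, port):
    """Return True if a TCP socket can be bound to (host, port) on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            # Covers "address already in use", "cannot assign requested
            # address", and host names that fail to resolve.
            return False


# Port 0 asks the OS for any free port, so this checks the address only.
print(can_bind("127.0.0.1", 0))
```

Replacing `"127.0.0.1"` and `0` with the host and port given to `FlumeUtils.createStream` would test the actual endpoint; a `False` result on the executor host would be consistent with the warning above.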



Anyone, please help.