
Issues streaming Twitter data from Flume to Spark for analysis





I am using the official Flume + Spark configuration as described in the documentation, but after registering the host and port number, Flume is never able to send events successfully. On the other side, the Spark task (TID) never receives anything, as if the events were lost.


Below is my configuration: 


TwitterAgent1.sources = PublicStream2
TwitterAgent1.channels = fileCh2
TwitterAgent1.sinks = avrosink2

TwitterAgent1.sources.PublicStream2.type = com.cloudsigma.flume.twitter.TwitterSource
TwitterAgent1.sources.PublicStream2.channels = fileCh2
TwitterAgent1.sources.PublicStream2.consumerKey =
TwitterAgent1.sources.PublicStream2.consumerSecret =
TwitterAgent1.sources.PublicStream2.accessToken =
TwitterAgent1.sources.PublicStream2.accessTokenSecret =
TwitterAgent1.sources.PublicStream2.keywords = some keywords

#TwitterAgent1.sources.PublicStream2.locations = -,-
TwitterAgent1.sources.PublicStream2.language = en
TwitterAgent1.sources.PublicStream2.follow =,

TwitterAgent1.sinks.avrosink2.type = avro
TwitterAgent1.sinks.avrosink2.batch-size = 1
# IP of the host, as I am in a cluster
TwitterAgent1.sinks.avrosink2.hostname = 1x5.3x.3.1x2
TwitterAgent1.sinks.avrosink2.port = 9988
TwitterAgent1.sinks.avrosink2.channel = fileCh2

TwitterAgent1.channels.fileCh2.type = file
TwitterAgent1.channels.fileCh2.capacity = 10000
TwitterAgent1.channels.fileCh2.transactionCapacity = 10000
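
A quick way to sanity-check a properties file like this one can be sketched in plain Python. This is only a rough sketch, not a Flume tool: the helper names `parse_flume_properties` and `check_agent_config` and the `demoAgent` sample are made up for illustration. It relies on two things that Flume does require: every key must start with the same agent name that is passed when the agent is started, and every sink needs a `channel` property (singular, unlike the `channels` property on a source).

```python
from collections import defaultdict


def parse_flume_properties(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_agent_config(props):
    """Return a list of human-readable problems found in the properties."""
    problems = []
    # Group keys by their first component, which is the agent name.
    agents = defaultdict(list)
    for key in props:
        agents[key.split(".", 1)[0]].append(key)
    if len(agents) > 1:
        problems.append("multiple agent names: " + ", ".join(sorted(agents)))
    # Every declared sink should at least have a type and a channel.
    for agent in agents:
        for sink in props.get(agent + ".sinks", "").split():
            prefix = f"{agent}.sinks.{sink}."
            for required in ("type", "channel"):
                if prefix + required not in props:
                    problems.append(f"sink {sink} has no '{required}' property")
    return problems


# Hypothetical sample with two deliberate mistakes:
# a misspelled agent prefix and a sink without a channel.
sample = """
demoAgent.sources = src1
demoAgent.channels = ch1
demoAgent.sinks = sink1
demoAgent.sources.src1.type = netcat
demoAgent.sources.src1.channels = ch1
demoAgen.sources.src1.bind = localhost
demoAgent.sinks.sink1.type = avro
demoAgent.sinks.sink1.hostname = localhost
demoAgent.sinks.sink1.port = 9988
demoAgent.channels.ch1.type = memory
"""

problems = check_agent_config(parse_flume_properties(sample))
for problem in problems:
    print(problem)
```

To use it on a real file, read the file contents and pass them to `parse_flume_properties` in place of `sample`. Note that a properties file has no inline comments, so anything typed after a value on the same line becomes part of that value.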


Code for pyspark:


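import warnings

import pyspark as ps
from pyspark import SparkConf
from pyspark.sql import SQLContext

try:
    # create SparkContext on all CPUs available: in my case I have 4 CPUs on my laptop
    conf = SparkConf().setAppName("tweeterAnalysis")
    sc = ps.SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    print("Just created a SparkContext")
except ValueError:
    # raised when a SparkContext is already running in this process
    warnings.warn("SparkContext already exists in this scope")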


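from pyspark.streaming import StreamingContext
from pyspark.streaming.flume import FlumeUtils

ssc = StreamingContext(sc, 10)  # 10-second batches
flumeStream = FlumeUtils.createStream(ssc, "", 41414)

# each record is a (headers, body) pair; keep only the event body
lines = flumeStream.map(lambda x: x[1])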






Error in the Flume agent log:

Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
	at org.apache.flume.sink.AbstractRpcSink.process(
	at org.apache.flume.sink.DefaultSinkProcessor.process(
	at org.apache.flume.SinkRunner$
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient { host:, port: 41414 }: Failed to send batch
	at org.apache.flume.api.NettyAvroRpcClient.appendBatch(
	at org.apache.flume.sink.AbstractRpcSink.process(
	... 3 more



Warning on the Spark side:

WARN scheduler.TaskSetManager: Lost task 0.0 in stage 17093.0 (TID 32941,, executor 24): Failed to bind to:
at org.jboss.netty.bootstrap.ServerBootstrap.bind(
at org.apache.avro.ipc.NettyServer.<init>(
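
As far as I understand the push-based Flume integration, the Spark receiver starts an Avro server on the host and port given to `FlumeUtils.createStream`, and the Flume avro sink connects to it, so a "Failed to bind" on an executor would mean the receiver could not open that port on the machine it was scheduled on. One way to narrow this down is a small socket check, run on the machine in question. This is only a diagnostic sketch using the Python standard library; the function names `can_bind` and `can_connect` are made up here, and the host and port in the usage part are placeholders.

```python
import socket


def can_bind(host, port):
    """Return True if a TCP socket can be bound to host:port on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # address not local to this machine, port in use, or not permitted
            return False
        return True


def can_connect(host, port, timeout=3.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder values: substitute the receiver host and port.
    host, port = "127.0.0.1", 41414
    print("can bind:   ", can_bind(host, port))
    print("can connect:", can_connect(host, port))
```

Run on the worker that should host the receiver, `can_bind` returning False for the configured address would point to a wrong host name or a port that is already taken. Run on the Flume host while the Spark job is up, `can_connect` shows whether the sink can reach the receiver at all. The port also has to match on both sides: the sink configuration above lists 9988, while `createStream` and the Flume error use 41414.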



Anyone, please help.
