Flume not writing data to HDFS


Hi,

I am trying to write data to HDFS with Flume, but the data is not being transferred. Below is the configuration file I am using. The goal is to ship the tail output of a Hadoop log.

The source and sink appear to be in sync, but I am not sure why the data is not being written to HDFS. Am I missing anything here?

Command: [cloudera@localhost flume-ng]$ flume-ng agent -n agent -f conf/flume.properties -Dflume.root.logger=DEBUG,console

flume.properties:

# Define an in-memory channel called memoryChannel on agent.
agent.channels.memoryChannel.type = memory
agent.channels.memoryChannel.capacity = 100

# Define a source on agent and connect to channel memoryChannel.
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /var/log/hadoop-0.20-mapreduce/hadoop-cmf-mapreduce1-JOBTRACKER-localhost.localdomain.log
agent.sources.tail-source.channels = memoryChannel

# Define a sink that outputs to logger.
agent.sinks.log-sink.channel = memoryChannel
agent.sinks.log-sink.type = logger

# Define a sink that writes events to HDFS.
agent.sinks.hdfs-sink.channel = memoryChannel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = /tmp/data/events
agent.sinks.hdfs-sink.hdfs.fileType = DataStream

# Activate channel, source and sinks
agent.channels = memoryChannel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink
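
One detail that may be relevant: in Flume, sinks attached to the same channel compete for events, so each event is delivered to only one of them. The following is a minimal sketch, not a confirmed fix, of the same agent with one channel per sink so that both the logger sink and the HDFS sink receive every event. The channel names logChannel and hdfsChannel are hypothetical; everything else is taken from the configuration above.

```properties
# Sketch only: one channel per sink, so both sinks see every event.
# logChannel and hdfsChannel are illustrative names, not from the original file.
agent.channels = logChannel hdfsChannel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink

agent.channels.logChannel.type = memory
agent.channels.logChannel.capacity = 100
agent.channels.hdfsChannel.type = memory
agent.channels.hdfsChannel.capacity = 100

agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /var/log/hadoop-0.20-mapreduce/hadoop-cmf-mapreduce1-JOBTRACKER-localhost.localdomain.log
# The default (replicating) channel selector copies each event to every listed channel.
agent.sources.tail-source.channels = logChannel hdfsChannel

agent.sinks.log-sink.type = logger
agent.sinks.log-sink.channel = logChannel

agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.channel = hdfsChannel
agent.sinks.hdfs-sink.hdfs.path = /tmp/data/events
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
```

This assumes the tailed log file exists and is actively receiving new lines; if nothing is appended to it, neither sink will have anything to write.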

 

Log on the console:

14/10/07 10:08:15 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{tail-source=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:tail-source,state:IDLE} }} sinkRunners:{hdfs-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@100bac2 counterGroup:{ name:null counters:{} } }, log-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@e51b2c counterGroup:{ name:null counters:{} } }} channels:{memoryChannel=org.apache.flume.channel.MemoryChannel{name: memoryChannel}} }
14/10/07 10:08:15 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel memoryChannel
14/10/07 10:08:15 INFO nodemanager.DefaultLogicalNodeManager: Waiting for channel: memoryChannel to start. Sleeping for 500 ms
14/10/07 10:08:15 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memoryChannel started
14/10/07 10:08:16 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink hdfs-sink
14/10/07 10:08:16 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink log-sink
14/10/07 10:08:16 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs-sink started
14/10/07 10:08:16 INFO nodemanager.DefaultLogicalNodeManager: Starting Source tail-source
14/10/07 10:08:16 INFO source.ExecSource: Exec source starting with command:tail -F /var/log/hadoop/hadoop-hadoop-jobtracker-localhost.localdomain.log

 

Thanks,

Azzu

 
