Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

flume-ng command error while importing a weblog from a web server IP address to a Hadoop HDFS path

Contributor

Hi Team,

I am using the Cloudera VM with CDH 5.5.0.


I am trying to pull weblog data using Flume from /var/log/wtmp on the machine at IP address 10.3.9.34, port 22. Note that I ran ssh root@10.3.9.34 from the CDH 5.5 command prompt and was able to connect to this weblog machine.

 

I am trying to pull the weblog from this IP address and put it into the HDFS path /user/cloudera/flume/, so I ran the flume-ng command below:


flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf -file /home/cloudera/flume/conf/flume.conf

The problem is that I am getting a fatal error, "java.lang.NullPointerException", during the import.

Below are the contents of my flume.conf:

 

agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memory

# Describe/configure source1
agent1.sources.netcat-collect.type = netcat
agent1.sources.netcat-collect.bind = 10.3.9.34
agent1.sources.netcat-collect.port = 22
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp

# Describe solrSink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10


# Use a channel which buffers events to a file
# -- The component type name, needs to be FILE.
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity =10000


# Amount of time (in millis) between checkpoints
agent1.channels.memoryChannel.checkpointInterval 3000

# Max size (in bytes) of a single log file
agent1.channels.memoryChannel.maxFileSize = 2146435071

# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel

 

The execution log is attached to this thread:

https://drive.google.com/file/d/0B7FLyvHGgEJaYnM2d3JfRXMwNEU/view?usp=sharing

Can someone help guide me to a resolution?

 

1 ACCEPTED SOLUTION

Contributor

Thank you. This got solved with the configuration and command below.

 

agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memoryChannel

# Describe/configure the source (an exec source tailing the log file)
agent1.sources.netcat-collect.type = exec
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp

# Describe the HDFS sink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10

# Use a memory channel to buffer events between the source and the sink
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity = 10000

# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel


Below is the command to pull data from the weblog into HDFS:

flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf-file /home/cloudera/flume/conf/flume.conf
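Once the agent is running, one quick way to check that events are actually landing in HDFS is to list and inspect the sink's target directory. This is only a sketch: the date-stamped subdirectory names come from the `%y-%m-%d` escapes in the sink path, and the exact file names are generated by Flume at roll time, so the pattern below is illustrative.

```shell
# List the output; the HDFS sink creates a %y-%m-%d subdirectory per day
hdfs dfs -ls /user/cloudera/flume/

# Peek at the start of a rolled file (file names are generated by Flume,
# so the glob matches whatever suffix the sink appended)
hdfs dfs -cat /user/cloudera/flume/*/flume-* | head
```

If the directory is empty, the agent log is the first place to look; with `idleTimeout = 10` and `rollInterval = 120` set as above, files should be closed and visible within a couple of minutes of the first events arriving.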


2 REPLIES


If you are using the exec source to tail a file, keep in mind that it is not a very reliable source. I would suggest using the taildir source (https://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#taildir-source) to tail files reliably.

-pd
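For reference, a Taildir source along the lines pd suggests might be configured as below. This is a sketch only: the source name `tail-src` and the `positionFile` location are illustrative choices, not values from this thread, and the Taildir source is only available in Flume releases that include it (see the linked user guide for your CDH version).

```properties
# Taildir source: records its read offset in a JSON position file, so it
# can resume where it left off after an agent restart (unlike exec + tail -F,
# which can lose or duplicate events)
agent1.sources.tail-src.type = TAILDIR
agent1.sources.tail-src.positionFile = /home/cloudera/flume/taildir_position.json
agent1.sources.tail-src.filegroups = f1
agent1.sources.tail-src.filegroups.f1 = /var/log/wtmp
agent1.sources.tail-src.channels = memoryChannel
```

The position file is what makes this more reliable than the exec source: on restart, Flume reopens each tracked file at the recorded offset instead of re-tailing from the end.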