
Flume-ng command error while importing weblogs from a web server IP address to a Hadoop HDFS path

Contributor

Hi Team,

I am using the Cloudera VM with CDH 5.5.0.


I am trying to pull weblog data using Flume from /var/log/wtmp at IP address 10.3.9.34 on port 22. Note that I ran ssh root@10.3.9.34 from the command prompt of the CDH 5.5 VM and was able to connect to this weblog host.

 

I am trying to pull the weblog from this IP address and write it to the HDFS path /user/cloudera/flume/, so I ran the flume-ng command below:


flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf -file /home/cloudera/flume/conf/flume.conf

The problem is that I am getting a fatal error, "java.lang.NullPointerException", during the import.

Below are my flume.conf details:

 

agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memory

# Describe/configure source1
agent1.sources.netcat-collect.type = netcat
agent1.sources.netcat-collect.bind = 10.3.9.34
agent1.sources.netcat-collect.port = 22
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp

# Describe solrSink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10


# Use a channel which buffers events to a file
# -- The component type name, needs to be FILE.
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity =10000


# Amount of time (in millis) between checkpoints
agent1.channels.memoryChannel.checkpointInterval 3000

# Max size (in bytes) of a single log file
agent1.channels.memoryChannel.maxFileSize = 2146435071

# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel

 

The execution log is attached to this thread:

https://drive.google.com/file/d/0B7FLyvHGgEJaYnM2d3JfRXMwNEU/view?usp=sharing

Can someone help guide me to a resolution?

 

1 ACCEPTED SOLUTION

Contributor

Thank you, this got solved with the configuration and command below. (The key changes: the channel is declared as memoryChannel so it matches the component references, the source type is exec instead of netcat, and the command uses --conf-file rather than --conf -file.)

 

agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memoryChannel

# Describe/configure source1
agent1.sources.netcat-collect.type = exec
agent1.sources.netcat-collect.bind = 10.3.9.34
agent1.sources.netcat-collect.port = 22
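# Note: bind and port are netcat-source properties left over from the earlier
# config; the exec source ignores them and only runs the command below.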
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp

# Describe solrSink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
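# useLocalTimeStamp resolves the %y-%m-%d escapes above from the agent's clock
# instead of requiring a timestamp header on each event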
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10

 

# Use a channel which buffers events in memory
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity = 10000


# Amount of time (in millis) between checkpoints (a file-channel property; the memory channel ignores it)
agent1.channels.memoryChannel.checkpointInterval = 300000

# Max size (in bytes) of a single log file (also a file-channel property; ignored here)
agent1.channels.memoryChannel.maxFileSize = 2146435071

# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel


Below is the command that pulls data from the weblog into HDFS:

flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf-file /home/cloudera/flume/conf/flume.conf
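
Once the agent is running, the ingested files should appear under the configured HDFS path. As a quick sanity check (the path below is taken from the config above):

hdfs dfs -ls /user/cloudera/flume/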




If you are using the exec source to tail a file, keep in mind that it is not a very reliable source. I would suggest using the taildir source (https://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#taildir-source) to tail files reliably.
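
For reference, a minimal sketch of what that swap could look like in the config above (the filegroup name f1 and the positionFile path are illustrative, not from the original post):

# Taildir source instead of exec (tracks read position across agent restarts)
agent1.sources.netcat-collect.type = TAILDIR
# JSON file where the source records how far it has read in each file (illustrative path)
agent1.sources.netcat-collect.positionFile = /home/cloudera/flume/taildir_position.json
# One named group of files to tail
agent1.sources.netcat-collect.filegroups = f1
agent1.sources.netcat-collect.filegroups.f1 = /var/log/wtmp
agent1.sources.netcat-collect.channels = memoryChannel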

-pd