Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

flume-ng command error while importing a weblog from a web server IP address to a Hadoop HDFS path

Contributor

Hi Team,

I am using the Cloudera VM with CDH 5.5.0.


I am trying to pull weblog data using Flume from /var/log/wtmp on the machine at IP address 10.3.9.34, port 22. Note that I ran ssh root@10.3.9.34 from the CDH 5.5 command prompt and was able to connect to this weblog machine.

 

I am trying to pull the weblog from this IP address and put it into the HDFS path /user/cloudera/flume/, so I ran the flume-ng command below:


flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf -file /home/cloudera/flume/conf/flume.conf

The problem is that I am getting a fatal error, "java.lang.NullPointerException", during the import.

Below are the contents of my flume.conf:

 

agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memory

# Describe/configure source1
agent1.sources.netcat-collect.type = netcat
agent1.sources.netcat-collect.bind = 10.3.9.34
agent1.sources.netcat-collect.port = 22
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp

# Describe solrSink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10


# Use a channel which buffers events to a file
# -- The component type name, needs to be FILE.
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity =10000


# Amount of time (in millis) between checkpoints
agent1.channels.memoryChannel.checkpointInterval 3000

# Max size (in bytes) of a single log file
agent1.channels.memoryChannel.maxFileSize = 2146435071

# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel

 

The execution log is attached to this thread:

https://drive.google.com/file/d/0B7FLyvHGgEJaYnM2d3JfRXMwNEU/view?usp=sharing

Can someone help guide me to a resolution?

 

1 ACCEPTED SOLUTION

Contributor

Thank you. This got solved with the configuration and command below.

 

agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memoryChannel

# Describe/configure the source (an exec source tailing the log file)
agent1.sources.netcat-collect.type = exec
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp

# Describe the HDFS sink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10

# Use a memory channel to buffer events between the source and the sink
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity = 10000

# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel


Below is the command to pull data from the weblog into HDFS:

flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf-file /home/cloudera/flume/conf/flume.conf
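Once the agent is running, one quick way to check that events are actually landing in HDFS is to list and inspect the sink's target directory. This is only a sketch: the date-stamped subdirectory names come from the `%y-%m-%d` escapes in the sink path, and the exact file names are generated by Flume at roll time, so the pattern below is illustrative.

```shell
# List the output; the HDFS sink creates a %y-%m-%d subdirectory per day
hdfs dfs -ls /user/cloudera/flume/

# Peek at the start of a rolled file (file names are generated by Flume,
# so the glob matches whatever suffix the sink appended)
hdfs dfs -cat /user/cloudera/flume/*/flume-* | head
```

If the directory is empty, the agent log is the first place to look; with `idleTimeout = 10` and `rollInterval = 120` set as above, files should be closed and visible within a couple of minutes of the first events arriving.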


2 REPLIES


If you are using the exec source to tail a file, keep in mind that it is not a very reliable source. I would suggest using the taildir source (https://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#taildir-source) to tail files reliably.

-pd
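For reference, a Taildir source along the lines pd suggests might be configured as below. This is a sketch only: the source name `tail-src` and the `positionFile` location are illustrative choices, not values from this thread, and the Taildir source is only available in Flume releases that include it (see the linked user guide for your CDH version).

```properties
# Taildir source: records its read offset in a JSON position file, so it
# can resume where it left off after an agent restart (unlike exec + tail -F,
# which can lose or duplicate events)
agent1.sources.tail-src.type = TAILDIR
agent1.sources.tail-src.positionFile = /home/cloudera/flume/taildir_position.json
agent1.sources.tail-src.filegroups = f1
agent1.sources.tail-src.filegroups.f1 = /var/log/wtmp
agent1.sources.tail-src.channels = memoryChannel
```

The position file is what makes this more reliable than the exec source: on restart, Flume reopens each tracked file at the recorded offset instead of re-tailing from the end.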