Created on 07-12-2016 10:54 AM - edited 09-16-2022 03:29 AM
Hi Team,
I am using the Cloudera VM with CDH 5.5.0.
I am trying to pull weblog data with Flume from /var/log/wtmp on host 10.3.9.34, port 22. Note that I ran ssh root@10.3.9.34 from the CDH 5.5 command prompt and was able to connect to that host.
I want to pull the weblog from that host and put it into the HDFS path /user/cloudera/flume/, so I ran the flume-ng command below:
flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf -file /home/cloudera/flume/conf/flume.conf
The problem is that I am getting a fatal "java.lang.NullPointerException" error during the import.
Below is my flume.conf:
agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memory
# Describe/configure source1
agent1.sources.netcat-collect.type = netcat
agent1.sources.netcat-collect.bind = 10.3.9.34
agent1.sources.netcat-collect.port = 22
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp
# Describe solrSink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10
# Use a channel which buffers events to a file
# -- The component type name, needs to be FILE.
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity =10000
# Amount of time (in millis) between checkpoints
agent1.channels.memoryChannel.checkpointInterval 3000
# Max size (in bytes) of a single log file
agent1.channels.memoryChannel.maxFileSize = 2146435071
# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel
The execution log is attached to this thread:
https://drive.google.com/file/d/0B7FLyvHGgEJaYnM2d3JfRXMwNEU/view?usp=sharing
Can someone help me figure out the resolution?
Created 07-14-2016 12:57 PM
Thank you, this got solved with the configuration and command below. The key changes from the original post: the source type is exec rather than netcat (the data comes from a tail -F command, not a listening socket), the channel is declared as memoryChannel so it matches the channel bindings, and the agent command uses --conf-file with no space.
agent1.sources = netcat-collect
agent1.sinks = hdfs-write
agent1.channels = memoryChannel
# Describe/configure the source
# (an exec source only needs a command; bind/port do not apply to it)
agent1.sources.netcat-collect.type = exec
agent1.sources.netcat-collect.command = tail -F /var/log/wtmp
# Describe the HDFS sink
agent1.sinks.hdfs-write.type = hdfs
agent1.sinks.hdfs-write.hdfs.path = /user/cloudera/flume/%y-%m-%d
agent1.sinks.hdfs-write.hdfs.filePrefix = flume-%y-%m-%d
agent1.sinks.hdfs-write.hdfs.rollSize = 1048576
agent1.sinks.hdfs-write.hdfs.rollCount = 100
agent1.sinks.hdfs-write.hdfs.rollInterval = 120
agent1.sinks.hdfs-write.hdfs.writeFormat = Text
agent1.sinks.hdfs-write.hdfs.fileType = DataStream
agent1.sinks.hdfs-write.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-write.hdfs.idleTimeout = 10
# Use a memory channel to buffer events
# (checkpointInterval and maxFileSize apply to file channels, not memory channels)
agent1.channels.memoryChannel.type = memory
agent1.channels.memoryChannel.capacity = 10000
# Bind the source and sink to the channel
agent1.sources.netcat-collect.channels = memoryChannel
agent1.sinks.hdfs-write.channel = memoryChannel
Below is the command to pull data from the weblog into HDFS:
flume-ng agent --name agent1 --conf /home/cloudera/flume/conf --conf-file /home/cloudera/flume/conf/flume.conf
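For reference, the %y-%m-%d escape sequences in hdfs.path and hdfs.filePrefix are filled in from the event timestamp (the local clock here, since useLocalTimeStamp = true), so Flume writes into a new dated directory each day. A quick shell sketch of how the sink directory resolves (the date below is just an example; the hdfs dfs command is shown as a hypothetical follow-up check):

```shell
# The %y-%m-%d escapes resolve like date/strftime patterns:
# two-digit year, month, day. For 14 July 2016 the sink directory is:
resolved="/user/cloudera/flume/$(date -d 2016-07-14 +%y-%m-%d)"
echo "$resolved"
# prints /user/cloudera/flume/16-07-14

# You could then list the files Flume wrote for that day with, e.g.:
#   hdfs dfs -ls "$resolved"
```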