
How to load the Logs data from Remote System to my HDFS using FLUME?

Hi All,

I have logs that are being created continuously on a remote system connected over the network. Now I need to load those logs into my HDFS using Flume. What is the source configuration required for Flume to pull the real-time logs into my HDFS?

I tried this link,

https://github.com/keedio/flume-ftp-source

How to proceed further?

I am stuck in the middle.

7 REPLIES

Re: How to load the Logs data from Remote System to my HDFS using FLUME?

Expert Contributor
@Magesh Kumar

I have not used this source, but what error are you seeing in the Flume logs? There is an example Flume conf here:

https://github.com/keedio/flume-ftp-source/blob/flume_ftp_dev/src/main/resources/example-configs/flu...

Re: How to load the Logs data from Remote System to my HDFS using FLUME?

@Avijeet Dash

Hi Avijeet Dash,

My source is a text file receiving streaming data every few milliseconds, and I need to transfer that data from the remote system to my HDFS using Flume. I have the username and password, but I don't know the exact configuration required to transfer the data to my HDFS using Flume.

It shows that Flume has started, but after that it does not proceed further, and I could not locate the logs either.

Please do the needful.

Re: How to load the Logs data from Remote System to my HDFS using FLUME?

Expert Contributor

@Magesh Kumar,

3 Options here:

1) You can mount any remote FTP/share to a local Linux folder, e.g.:

https://linuxconfig.org/mount-remote-ftp-directory-host-locally-into-linux-filesystem

and then use the Flume exec source with a tail command, or the spooling directory source.

2) Install 2 Flume agents (see the sketch after this list):

- on the remote host, with an exec/spool source and an Avro sink

- on the HDFS/Hadoop host, with an Avro source and an HDFS sink

3) Write a custom source to serve your needs for any custom protocol or non-standard requirements.
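
For option 2, here is a minimal sketch of the two agents. The log file path, collector hostname, Avro port and HDFS path are placeholders you would adjust to your environment:

# --- agent a1 on the remote host: tail the log and forward over Avro ---
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# tail the continuously written log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# forward events to the agent on the Hadoop side
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop-edge-node
a1.sinks.k1.port = 4141
a1.sinks.k1.channel = c1

# --- agent a2 on the Hadoop host: receive Avro events and write to HDFS ---
a2.sources = r1
a2.channels = c1
a2.sinks = k1

a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4141
a2.sources.r1.channels = c1

a2.channels.c1.type = memory
a2.channels.c1.capacity = 10000
a2.channels.c1.transactionCapacity = 1000

a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://namenode:8020/user/flume/logs
a2.sinks.k1.hdfs.fileType = DataStream
a2.sinks.k1.hdfs.writeFormat = Text
a2.sinks.k1.hdfs.rollInterval = 300
a2.sinks.k1.channel = c1

Start a2 before a1 so the Avro sink has something to connect to.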

Re: How to load the Logs data from Remote System to my HDFS using FLUME?

Expert Contributor

Hi @Michael M, could there be a solution using rsyslog or network-socket-reading sources? Just wondering. Thanks.

Re: How to load the Logs data from Remote System to my HDFS using FLUME?

Expert Contributor

Hi @Avijeet Dash, unfortunately I have no experience with rsyslog, but if I understand correctly it is fully compatible with syslog. Flume has some integration with it - https://flume.apache.org/FlumeUserGuide.html#syslog-sources
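
If rsyslog can forward the logs over the syslog protocol, a minimal sketch of the receiving side could be a syslog TCP source; the port below is just an assumption:

# listen for syslog messages forwarded by rsyslog over TCP
agent.sources = sys1
agent.sources.sys1.type = syslogtcp
agent.sources.sys1.host = 0.0.0.0
agent.sources.sys1.port = 5140
agent.sources.sys1.channels = ch1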

As for network sockets, I'd say there are 2 options - use the exec source with some bash command like here -

http://stackoverflow.com/questions/4283209/bash-command-to-read-from-network-socket

or write a custom source. The benefit of a custom source here is more control over the process (e.g. choosing between poll and stream styles of source).
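
As a rough sketch of the first option, assuming the remote host streams the log on a TCP port (both host and port below are placeholders), the exec source could wrap a command such as nc:

# read a TCP stream with nc and hand each line to Flume as an event
agent.sources = net1
agent.sources.net1.type = exec
agent.sources.net1.command = nc remote-host 9999
# restart the command if the connection drops
agent.sources.net1.restart = true
agent.sources.net1.channels = ch1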

Re: How to load the Logs data from Remote System to my HDFS using FLUME?

Hi Michael,

@Michael M

Now I am trying to connect to the main server abroad over FTP to access the real-time log data from a particular folder.

When I tried this link, I got issues with the JUnit tests:

https://github.com/keedio/flume-ftp-source

This is the Flume Configuration to pull data

### www.keedio.com
# example file, protocol is ftp, process by lines, and sink to file_roll
# for testing purposes.

## Sources Definition for agent "agent"
# ACTIVE LIST
agent.sources = ftp1
agent.sinks = k1
agent.channels = ch1

##### SOURCE IS ftp server

# Type of source for ftp sources
agent.sources.ftp1.type = org.keedio.flume.source.ftp.source.Source
agent.sources.ftp1.client.source = ftp

# Connection properties for ftp server
agent.sources.ftp1.name.server = 192.168.2.3
agent.sources.ftp1.port = 21
agent.sources.ftp1.user = admin
agent.sources.ftp1.password = admin321
agent.sources.ftp1.folder = D:\data\<files>
agent.sources.ftp1.file.name = filename

# Discover delay, each configured millisecond the directory will be explored
agent.sources.ftp1.run.discover.delay = 5000

# Process by lines
agent.sources.ftp1.flushlines = true

agent.sinks.k1.type = file_roll
agent.sinks.k1.sink.directory = /streamingdata/
agent.sinks.k1.sink.rollInterval = 7200

agent.channels.ch1.type = memory
agent.channels.ch1.capacity = 10000
agent.channels.ch1.transactionCapacity = 1000

agent.sources.ftp1.channels = ch1
agent.sinks.k1.channel = ch1

Please check whether this configuration is good enough.

Here the source is a Windows server and the sink is HDFS on Linux.

Please do the needful.

Re: How to load the Logs data from Remote System to my HDFS using FLUME?

Expert Contributor

@Magesh Kumar, it is hard to say what's happening without the Flume logs.

However, 2 comments about your config:

- you're using the file_roll sink, not an HDFS sink (see the sketch after these comments)

- from what I understand, that source consumes the root folder of the FTP server; the .folder and .name parameters have another purpose.
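
For the first point, a minimal sketch of an HDFS sink that could replace the file_roll sink above; the path is an assumption you would adjust to your cluster:

agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.path = hdfs://namenode:8020/user/flume/streamingdata
# write plain text events instead of SequenceFiles
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.writeFormat = Text
agent.sinks.k1.hdfs.rollInterval = 7200
agent.sinks.k1.channel = ch1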
