

FLUME .. External server to hdfs cluster -- Need help here

Expert Contributor

I have a requirement to transfer logs from an external server to an HDFS cluster. The external server requires a username and password to access the log file path.

Please advise on the following:

1) Do we need to install Flume on the source (i.e. the external server)?

2) How can we pass credentials in the conf file to get the data from the external server?

Any help will be greatly appreciated!

11 REPLIES

Re: FLUME .. External server to hdfs cluster -- Need help here

Super Guru

Re: FLUME .. External server to hdfs cluster -- Need help here

Expert Contributor

Thanks. But this link does not give details on which source properties we have to use to pass username credentials. Can you please elaborate on how we can do this?


Re: FLUME .. External server to hdfs cluster -- Need help here

Expert Contributor

There is no built-in support for such a feature.

I'd recommend mapping the remote directory onto the server where the Flume agent runs, using something like a Samba share (or a Windows network drive) mounted with your credentials.

If that's not possible and you're using some custom protocol to access the files, then you have to write your own custom source to support it.

Here is an example of an FTP source with credentials support: https://github.com/keedio/flume-ftp-source
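
For illustration, here is a minimal agent sketch assuming the remote log directory is already mounted (e.g. via Samba) at /mnt/remote_logs; the agent name, paths and HDFS URL are placeholders, not values from this thread:

# Agent a1: read files appearing on the mounted share, write them to HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Spooling directory source; it renames ingested files to *.COMPLETED,
# so the mount must be writable
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /mnt/remote_logs
a1.sources.r1.channels = c1

# Memory channel buffering events between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# HDFS sink writing plain-text files partitioned by day
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/data/logs/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1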


Re: FLUME .. External server to hdfs cluster -- Need help here

Explorer

Hi Michael,

Could you explain the purpose/advantage of using flume-ftp-source over plain FTP? Is it not possible to perform an SFTP transfer from the remote server to HDFS? Thank you.


Re: FLUME .. External server to hdfs cluster -- Need help here

Expert Contributor
@Revathy Mourouguessane

I mentioned the FTP source just as an example of a custom protocol implementation, for the case where mounting as a Linux folder is not possible.

It's just a matter of environment configuration. Shared folders/custom mount points in Linux are usually managed by admins, and in this case that adds one more point to keep an eye on: those folders must be correctly mapped/mounted before you start running Flume.

If you can mount the SFTP path on your Hadoop node as a local folder, go with it (see the sketch below). There will be no difference in terms of the Flume process.
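
To sketch the mount step itself (assuming sshfs is available; the host, user and paths below are placeholders):

# Mount the remote log directory over SFTP; reconnect on dropped sessions
sshfs loguser@external-server:/var/log/app /mnt/remote_logs -o reconnect

# Flume's spooling directory (or exec) source can then read /mnt/remote_logs
# as if it were a local folder

Credentials are handled by SSH itself (password prompt or key), so nothing secret has to go into the Flume conf file.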


Re: FLUME .. External server to hdfs cluster -- Need help here

Expert Contributor

Hi @Michael M

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

What do capacity and transactionCapacity mean?


Re: FLUME .. External server to hdfs cluster -- Need help here

Expert Contributor

I am getting truncated data.


Re: FLUME .. External server to hdfs cluster -- Need help here

Expert Contributor

Hi @Amit Dass

As per the Flume docs:

capacity – the maximum number of events stored in the channel
transactionCapacity – the maximum number of events the channel will take from a source or give to a sink per transaction

In your case this means that your channel can store up to 1000 events. The source will send events to the channel in batches of up to 100 events per transaction, and likewise the sink will consume up to 100 events per batch/transaction.

A transaction here means the same as it does anywhere else: if something goes wrong, the whole transaction is rolled back and all 100 events are returned to the channel.

If your sink can't drain events for some period of time, your channel will fill up and Flume will throw an error.
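
As a rule of thumb (a sketch with placeholder names, not values from this thread): batch sizes on both sides should not exceed the channel's transactionCapacity, since one transaction can move at most that many events.

a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# spooldir source: events put to the channel per transaction (default 100)
a1.sources.r1.batchSize = 100

# HDFS sink: events taken from the channel per transaction (default 100)
a1.sinks.k1.hdfs.batchSize = 100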

What do you mean by "data truncated"?

I'd say it's not a problem with the channel configuration, but we can check if you provide more details.


Re: FLUME .. External server to hdfs cluster -- Need help here

Expert Contributor

Hi @Michael M

Thank you for such a clear explanation.

When you say "event", do you mean the number of records fetched from the source? For example, if we have a file with 5 records, can we say capacity = 5 & transactionCapacity = 5?