
Flume - HDFS HA

Champion Alumni

Hello,

I'm looking for the recommended configuration for the Flume HDFS sink when HDFS is running in HA mode.

Each time we restart the cluster, or the NameNode fails, the active NameNode changes and Flume fails because it is requesting information from the standby node.

Thank you!

Alina

GHERMAN Alina
1 ACCEPTED SOLUTION

Mentor
What form of HDFS path are you configuring in your Flume agent configs?

For HA, you must use the HA service name, such as
hdfs://nameservice1/user/foo instead of
hdfs://namenode-host:8020/user/foo. This will protect your agents from
failures during HA failovers.
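
As a rough illustration of the advice above, a Flume HDFS sink pointed at the HA service name might look like the sketch below. The agent, channel, and sink names (agent1, ch1, hdfsSink) are hypothetical placeholders; nameservice1 and /user/foo are taken from the example path above. It assumes the Hadoop client configuration (core-site.xml and hdfs-site.xml) that defines nameservice1 is visible on the Flume agent's classpath, since that is how the client resolves the service name to the currently active NameNode.

```properties
# Hypothetical agent "agent1" with one channel "ch1" and one sink "hdfsSink"
agent1.channels = ch1
agent1.sinks = hdfsSink

agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = ch1

# Use the HA service name, not a single NameNode host and port
agent1.sinks.hdfsSink.hdfs.path = hdfs://nameservice1/user/foo

# Avoid this form with HA: it pins the sink to one NameNode
# agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode-host:8020/user/foo
```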

Contributor
This does not help with remote HDFS clusters. Is it possible to use WebHDFS from Flume?

Mentor
For remote HDFS clusters, make sure the configuration needed to resolve the remote nameservice is defined in the hdfs-site.xml of your HDFS Gateway. Flume can then use the remote cluster's nameservice name in its paths. See http://community.cloudera.com/t5/Storage-Random-Access-HDFS/distcp-with-same-nameservicename/m-p/493... for more details on how to define this.
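
For anyone wondering what that configuration looks like, here is a minimal sketch of the client-side hdfs-site.xml entries that usually describe a remote HA nameservice. The nameservice name remotens, the NameNode IDs nn1 and nn2, and the host names are hypothetical placeholders; the property names are the standard HDFS HA client properties. It assumes a client-only (gateway) configuration in which the local nameservice is called nameservice1.

```xml
<!-- Nameservices this client can resolve: the local one plus the hypothetical remote one -->
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1,remotens</value>
</property>

<!-- Logical NameNode IDs for the remote nameservice -->
<property>
  <name>dfs.ha.namenodes.remotens</name>
  <value>nn1,nn2</value>
</property>

<!-- RPC address of each remote NameNode (placeholder hosts) -->
<property>
  <name>dfs.namenode.rpc-address.remotens.nn1</name>
  <value>remote-nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.remotens.nn2</name>
  <value>remote-nn2.example.com:8020</value>
</property>

<!-- Lets the client find the active NameNode and fail over to the other one -->
<property>
  <name>dfs.client.failover.proxy.provider.remotens</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With entries like these in place, the Flume sink path would be hdfs://remotens/user/foo rather than a host and port.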

Contributor

Yes, you can download the HDFS client configuration from Cloudera Manager, but this is not always possible, for example when you work in a different department or run into bureaucratic obstacles. And whenever the HDFS configuration changes, you must download it again. That does not scale in large environments. The best solution is to run on the same cluster (on a gateway if possible), but for external Flume agents I don't think a proper, scalable solution exists.