Flume - HDFS HA

Champion Alumni

Hello,

I'm looking for the recommended configuration for the Flume HDFS sink when HDFS runs in HA mode.

Each time we restart the cluster, or the active NameNode fails, the active NameNode changes and Flume fails because it keeps sending its requests to the node that is now in standby.

Thank you!

Alina

GHERMAN Alina
1 ACCEPTED SOLUTION

Master Guru
What form of HDFS path are you configuring in your Flume agent configs?

For HA, you must use the HA nameservice name (the logical name listed in
dfs.nameservices), such as hdfs://nameservice1/user/foo, instead of a
single NameNode address such as hdfs://namenode-host:8020/user/foo. This
will protect your agents from failures during HA failovers.
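
To make the difference concrete, here is a minimal sketch of a check that scans a Flume agent config for HDFS sink paths pinned to a single NameNode. The agent name `a1`, sink name `k1`, and the helper `find_pinned_hdfs_paths` are hypothetical and chosen only for illustration. The check relies on a heuristic: a logical nameservice URI carries no port, so an `hdfs://` path with an explicit port is assumed to name one NameNode host. A path such as hdfs://namenode-host/user/foo (host but no port) would not be caught.

```python
from urllib.parse import urlparse

# Hypothetical agent "a1" with one HDFS sink "k1", pinned to a single NameNode.
SAMPLE_CONF = """\
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode-host:8020/user/foo
"""


def find_pinned_hdfs_paths(conf_text):
    """Return (key, value) pairs whose hdfs.path has an explicit host:port.

    Heuristic only: an explicit port is taken as a sign that the path
    targets one NameNode rather than the HA nameservice.
    """
    pinned = []
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if not key.endswith(".hdfs.path"):
            continue
        parsed = urlparse(value)
        if parsed.scheme == "hdfs" and parsed.port is not None:
            pinned.append((key, value))
    return pinned


if __name__ == "__main__":
    for key, value in find_pinned_hdfs_paths(SAMPLE_CONF):
        print("pinned to one NameNode:", key, "=", value)
```

Changing the sample value to hdfs://nameservice1/user/foo makes the check return an empty list, which is the form the agent should use under HA.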

4 REPLIES 4

Explorer
This does not help with remote HDFS clusters... Is it possible to use WebHDFS from Flume?

Master Guru
For remote HDFS clusters, make sure the properties needed to resolve the remote nameservice are defined in the hdfs-site.xml of your HDFS Gateway. In Flume you can then use the name defined for the remote nameservice. See http://community.cloudera.com/t5/Storage-Random-Access-HDFS/distcp-with-same-nameservicename/m-p/493... for more details on how to define this.
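
As a rough illustration of what that resolving configuration looks like, the sketch below generates the client-side HA properties for one remote nameservice and renders them as an hdfs-site.xml fragment. The property names are the standard HDFS HA client settings. The nameservice name `remotens`, the NameNode IDs, the host names, and both helper functions are hypothetical. It also assumes the agent host has no nameservice of its own: if it does, dfs.nameservices must list the local and remote names together, comma-separated.

```python
from xml.sax.saxutils import escape

# Standard client-side failover proxy provider class shipped with HDFS.
FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def remote_nameservice_properties(nameservice, namenodes):
    """Build the HA client properties for one nameservice.

    namenodes maps a NameNode ID (e.g. "nn1") to its RPC "host:port".
    """
    props = {
        "dfs.nameservices": nameservice,
        "dfs.ha.namenodes." + nameservice: ",".join(namenodes),
        "dfs.client.failover.proxy.provider." + nameservice: FAILOVER_PROVIDER,
    }
    for nn_id, address in namenodes.items():
        props["dfs.namenode.rpc-address.%s.%s" % (nameservice, nn_id)] = address
    return props


def to_hdfs_site_xml(props):
    """Render a property dict as an hdfs-site.xml <configuration> fragment."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(value))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical remote cluster with two NameNodes.
    props = remote_nameservice_properties(
        "remotens",
        {"nn1": "remote-nn1.example.com:8020", "nn2": "remote-nn2.example.com:8020"},
    )
    print(to_hdfs_site_xml(props))
```

With such a fragment in the hdfs-site.xml that the Flume agent reads, the sink path would take the form hdfs://remotens/user/foo.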

Explorer

Yes, you can download the HDFS client configuration from Cloudera Manager, but that is not always possible, for example when you work in a different department or face some bureaucratic obstacle. And whenever the HDFS configuration changes, you must download it again. This does not scale in a big environment. The best solution is to run on the same cluster (on a gateway, if possible), but for external Flume agents I don't think a proper, scalable solution exists.
