Support Questions
Find answers, ask questions, and share your expertise

Two Flume and One Hive Table (duplicate logs)
New Contributor

I have a Hadoop cluster.

I want to collect logs, and I use Flume (syslog source).

For HA, I run two Flume instances and send all logs to both instances.

I use the Hive Sink (partitioned by a date field from the log).

Because every log line reaches both agents, each record is written to the Hive table twice. How can I resolve this problem of duplicate logs?

What are the possible solutions, other than deduplicating afterwards or using Kafka?


Re: Two Flume and One Hive Table (duplicate logs)

Super Collaborator
Set up your HA so that each log is sent to either agent, but not both. If one agent fails, all traffic should fail over to the other agent. This can be done with a load balancer in front of the two Flume instances.
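For example, the failover approach above could be sketched with HAProxy in TCP mode, using an active/backup server pair so that logs flow to only one Flume agent at a time. This is a minimal sketch, not a tested production config; the hostnames (flume1, flume2) and ports are assumptions, and it presumes the syslog clients send over TCP rather than UDP (HAProxy does not proxy UDP syslog in this mode):

```
# Hypothetical HAProxy config: syslog clients send TCP to this proxy,
# which forwards to one Flume agent; the "backup" server only receives
# traffic if the primary fails its health check.
defaults
    mode    tcp
    timeout connect 5s
    timeout client  1m
    timeout server  1m

frontend syslog_in
    bind *:5140
    default_backend flume_agents

backend flume_agents
    # Assumed hostnames/ports for the two Flume syslog sources
    server flume1 flume1.example.com:5140 check
    server flume2 flume2.example.com:5140 check backup
```

With this setup each event is delivered to exactly one Flume agent, so no duplicates reach the Hive Sink, while the second agent still provides HA. Note that events in flight on the primary at the moment of failure can still be lost or, depending on client retry behavior, resent, so this reduces rather than strictly eliminates duplicates.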
