I have a Hadoop cluster and collect logs with Flume, using a syslog source.
For high availability (HA) I run two Flume instances and send every log event to both of them.
I use the Hive Sink, partitioned by a date field taken from the log.
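For reference, here is a minimal sketch of what one agent's config looks like; agent/component names, the metastore URI, database, table, and the `date` header are placeholders for my actual values:

```properties
# Hypothetical single-agent setup: syslog source -> memory channel -> Hive sink.
agent1.sources = syslog-src
agent1.channels = mem-ch
agent1.sinks = hive-sink

# Syslog TCP source receiving the forwarded logs
agent1.sources.syslog-src.type = syslogtcp
agent1.sources.syslog-src.host = 0.0.0.0
agent1.sources.syslog-src.port = 5140
agent1.sources.syslog-src.channels = mem-ch

agent1.channels.mem-ch.type = memory

# Hive sink, partitioned by a date field extracted from the event
agent1.sinks.hive-sink.type = hive
agent1.sinks.hive-sink.channel = mem-ch
agent1.sinks.hive-sink.hive.metastore = thrift://metastore-host:9083
agent1.sinks.hive-sink.hive.database = logs
agent1.sinks.hive-sink.hive.table = syslog_events
agent1.sinks.hive-sink.hive.partition = %{date}
agent1.sinks.hive-sink.serializer = DELIMITED
```

The second instance runs an identical config on another host, which is why every event is written twice.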
Since both instances receive the same events, every log ends up in Hive twice. How can I solve this duplicate-logs problem?
What solutions are possible, other than deduplicating afterwards (e.g. with a Hive query) or putting Kafka in front of Flume?