I have a Hadoop cluster and want to collect logs with Flume (syslog source). For high availability, I run two Flume instances and send all logs to both of them. I use the Hive sink, partitioned by the date field from the log. How can I resolve the problem of duplicate logs? What solutions are possible other than deduplicating afterwards or using Kafka?