Hi,
I have been using Druid for two months for real-time ingestion from a Kafka topic.
I have run a lot of tests:
- I consumed my topic from the beginning with Spark Streaming and used Tranquility for real-time ingestion. My topic contains a lot of messages, and when I consume from the beginning I end up with many running real-time tasks in Druid; ingestion cannot complete unless I configure a windowPeriod, and then I lose a lot of messages (see the sketch after this list).
- I tried a Kafka Connect Druid sink, but I ran into the same problem.
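
To make sure I understand why the messages are lost: as far as I can tell, Tranquility only accepts events whose timestamps fall within windowPeriod of the current time, so replaying an old topic from the beginning means most events are rejected. A rough illustration of that rule as I understand it (this is just my own sketch, not Tranquility's actual code):

    from datetime import datetime, timedelta, timezone

    # windowPeriod as I understand it: events whose timestamps are further
    # than this from "now" are rejected instead of being ingested.
    WINDOW_PERIOD = timedelta(minutes=10)

    def is_accepted(event_timestamp: datetime) -> bool:
        """Return True if the event falls inside the ingestion window."""
        now = datetime.now(timezone.utc)
        return abs(now - event_timestamp) <= WINDOW_PERIOD

    # A message replayed from the beginning of the topic (e.g. one day old)
    # is rejected, which matches the message loss I am seeing.
    old_event = datetime.now(timezone.utc) - timedelta(days=1)
    print(is_accepted(old_event))    # False -> dropped
    fresh_event = datetime.now(timezone.utc)
    print(is_accepted(fresh_event))  # True  -> ingested
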
I have read a lot of articles about Druid, and I understand that for historical data the best approach is batch ingestion. I started with index task batch ingestion and then tried Hadoop batch ingestion. Neither method has worked for me, unless I am misconfiguring the batch ingestion.
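
To show what I mean by index task batch ingestion, this is roughly how I am submitting the spec to the Overlord; the dataSource name, dimensions, interval and file path below are placeholders, not my real configuration:

    import json
    import requests

    # My Overlord address (default port 8090); placeholder host.
    OVERLORD_URL = "http://localhost:8090/druid/indexer/v1/task"

    # Minimal native index task spec; all names, paths and intervals are placeholders.
    index_task = {
        "type": "index",
        "spec": {
            "dataSchema": {
                "dataSource": "my_datasource",
                "parser": {
                    "type": "string",
                    "parseSpec": {
                        "format": "json",
                        "timestampSpec": {"column": "timestamp", "format": "auto"},
                        "dimensionsSpec": {"dimensions": ["dim1", "dim2"]}
                    }
                },
                "metricsSpec": [{"type": "count", "name": "count"}],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": "NONE",
                    "intervals": ["2017-01-01/2017-02-01"]
                }
            },
            "ioConfig": {
                "type": "index",
                "firehose": {
                    "type": "local",
                    "baseDir": "/data/historical",
                    "filter": "*.json"
                }
            },
            "tuningConfig": {"type": "index"}
        }
    }

    # Submit the task; the Overlord returns a task id on success.
    response = requests.post(
        OVERLORD_URL,
        data=json.dumps(index_task),
        headers={"Content-Type": "application/json"}
    )
    print(response.status_code, response.text)
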
For people who have worked with Druid for a long time: how can I resolve this batch ingestion problem?
Thanks in advance