08-07-2017 08:45 AM
We are stress-testing our Kudu cluster by inserting a large volume of data, and writes to the tablets are failing with timeout errors:
W0807 12:53:47.136150 31391 meta_cache.cc:207] Tablet d687c05ffe5e48d19fbfe2f71bd136f7: Replica 0cf3c1866a094ee0b2305bca770f5e70 (bigdata09dev:7050) has failed: Timed out: Write RPC to 192.168.10.124:7050 timed out after 9.989s (SENT)
W0807 12:53:47.136211 31391 batcher.cc:329] Timed out: Failed to write batch of 805 ops to tablet d687c05ffe5e48d19fbfe2f71bd136f7 after 1 attempt(s): Failed to write to server: 0cf3c1866a094ee0b2305bca770f5e70 (bigdata09dev:7050): Write RPC to 192.168.10.124:7050 timed out after 9.989s (SENT)
This is causing data loss. My question is: is the only way to avoid it to catch these errors in the loader and retry the insert in our own code? Or can the cluster be configured to retry the insert automatically until the data is loaded?
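To make the first option concrete, here is a minimal sketch of what retrying in the loader could look like. It is not tied to the Kudu client API: `write_batch`, `WriteTimeout`, and the retry parameters are hypothetical names used only for illustration, and the real loader would have to map the client's timeout error onto them.

```python
import time


class WriteTimeout(Exception):
    """Hypothetical stand-in for the timeout error raised by the client."""


def write_with_retry(write_batch, batch, max_attempts=5, base_delay_s=0.5,
                     sleep=time.sleep):
    """Call write_batch(batch) until it succeeds or attempts run out.

    write_batch is a hypothetical callable that sends one batch and raises
    WriteTimeout on a timed-out write. The delay doubles after each failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write_batch(batch)
        except WriteTimeout:
            if attempt == max_attempts:
                # Out of attempts: surface the error instead of dropping data.
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
```

One caveat with this approach: a write that timed out may still have been applied on the server, so a retried plain insert could hit a duplicate key. Using an idempotent operation such as an upsert for the retried batch would presumably avoid that.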
Thank you very much and best regards
08-07-2017 11:24 AM
08-08-2017 08:52 AM