Issue with Primary Accumulo site during Accumulo Replication

We are looking for guidance on the Accumulo Replication feature documented here (https://accumulo.apache.org/1.7/accumulo_user_manual.html#_replication).  

 

We set up replication between a primary site A and two destination sites, B and C. When either destination site was down (powered off) or not functioning correctly (for example, ZooKeeper on site C was down because the hard drives on some of its nodes were full), we quickly saw issues on the primary site (site A). Based on the documentation, we did not anticipate any impact on site A. We expected the write-ahead logs (walogs) to build up on site A over time and then flush once the issues at the destination sites were resolved.

 

We saw significant degradation of the primary Accumulo cluster on the source site (site A). The tablet server logs showed the tablet servers continuously trying to connect to the ZooKeeper ensemble at the downed destination sites. Here are the modifications we made to the Accumulo replication settings:

 

Accumulo Replication Configuration Changes:

  -- decrease replication.worker.threads from 4 to 1

  -- decrease replication.work.attempts from 10 to 1

  -- increase replication.work.assignment.sleep from 30s to 30m
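
For anyone trying to reproduce this, the changes above could be applied roughly as follows. This is only a sketch: it assumes the properties are set system-wide from the Accumulo shell with `config -s`. They could equally be set in accumulo-site.xml, and which route was actually used is not stated here.

```
# Assumption: run inside the Accumulo shell on site A as a user with system permissions
config -s replication.worker.threads=1
config -s replication.work.attempts=1
config -s replication.work.assignment.sleep=30m
```

Whether each property takes effect without restarting the affected processes is worth checking against the 1.7 documentation.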

 

Thoughts? Thanks in advance!
