Q: Does HBase replication guarantee consistency or a snapshot-in-time?
A: HBase replication works by propagating the WAL from the source cluster to the destination, where it is replayed asynchronously; it is therefore eventually consistent. For snapshot functionality, consider taking a snapshot on the source cluster and exporting it to the destination using the ExportSnapshot tool.

Q: Is there an option to enable replication at the global level rather than individually at the table level?
A: No. HBase replication can be enabled only at the table level or column-family level.

Q: Does enabling replication cause the table to be disabled?
A: Check that hbase.online.schema.update.enable is true in hbase-site.xml. If so, the table will not be disabled when turning on table replication. The default value is true, so if the property is not set in hbase-site.xml, it is effectively 'true'.

Q: Does HBase replication solve the problem of replicating a cluster that already has data to an empty cluster?
A: Replication alone does not replicate pre-existing data; you will also need the CopyTable tool. The steps are:
1. Begin replication and take note of the current timestamp, T.
2. Execute CopyTable with the ending timestamp set to T from step 1.

Q: Is it true that HBase replication does not work correctly in the case of out-of-order mutations?
A: Please review the following excerpt from the Apache HBase book; it explains that there is no guaranteed order of delivery for client edits - https://hbase.apache.org/book.html#_cluster_replication
How your application builds on top of the HBase API matters when replication is in play. HBase’s replication system provides at-least-once delivery of client edits for an enabled column family to each configured destination cluster. In the event of failure to reach a given destination, the replication system will retry sending edits in a way that might repeat a given message. Furthermore, there is not a guaranteed order of delivery for client edits. In the event of a RegionServer failing, recovery of the replication queue happens independently of recovery of the individual regions that server was previously handling. This means that it is possible for the not-yet-replicated edits to be serviced by a RegionServer that is currently slower to replicate than the one that handles edits from after the failure. The combination of these two properties (at-least-once delivery and the lack of message ordering) means that some destination clusters may end up in a different state if your application makes use of operations that are not idempotent, e.g. Increments.
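The two-step bootstrap procedure described earlier (start replication, then CopyTable up to timestamp T) can be sketched as commands. This is a sketch under assumptions: the table name `mytable`, peer id `1`, ZooKeeper quorum hosts, and HDFS paths are all placeholders for your environment.

```shell
# 1. Enable replication for the table and record the current time (ms) as T.
#    (add_peer / enable_table_replication are hbase shell commands.)
echo "add_peer '1', CLUSTER_KEY => 'dest-zk1,dest-zk2,dest-zk3:2181:/hbase'
enable_table_replication 'mytable'" | hbase shell
T=$(date +%s%3N)

# 2. Copy the pre-existing data (edits up to T) to the peer with CopyTable.
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --endtime="$T" \
  --peer.adr=dest-zk1,dest-zk2,dest-zk3:2181:/hbase \
  mytable

# Snapshot alternative from the first answer: snapshot on the source,
# then export it to the destination cluster's HBase root directory.
echo "snapshot 'mytable', 'mytable-snap'" | hbase shell
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  --snapshot mytable-snap \
  --copy-to hdfs://dest-nn:8020/apps/hbase/data
```

Because CopyTable's `--endtime` is exclusive of later edits and replication covers everything from T onward, the two passes together cover the full history, with possible overlap that is safe for idempotent writes.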
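The risk of non-idempotent operations under at-least-once delivery can be illustrated without a cluster: replaying an absolute write (a Put) twice leaves the same final state, while replaying an Increment twice diverges. The sketch below models a cell with plain shell arithmetic; `put`, `incr`, and the value 5 are illustrative stand-ins, not HBase API calls.

```shell
# Model one cell; put() writes an absolute value, incr() adds a delta.
cell=0
put() { cell=$1; }
incr() { cell=$((cell + $1)); }

# At-least-once redelivery of a Put: the duplicate is harmless.
put 5; put 5
echo "after put x2:  $cell"   # 5, same as if delivered once (idempotent)

# At-least-once redelivery of an Increment: the duplicate changes state.
cell=0
incr 5; incr 5
echo "after incr x2: $cell"   # 10, not the intended 5 (non-idempotent)
```

This is why the HBase book warns that destination clusters using Increments may drift from the source under retries.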
Symptom: After adding additional configurations to Solr, the service fails with both 'no servers hosting shard' and 'OutOfMemoryError' errors. A restart resolves the issue temporarily, but the error eventually recurs.
Root Cause: In some versions, the default max-user-processes limit for the infra-solr user is 1024, which may be too low for certain use cases.
Resolution: Increase the max user processes limit. This can be done by creating the file /etc/security/limits.d/infra-solr.conf on the Solr node and setting the nproc property. For example, if the new value chosen is 6000, the contents would be:
infra-solr - nproc 6000
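To confirm the new limit took effect, the current value can be inspected as sketched below. Note that limits.d entries apply only to new login sessions, and `infra-solr` and `<solr_pid>` are assumed names/placeholders for your node.

```shell
# Show the max-user-processes (nproc) limit for the current shell.
nproc_limit=$(ulimit -u)
echo "max user processes: $nproc_limit"

# On the Solr node, check as the service user (requires a fresh login session):
#   su - infra-solr -s /bin/bash -c 'ulimit -u'
# For an already-running Solr JVM, inspect its live limits instead:
#   grep 'Max processes' /proc/<solr_pid>/limits
```

If a running Solr process still shows the old limit, it was started before the limits.d change and needs a restart to pick up the new value.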
Symptom: The oldWALs directory stores WALs that are no longer needed for recovery. We have come across situations where the directory continually grows and the older WALs are not cleaned up.
Root Cause: The directory can fill up due to a bug in HDP 2.5.3 where hbase.backup.enable is true by default.
Resolution:
- Add the property to HBase Custom hbase-site: hbase.backup.enable=false
- Restart HBase
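For reference, the property above corresponds to the following hbase-site.xml fragment (in Ambari it goes under Custom hbase-site):

```xml
<property>
  <name>hbase.backup.enable</name>
  <value>false</value>
</property>
```

After restarting HBase, the log cleaner should begin removing eligible files; progress can be watched with `hdfs dfs -du -s -h /apps/hbase/data/oldWALs` (assuming the default HDP hbase.rootdir; adjust the path for your cluster).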