Impact of reducing HDFS replication factor to 2 (or just one) on HBase map/reduce performance

What is the impact of reducing the HDFS replication factor to 2 (or even 1) on HBase MapReduce performance? I run an HBase cluster on Azure VMs, with the data stored on Azure managed disks. A managed disk already keeps 3 copies of the data for fault tolerance, so I am thinking of reducing the HDFS replication factor to save on storage overhead. Since MapReduce jobs rely on data locality to avoid transferring data over the network, I am wondering whether anyone has information on how MapReduce performance is affected when only one replica of the data is available.
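
To make the trade-off concrete, here is a rough sketch of the arithmetic behind the question. The function names are hypothetical, and it assumes the 3 disk-level copies multiply with the HDFS replication factor and that HDFS places each replica of a block on a different DataNode. It says nothing about actual job runtimes, only about how many physical copies exist and how many nodes could read a given block locally.

```python
# Sketch only: helper names are hypothetical, not part of any HDFS or Azure API.

AZURE_DISK_COPIES = 3  # from the question: a managed disk keeps 3 copies


def physical_copies(hdfs_replication: int, disk_copies: int = AZURE_DISK_COPIES) -> int:
    """Total physical copies of each block: HDFS replicas times disk-level copies."""
    return hdfs_replication * disk_copies


def nodes_with_local_replica(hdfs_replication: int, cluster_nodes: int) -> int:
    """Upper bound on the number of nodes that can read a given block locally.

    Assumes each HDFS replica of a block lives on a distinct DataNode.
    """
    return min(hdfs_replication, cluster_nodes)


if __name__ == "__main__":
    for r in (3, 2, 1):
        print(
            f"replication={r}: "
            f"{physical_copies(r)} physical copies, "
            f"{nodes_with_local_replica(r, cluster_nodes=10)} node(s) with a local replica"
        )
```

With a replication factor of 1, only a single node holds each block, so any task for that block that cannot be scheduled on that node has to read it over the network. The disk-level copies do not help here, because they are not visible to HDFS as separate replicas.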