Disadvantages of replication factor 1 on 200GB of data per day

Expert Contributor

Hi,

I have about 200 GB of data per day coming from a Cassandra database into HDFS. What are the disadvantages of a replication factor of 1, other than losing data when a datanode fails?

I believe there will be a lot of pressure on the node where the data resides. I am trying to understand what happens when querying large chunks of data from these datanodes with the replication factor set to 1.

Thanks.

2 REPLIES

Rising Star

@PJ Even with the replication factor set to 1, the data is still split into blocks, and those blocks are distributed across different datanodes. So, in case of a datanode failure, you lose the blocks stored on that node, and any file with a block there can only be partially retrieved. Another advantage of a replication factor > 1 is parallel processing: there are multiple copies of each block on different machines, so more machines can process the data locally at the same time.
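
To put rough numbers on this, here is a minimal sketch in Python. It is a toy model, not how HDFS actually places blocks: the 128 MB block size, the 10-node cluster, and the round-robin placement are all assumptions made for illustration, and the function names are hypothetical. It only shows how many blocks of one day's 200 GB would become unreadable if a single datanode failed, and how much raw disk each replication factor consumes.

```python
# Toy model of block placement; all sizes and the placement policy are assumptions.
BLOCK_SIZE_MB = 128  # assumed block size; check dfs.blocksize on your cluster


def block_count(data_gb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold data_gb of data."""
    data_mb = data_gb * 1024
    return -(-data_mb // block_size_mb)  # ceiling division


def place_blocks(num_blocks, num_nodes, replication):
    """Round-robin placement: block b gets replicas on consecutive nodes.

    Real HDFS placement is rack-aware and not round-robin; this is a simplification.
    """
    return {
        b: {(b + r) % num_nodes for r in range(replication)}
        for b in range(num_blocks)
    }


def lost_blocks(placement, failed_node):
    """Blocks whose every replica lived on the failed node."""
    return [b for b, nodes in placement.items() if nodes <= {failed_node}]


def raw_storage_gb(data_gb, replication):
    """Raw disk consumed per day once every block is stored `replication` times."""
    return data_gb * replication


if __name__ == "__main__":
    blocks = block_count(200)
    for rf in (1, 3):
        placement = place_blocks(blocks, num_nodes=10, replication=rf)
        lost = lost_blocks(placement, failed_node=0)
        print(
            f"replication={rf}: {blocks} blocks, "
            f"{len(lost)} unreadable after one node fails, "
            f"{raw_storage_gb(200, rf)} GB raw storage per day"
        )
```

Under these assumptions, one day's data is 1600 blocks. With replication factor 1, losing one of the 10 nodes makes 160 of them unreadable; with replication factor 3, none are lost, at the cost of 600 GB of raw storage per day instead of 200 GB. The same placement map also shows the read side: with replication factor 1 each block has exactly one node that can serve it, so all readers of that block go to that one node.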
