Archives of Support Questions (Read Only)

When should block replication be set to 1?

We get the following in the Spark logs:

java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage DatanodeInfoWithStorage\
The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:1036) 

My Ambari cluster includes only 3 worker machines, and each worker has only one data disk.

I searched Google and found that the solution may be:

Block replication needs to be set to 1 instead of 3 (HDFS).

Is that true?

Second, because each worker machine has only one data disk, could that be part of the problem?

Block replication = the number of copies of each block kept in the file system, as specified by the dfs.replication setting. dfs.replication=1 means there will be only one copy of each file's blocks in the file system.

Michael-Bronson
1 ACCEPTED SOLUTION

1. Block replication is for data redundancy; it ensures that data is not lost when a disk goes bad or a node goes down.
2. A replication factor of 1 is used when the data can be recreated at any time, so losing it is not critical. For example, in a job chain the output of one job is consumed by others, and eventually all intermediate data is deleted. That intermediate data can be marked for a replication factor of 1 (though 2 is still better).
3. A replication factor of 3 makes the cluster fault tolerant.
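
To make the trade-off in the list above concrete, here is a rough sketch of the arithmetic. It assumes each replica of a block lives on a different DataNode, and the function name and the 4 TB disk size are made up for illustration; they are not from the original question.

```python
def replication_tradeoff(raw_capacity_tb, replication_factor):
    """Return (usable capacity in TB, DataNode losses survivable without losing a block).

    Assumes every replica of a block is stored on a different DataNode.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # Every block is stored replication_factor times, so usable space shrinks accordingly.
    usable_tb = raw_capacity_tb / replication_factor
    # A block survives as long as at least one of its replicas is still reachable.
    tolerated_losses = replication_factor - 1
    return usable_tb, tolerated_losses


# Hypothetical cluster: 3 workers with one 4 TB data disk each (12 TB raw).
for rf in (1, 2, 3):
    usable, tolerated = replication_tradeoff(12.0, rf)
    print(f"RF={rf}: about {usable:.1f} TB usable, survives {tolerated} DataNode loss(es)")
```

With RF=1 all raw space is usable, but a single bad worker is enough to lose blocks.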

In your case you have 3 worker nodes. An RF of 1 means that if a worker goes bad, you lose data and it can't be processed.
I suggest using RF=2 if you are concerned about space utilization.
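
As for the error in the question: the message names the client setting dfs.client.block.write.replace-datanode-on-failure.policy. The sketch below mirrors the rule for that setting as described in the hdfs-default.xml documentation (ALWAYS, NEVER, DEFAULT), as far as I recall it; please verify against your Hadoop version. The function and parameter names are my own, not a Hadoop API, and whether this is the actual cause on this cluster is an assumption.

```python
def should_add_datanode(policy, replication, remaining, hflushed_or_appended):
    """Decide whether the HDFS client asks for a replacement DataNode.

    policy: "ALWAYS", "NEVER" or "DEFAULT"
    replication: replication factor r of the file being written
    remaining: number n of DataNodes still alive in the write pipeline
    hflushed_or_appended: True if the block was hflushed or appended to
    """
    if policy == "NEVER":
        return False
    if policy == "ALWAYS":
        return True
    if policy != "DEFAULT":
        raise ValueError(f"unknown policy: {policy}")
    # DEFAULT: only consider a replacement when r >= 3 ...
    if replication < 3:
        return False
    # ... and either floor(r/2) >= n, or r > n and the block is hflushed/appended.
    return replication // 2 >= remaining or (
        replication > remaining and hflushed_or_appended
    )


# RF=3 on a 3-worker cluster, one pipeline node lost, block already hflushed:
print(should_add_datanode("DEFAULT", 3, 2, True))
# RF=2, one pipeline node lost: the DEFAULT rule never asks for a replacement.
print(should_add_datanode("DEFAULT", 2, 1, True))
```

If this rule applies, then with RF=3 on exactly 3 workers the client asks for a replacement node after one pipeline node fails, but there is no spare DataNode left to offer, which would match the "no more good datanodes being available to try" message. With RF=2 the DEFAULT rule does not ask for a replacement at all.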
