
Replicate file to all DataNodes

New Contributor

Hi,

 

To improve access performance for a dataset, I would like to replicate the file's blocks to all DataNodes. It's a dimension dataset. One way would be to set the replication factor to a number higher than the number of DataNodes, but I would like to know if there is a better way to do this.

Has anyone already done something like this?

 

 

2 REPLIES

Re: Replicate file to all DataNodes

Master Guru
The approach you describe is a good way to get this done. As an
alternative, you could add the file to the application's distributed cache,
which causes every NodeManager that runs one of the application's containers
to download and keep a local copy of it during container execution. This
isn't a good idea for very large files.
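
For the replication-factor approach, here is a minimal sketch of how the command could be built from a script. It assumes the `hdfs` client is on the PATH and that you already know the number of live DataNodes; the helper names `setrep_command` and `replicate_to_all` and the example path are hypothetical. It targets exactly one replica per DataNode, since HDFS keeps at most one replica of a block on any given DataNode, so a higher value would leave the blocks reported as under-replicated.

```python
import subprocess


def setrep_command(path, num_datanodes, wait=False):
    """Build an `hdfs dfs -setrep` argument list asking for one replica per DataNode.

    `path` and `num_datanodes` are supplied by the caller; nothing is
    discovered from the cluster here.
    """
    if num_datanodes < 1:
        raise ValueError("num_datanodes must be at least 1")
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        # -w asks the command to wait until the target replication is reached,
        # which can take a long time for large files.
        cmd.append("-w")
    cmd += [str(num_datanodes), path]
    return cmd


def replicate_to_all(path, num_datanodes, wait=False):
    """Run the command built above. Requires a working `hdfs` client."""
    return subprocess.run(setrep_command(path, num_datanodes, wait), check=True)


# Example (hypothetical path, 5 DataNodes assumed):
#   replicate_to_all("/data/dim/customers", 5)
```

Keep in mind that the replication factor is a per-file setting, so if DataNodes are added later the value has to be raised again for the file to stay on every node.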

Re: Replicate file to all DataNodes

New Contributor
Thanks!