New Contributor
Posts: 3
Registered: ‎08-03-2017

Replicate file to all DataNodes

Hi,

To improve access performance for a dataset, I would like to replicate the blocks of its file to all DataNodes. It's a dimension dataset. One way would be to set the replication factor to a number higher than the number of DataNodes, but I would like to know if there is a better way to do this.
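
For concreteness, here is a minimal sketch of the replication-factor approach, driving the `hdfs dfs -setrep` shell command from Python. The helper names and the path `/data/dim/customers` are made up for illustration, and the DataNode count is assumed to be known already. As far as I understand, HDFS keeps at most one replica of a block per DataNode, so a factor equal to the DataNode count should be enough.

```python
import subprocess


def build_setrep_command(path, num_datanodes, wait=True):
    """Build the argv for `hdfs dfs -setrep` (hypothetical helper).

    Sets the replication factor of `path` to `num_datanodes`.
    """
    if num_datanodes < 1:
        raise ValueError("num_datanodes must be at least 1")
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        # -w makes the command wait until replication completes.
        cmd.append("-w")
    cmd += [str(num_datanodes), path]
    return cmd


def replicate_to_all_datanodes(path, num_datanodes):
    """Run the command; needs the hdfs CLI on PATH and a reachable cluster."""
    subprocess.run(build_setrep_command(path, num_datanodes), check=True)


if __name__ == "__main__":
    # Only prints the command, so it is safe to run without a cluster.
    print(" ".join(build_setrep_command("/data/dim/customers", 5)))
```

Note that with `-w` and a factor larger than the number of live DataNodes, the command may never return, because the target replication cannot be reached.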

 

Has anyone already done something like this?

Posts: 1,567
Kudos: 289
Solutions: 240
Registered: ‎07-31-2013

Re: Replicate file to all DataNodes

The approach you describe is a good way to do this. As an alternative,
you could add the file paths to the application's distributed cache, which
causes every NodeManager that runs one of the application's containers to
download the file and keep a local copy of it while those containers
execute. This isn't a good idea for very large files.
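
As a rough sketch of the distributed cache route: if the job's main class is
launched through ToolRunner (an assumption, since generic options are only
parsed in that case), the file can be shipped with the generic `-files`
option of `hadoop jar`. The helper below only builds the command line; the
jar name, class name, and paths are hypothetical placeholders.

```python
def build_job_command(jar, main_class, cache_uris, app_args=()):
    """Build a `hadoop jar` argv that ships files via the distributed cache.

    Hypothetical helper: assumes `main_class` uses ToolRunner so that the
    generic `-files` option is honoured.
    """
    if not cache_uris:
        raise ValueError("at least one cache file is required")
    cmd = ["hadoop", "jar", jar, main_class]
    # Generic options must come before the application's own arguments.
    cmd += ["-files", ",".join(cache_uris)]
    cmd += list(app_args)
    return cmd


if __name__ == "__main__":
    # Prints the command only; nothing is submitted to the cluster.
    print(" ".join(build_job_command(
        "lookup-job.jar",                # hypothetical jar
        "com.example.LookupJob",         # hypothetical main class
        ["hdfs:///data/dim/customers"],  # hypothetical dimension file
        ["/input", "/output"],
    )))
```

Each container then finds the file in its working directory under its base
name, so tasks can open it as a local file.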
Backline Customer Operations Engineer
New Contributor
Posts: 3
Registered: ‎08-03-2017

Re: Replicate file to all DataNodes

Thanks!