Replicate file to all DataNodes

New Contributor

Hi,

To improve access performance for a dataset, I would like to replicate the blocks of the file to all DataNodes. It's a dimension dataset. One way would be to set the replication factor to a number higher than the number of DataNodes, but I would like to know if there is a better way to do this.

Has anyone already done something like this?
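A minimal sketch of the replication-factor approach, assuming the `hdfs` CLI is on the PATH and that `hdfs dfsadmin -report` prints a line such as "Live datanodes (3):" (the wording may differ between Hadoop versions). It sets the replication factor to the live DataNode count rather than a higher number: HDFS places at most one replica of a block per DataNode, so a higher target is likely to leave the blocks reported as under-replicated and make `-setrep -w` wait indefinitely. The function names are hypothetical.

```python
import re
import subprocess

# Assumption: the report contains a line like "Live datanodes (3):".
# Verify the exact wording on your Hadoop version.
_LIVE_RE = re.compile(r"Live datanodes \((\d+)\)")


def count_live_datanodes(report_text: str) -> int:
    """Extract the live DataNode count from `hdfs dfsadmin -report` output."""
    match = _LIVE_RE.search(report_text)
    if match is None:
        raise ValueError("could not find the live DataNode count in the report")
    return int(match.group(1))


def setrep_command(path: str, replication: int, wait: bool = True) -> list[str]:
    """Build the argv for `hdfs dfs -setrep` on one file or directory."""
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        cmd.append("-w")  # block until the new replication target is reached
    cmd += [str(replication), path]
    return cmd


def replicate_to_all_datanodes(path: str) -> None:
    """Set the replication factor of `path` to the current live DataNode count."""
    report = subprocess.run(
        ["hdfs", "dfsadmin", "-report"],
        check=True, capture_output=True, text=True,
    ).stdout
    replicas = count_live_datanodes(report)
    subprocess.run(setrep_command(path, replicas), check=True)
```

Usage with a placeholder path: `replicate_to_all_datanodes("/path/to/dimension")`. The replication factor is a count, not an "every node" policy, so DataNodes added later will not receive a replica until the command is rerun. `hdfs dfsadmin -report` may also require HDFS administrator privileges.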

Re: Replicate file to all DataNodes

Master Guru
The approach you describe is a good way to get this done. As an alternative, you could also add the file paths to the application's distributed cache, which causes each NodeManager that runs the application's containers to download a local copy of the file and keep it available while those containers execute. This isn't a good idea for very large files.
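A minimal sketch of the distributed-cache alternative, written as a Hadoop Streaming mapper that performs a map-side join. Assumptions: the job is submitted with the generic `-files` option so the dimension file is localized and symlinked into each task's working directory as `dim.tsv` (a hypothetical name), both inputs are tab-separated with the join key in the first column, and the dimension data fits in each task's memory, which is one more reason this suits small files only.

```python
import sys
from typing import Dict, Iterable, Iterator, TextIO

# Hypothetical symlink name, created in the task's working directory when the
# job is submitted with:  -files hdfs:///path/to/dim.tsv#dim.tsv
DIM_FILE = "dim.tsv"


def load_dimension(lines: Iterable[str]) -> Dict[str, str]:
    """Load tab-separated key/value rows of the dimension file into a dict."""
    table: Dict[str, str] = {}
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key:  # skip blank lines
            table[key] = value
    return table


def join(records: Iterable[str], dim: Dict[str, str]) -> Iterator[str]:
    """Inner join: append the dimension value to each matching fact record."""
    for line in records:
        fields = line.rstrip("\n").split("\t")
        value = dim.get(fields[0])
        if value is not None:
            yield "\t".join(fields + [value])


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    # The cached file is read from local disk once per map task.
    with open(DIM_FILE, encoding="utf-8") as fh:
        dim = load_dimension(fh)
    for out in join(stdin, dim):
        stdout.write(out + "\n")


if __name__ == "__main__":
    main()
```

It could be submitted along the lines of `hadoop jar /path/to/hadoop-streaming.jar -files hdfs:///path/to/dim.tsv#dim.tsv,mapper.py -mapper "python3 mapper.py" -numReduceTasks 0 -input ... -output ...`, where every path is a placeholder; generic options such as `-files` must come before the streaming-specific options. In Java MapReduce, `Job.addCacheFile(URI)` plays the same role.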

Re: Replicate file to all DataNodes

New Contributor
Thanks!