Distributed cache files retrieval

New Contributor

Hi there,

  We are using Hadoop 2.0.0-cdh4.4.0 at our company. I am trying to use the distributed cache feature. Retrieving the files from the cache in a mapper or reducer involves one of these methods:

  context.getLocalCacheFiles();
  context.getCacheFiles();
  DistributedCache.getLocalCacheFiles();
  DistributedCache.getCacheFiles();

  Each of them returns a Path[] or a URI[]. Sometimes you need to store more than one file, and then you need to be able to tell which one is which. Example:

  job.addCacheFiles(new Path("/dir/setA.txt"));
  job.addCacheFiles(new Path("/dir/setB.txt"));
 
  URI[] uris = context.getCacheFiles();
  //uris[0] - setA or setB?

  Thank you in advance!

  Jakub

Edit: moreover, I have found that job.addCacheFiles, the only non-deprecated method for adding files to the cache, gives me a NoSuchMethodException on the server, even though the server's CDH version and the Maven dependencies are both 2.0.0-cdh4.4.0 and Maven builds the project without errors. I am going to read the files directly from HDFS for now...
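
On the question of which array entry is setA and which is setB: one possible approach is to avoid relying on array positions and look each entry up by name instead. The sketch below uses only java.net.URI so that it runs without Hadoop on the classpath. The class name CacheFileLookup and the helpers keyOf and indexByName are hypothetical, and the code assumes that the URIs returned by the cache keep the path (and any #fragment) they were registered with, which should be checked against the cluster in question.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: indexes cache URIs by a stable name instead of by array position.
final class CacheFileLookup {

    // Key for one URI: the #fragment if present, otherwise the last path segment.
    public static String keyOf(URI uri) {
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        // Opaque URIs have no path, so fall back to the scheme-specific part.
        String path = uri.getPath() != null ? uri.getPath() : uri.getSchemeSpecificPart();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Builds a name -> URI map; a later duplicate key overwrites an earlier one.
    public static Map<String, URI> indexByName(URI[] uris) {
        Map<String, URI> byName = new HashMap<String, URI>();
        for (URI uri : uris) {
            byName.put(keyOf(uri), uri);
        }
        return byName;
    }

    public static void main(String[] args) {
        // Stand-in for the array a mapper would get from context.getCacheFiles();
        // the order is deliberately different from the order in the question.
        URI[] uris = {
            URI.create("hdfs://namenode/dir/setB.txt"),
            URI.create("/dir/setA.txt#setA")
        };
        Map<String, URI> byName = indexByName(uris);
        System.out.println(byName.get("setA"));
        System.out.println(byName.get("setB.txt"));
    }
}
```

In a mapper's setup method, the array from context.getCacheFiles() would be passed to indexByName in place of the hard-coded one. Giving each file a distinct #fragment when it is registered makes the key independent of the file name, so two files with the same name in different directories do not collide.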

2 REPLIES

Re: Distributed cache files retrieval

Master Collaborator

I have moved this thread to the MapReduce board in the hope that someone here can assist you.

Re: Distributed cache files retrieval

New Contributor

Thank you. I hope there is someone using this feature and willing to help at the same time :)