Support Questions

Find answers, ask questions, and share your expertise

It was suggested that a non-Cloudera Hadoop user could get some help with homework here.


I apologize if this is not correct. I am taking a Big Data course and am trying to get my first Hadoop program running. I have the file I want to look at uploaded to HDFS; I can see it at http://localhost:50075 when I browse the directory. I set the input path for my mapper like so:

FileInputFormat.addInputPath(job1, new Path(args[0]));

 

With args[0] set to either "hdfs://localhost:9000/path/to/file" or "path/to/file", I get the same result.

 

I get a lot of these:

"java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:50075/path/to/file, expected: hdfs://localhost:9000"

 

At the end I get this:

"13/09/15 22:22:32 INFO mapred.JobClient: Job complete: job_201309151109_0016
13/09/15 22:22:32 INFO mapred.JobClient: Counters: 7
13/09/15 22:22:32 INFO mapred.JobClient:   Job Counters
13/09/15 22:22:32 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=109261
13/09/15 22:22:32 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/15 22:22:32 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/09/15 22:22:32 INFO mapred.JobClient:     Launched map tasks=8
13/09/15 22:22:32 INFO mapred.JobClient:     Data-local map tasks=8
13/09/15 22:22:32 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/09/15 22:22:32 INFO mapred.JobClient:     Failed map tasks=1"


What am I doing wrong?
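For anyone hitting the same error: "Wrong FS" means the scheme and authority (host:port) of the input URI don't match the cluster's configured default filesystem (fs.default.name). Port 50075 is the datanode web UI port, not the namenode RPC port (9000 here). A minimal pure-Java sketch of roughly the comparison Hadoop's FileSystem.checkPath performs, using the two URIs from the error message:

```java
import java.net.URI;

public class WrongFsCheck {
    public static void main(String[] args) {
        // fs.default.name as configured for this cluster (from the error message)
        URI defaultFs = URI.create("hdfs://localhost:9000");
        // The URI the job was actually handed -- 50075 is the datanode's
        // web UI port, not the namenode's RPC port
        URI input = URI.create("hdfs://localhost:50075/path/to/file");

        // Hadoop throws IllegalArgumentException("Wrong FS: ...") when
        // (roughly) this comparison fails
        boolean sameFs = defaultFs.getScheme().equals(input.getScheme())
                && defaultFs.getAuthority().equals(input.getAuthority());
        System.out.println(sameFs); // prints false: the authorities differ
    }
}
```

So either pass a fully qualified URI that matches fs.default.name ("hdfs://localhost:9000/path/to/file") or pass a plain path and let Hadoop qualify it against the default filesystem.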

1 ACCEPTED SOLUTION


I just needed my second job's mapper to output 1 as the key and the file line as the value, and take care of things in the reducer. IT WORKS!!! That was a lot of work. Now onto part 2 of 3. (Due tomorrow; I'm hopeful though 🙂)
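(For completeness, calling job.setNumReduceTasks(1) in the driver is the usual way to force a single reducer without touching cluster configuration. The same-key trick used above works because the shuffle groups records by key; a plain-Java sketch with hypothetical records, simulating the shuffle:)

```java
import java.util.*;

public class SingleReducerSketch {
    public static void main(String[] args) {
        // Hypothetical map outputs: (count, line) records from several mappers
        List<String> mapOutputs = Arrays.asList("42\tmovieA", "17\tmovieB", "99\tmovieC");

        // Emitting the same key (1) for every record means the shuffle
        // delivers all of them to one reduce group...
        Map<Integer, List<String>> shuffle = new TreeMap<>();
        for (String record : mapOutputs) {
            shuffle.computeIfAbsent(1, k -> new ArrayList<>()).add(record);
        }

        // ...so a single reduce() call sees every record and can sort
        // them all to produce one global top-20 list
        System.out.println(shuffle.size());        // prints 1 (one group)
        System.out.println(shuffle.get(1).size()); // prints 3 (all records)
    }
}
```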


6 REPLIES

Expert Contributor

I'd be interested to see the full output from running with the /path/to/file variant. I'd also be interested to see the result of running "hadoop fs -ls path/to/file".

Floyd Bush
VP Marketing
Cloudera Movies

avatar

My professor helped me with this one. My assignment requires multiple jobs, and I had a hard-coded absolute path for FileOutputFormat.setOutputPath(). I suppose an absolute path would work if I got it right, but a relative path works. So now my first job executes! Now I just need to figure out what's wrong with my second, but I think that's a much more straightforward programming problem.

avatar

Now I have to output a single top 20 list from the reducer. It seems there are multiple reducers. How can I limit it to just one? I can't change the configuration since my professor will be running my code on his own hadoop installation.


avatar

Now if only I could read the output of a previous job in the setup of another....

avatar

I got it! But it seems overly complicated to grab one crummy line from a file...

 

String dateString = "";
FileSystem fileSystem = FileSystem.get(configuration);
// List everything the previous job wrote to its output directory
FileStatus[] fileStatuses = fileSystem.listStatus(new Path("/temp/query2job2temp"));
for (FileStatus status : fileStatuses) {
    Path path = status.getPath();
    // Only the part-* files hold reducer output; skip _SUCCESS, _logs, etc.
    if (path.toString().matches(".*part.*")) {
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fileSystem.open(path)));
        // Only the first line of each part file is needed
        dateString = bufferedReader.readLine();
        Pattern pattern = Pattern.compile("([0-9]{2}/[0-9]{2}/[0-9]{2})");
        Matcher matcher = pattern.matcher(dateString);
        if (matcher.find()) {
            dateString = matcher.group(0);
        }
        bufferedReader.close();
    }
}
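The date extraction itself can be checked in isolation. A self-contained sketch with a hypothetical first line (tab-separated key/value, as a reducer typically writes):

```java
import java.util.regex.*;

public class DateExtract {
    public static void main(String[] args) {
        // Hypothetical first line of a part file: key TAB value
        String line = "1\t09/15/13";
        // Same MM/DD/YY pattern as in the setup code above
        Matcher m = Pattern.compile("([0-9]{2}/[0-9]{2}/[0-9]{2})").matcher(line);
        String dateString = m.find() ? m.group(1) : "";
        System.out.println(dateString); // prints 09/15/13
    }
}
```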