Created on 09-15-2013 09:25 PM - edited 09-16-2022 01:47 AM
I apologize if this isn't the right place to ask. I am taking a Big Data course and am trying to get my first Hadoop program running. I have the file I want to process uploaded to HDFS, and I can see it at http://localhost:50075 when I browse the directory. I set the input path for my mapper like so:
FileInputFormat.addInputPath(job1, new Path(args[0]));
Whether args[0] is "hdfs://localhost:9000/path/to/file" or "path/to/file", I get the same result.
I get a lot of these:
"java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:50075/path/to/file, expected: hdfs://localhost:9000"
At the end I get this:
"13/09/15 22:22:32 INFO mapred.JobClient: Job complete: job_201309151109_0016
13/09/15 22:22:32 INFO mapred.JobClient: Counters: 7
13/09/15 22:22:32 INFO mapred.JobClient: Job Counters
13/09/15 22:22:32 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=109261
13/09/15 22:22:32 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/15 22:22:32 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/15 22:22:32 INFO mapred.JobClient: Launched map tasks=8
13/09/15 22:22:32 INFO mapred.JobClient: Data-local map tasks=8
13/09/15 22:22:32 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/09/15 22:22:32 INFO mapred.JobClient: Failed map tasks=1"
What am I doing wrong?
Created 09-16-2013 04:43 PM
I'd be interested to see the full output from running with the path/to/file variant. I'd also be interested to see the result of running "hadoop fs -ls path/to/file".
Created 09-17-2013 08:32 AM
My professor helped me with this one. My assignment requires multiple jobs, and I had a hard-coded absolute path in FileOutputFormat.setOutputPath(). I suppose an absolute path would work if I got it right, but a relative path works. So now my first job executes! Now I just need to figure out what's wrong with my second, but I think that's a much more straightforward programming problem.
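For anyone hitting the same "Wrong FS" error: a full hdfs:// URI has to match the namenode address in fs.default.name exactly, while a relative path is resolved against the default filesystem and your home directory, so it works on any cluster. A minimal sketch of the driver setup (job1/job2 and the intermediate path name are placeholders, not my actual assignment code):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Relative paths resolve against fs.default.name and the user's home
// directory, so they don't tie the code to one cluster.
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, new Path("query2job1temp"));

// The second job chains off the first job's output directory.
FileInputFormat.addInputPath(job2, new Path("query2job1temp"));
FileOutputFormat.setOutputPath(job2, new Path(args[1]));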
Created 09-17-2013 10:49 AM
Now I have to output a single top-20 list from the reducer. It seems there are multiple reducers. How can I limit it to just one? I can't change the configuration, since my professor will be running my code on his own Hadoop installation.
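It turns out the number of reducers can also be set per job from the driver code, so it travels with the program rather than the cluster configuration; a one-line sketch (job2 is a placeholder for my second Job object):

// Force a single reduce task so the job writes one part file
// containing one global top-20 list.
job2.setNumReduceTasks(1);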
Created 09-17-2013 11:08 AM
I just needed my second job's mapper to output 1 as the key and the file line as the value, and take care of things in the reducer. IT WORKS!!! That was a lot of work. Now onto part 2 of 3. (due tomorrow, I'm hopeful though 🙂 )
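Roughly what the second job's mapper boils down to (a sketch; the class name is made up, and the actual top-20 selection happens in the reducer):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TopTwentyMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Every line is emitted under the same key, so a single reduce call
        // sees all the lines and can keep just the top 20.
        context.write(ONE, value);
    }
}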
Created 09-17-2013 01:15 PM
Now if only I could read the output of a previous job in the setup of another...
Created 09-17-2013 02:30 PM
I got it! But it seems overly complicated to grab one crummy line from a file...
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

String dateString = "";
FileSystem fileSystem = FileSystem.get(configuration);
FileStatus[] fileStatus = fileSystem.listStatus(new Path("/temp/query2job2temp"));
for (FileStatus status : fileStatus) {
    Path path = status.getPath();
    // Only read the reducer output files (part-00000 etc.), skipping _SUCCESS and _logs
    if (path.toString().matches(".*part.*")) {
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fileSystem.open(path)));
        dateString = bufferedReader.readLine();
        Pattern pattern = Pattern.compile("([0-9]{2}/[0-9]{2}/[0-9]{2})");
        Matcher matcher = pattern.matcher(dateString);
        if (matcher.find()) {
            dateString = matcher.group(0);
        }
        bufferedReader.close();
    }
}
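For context, that snippet lives in the next job's Mapper.setup(), with the Configuration taken from the context (the class name here is a placeholder):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Query2Job3Mapper extends Mapper<LongWritable, Text, Text, Text> {
    private String dateString = "";

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration configuration = context.getConfiguration();
        // ...the file-reading snippet above runs here and fills in dateString,
        // which map() can then use for every record.
    }
}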