Created 03-20-2016 11:35 PM
In the practice exam I got a ClassCastException. It looks like TaggedInputSplit is not even a public class. How do I get the filename when using MultipleInputs?
Mapper setup method:
Path path = ((FileSplit) split).getPath();
Driver class:
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, FlightDataMapper.class);
MultipleInputs.addInputPath(job, new Path(args[2]), TextInputFormat.class, WeatherDataMapper.class);
Exception:
java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit
Any help is greatly appreciated.
Thanks,
Sanjay
Created 03-21-2016 12:03 AM
I'm not sure why you are getting that exception, but I do know that you do not need to get the path of any input files on the exam. What are you trying to do? Showing more of your code might help me to provide more insight.
Created 03-21-2016 12:16 AM
There is a header in one of the CSV files, and I was trying to skip that first record. Below is my sample code.
setup() {
    Path path = ((FileSplit) split).getPath();
    filename = path.getName();
    if (filename.equals("flightdata1.csv")) {
        hasheader = true;
    }
}
map(....) {
    if (hasheader) {
        if (key.get() == 0) return;
    }
    ......
    ......
}
Created 03-21-2016 12:21 AM
An easier solution would be to read every row, and then add a check to see if it's a header row by checking one of the columns that you know only appears in a header row.
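As a rough sketch of that check, assuming the header's first column is a label such as "Year" (the label and the HeaderCheck class name are placeholders, not taken from the actual exam data):

```java
// Minimal sketch: detect a CSV header row by looking at a column whose
// value only ever appears in the header. "Year" is a hypothetical label;
// replace it with the real name of the first column in your file.
final class HeaderCheck {
    static final String HEADER_FIRST_COLUMN = "Year";

    // Returns true when the first comma-separated field matches the header label.
    static boolean isHeaderRow(String line) {
        if (line == null || line.isEmpty()) {
            return false;
        }
        // The -1 limit keeps trailing empty fields instead of dropping them.
        String[] columns = line.split(",", -1);
        return columns[0].trim().equalsIgnoreCase(HEADER_FIRST_COLUMN);
    }
}
```

Inside map() this would be used as something like: if (HeaderCheck.isHeaderRow(value.toString())) return; so no filename lookup is needed in setup().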
Created 03-21-2016 12:35 AM
Thanks for your response @Rich Raposa
I eventually took a simpler approach to finish the exam (although 15 mins late 🙂 )
I am wondering why using MultipleInputs gives a TaggedInputSplit object and not a FileSplit.
Created 03-21-2016 12:40 AM
There's a discussion here that answers your question.
http://stackoverflow.com/questions/11130145/hadoop-multipleinputs-fails-with-classcastexception
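In short, MultipleInputs wraps each split in a TaggedInputSplit so it can remember which InputFormat and Mapper belong to that path, which is why the direct cast to FileSplit fails. One commonly cited workaround is to pull the wrapped split out via reflection, since the class is not public. The sketch below assumes TaggedInputSplit exposes a no-argument getInputSplit() method (true of the Hadoop versions I have seen, but check yours); SplitUnwrapper is a made-up helper name, and it uses plain Object so it compiles without Hadoop on the classpath.

```java
import java.lang.reflect.Method;

// Minimal sketch: unwrap the split that MultipleInputs hides inside
// TaggedInputSplit. Assumes the wrapper has a no-arg getInputSplit() method.
final class SplitUnwrapper {

    // Returns the wrapped split for a TaggedInputSplit, or the argument itself otherwise.
    static Object unwrap(Object split) {
        if (split == null
                || !"TaggedInputSplit".equals(split.getClass().getSimpleName())) {
            return split;
        }
        try {
            Method getter = split.getClass().getDeclaredMethod("getInputSplit");
            // Needed because the declaring class is not public.
            getter.setAccessible(true);
            return getter.invoke(split);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not unwrap " + split.getClass(), e);
        }
    }
}
```

In setup() the call would look roughly like: FileSplit fileSplit = (FileSplit) SplitUnwrapper.unwrap(context.getInputSplit()); followed by fileSplit.getPath().getName(). This relies on Hadoop internals, so the header-row check is the safer choice for the exam.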