HDP Certified Java Developer practice exam
Labels: Apache Hadoop
Created 03-20-2016 11:35 PM
In the practice exam I got a ClassCastException. It looks like TaggedInputSplit is not even a public class. How do I get the filename when using MultipleInputs?
Mapper setup method:
Path path = ((FileSplit) split).getPath();
Driver class:
MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, FlightDataMapper.class);
MultipleInputs.addInputPath(job, new Path(args[2]), TextInputFormat.class, WeatherDataMapper.class);
Exception:
java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit
Any help is greatly appreciated.
Thanks,
Sanjay
Created 03-21-2016 12:03 AM
I'm not sure why you are getting that exception, but I do know that you do not need to get the path of any input files on the exam. What are you trying to do? Showing more of your code might help me to provide more insight.
Created 03-21-2016 12:16 AM
There is a header in one of the CSV files, and I was trying to ignore the first record. Below is my sample code.
@Override
protected void setup(Context context) throws IOException, InterruptedException {
    // Check which file this mapper is reading; the cast below is what fails with MultipleInputs
    InputSplit split = context.getInputSplit();
    Path path = ((FileSplit) split).getPath();
    filename = path.getName();
    if (filename.equals("flightdata1.csv")) {
        hasheader = true;
    }
}

@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // Skip the first record of the file that has a header row
    if (hasheader) {
        if (key.get() == 0) return;
    }
    ......
    ......
}
Created 03-21-2016 12:21 AM
An easier solution would be to read every row and check whether it is a header row by testing one of the columns for a value that only appears in the header. A minimal sketch of that approach is below.
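For illustration, here is one way that check could look inside map(). This is only a sketch: "Year" is a hypothetical column label standing in for whatever value appears only in your header row.

@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String[] fields = value.toString().split(",");
    // "Year" stands in for a column label that only occurs in the header row
    if (fields.length > 0 && fields[0].equals("Year")) {
        return;  // skip the header row
    }
    // ... normal record processing ...
}

This avoids touching the input split entirely, so it works the same whether the mapper was configured through MultipleInputs or not.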
Created 03-21-2016 12:35 AM
Thanks for your response @Rich Raposa
I eventually took a simpler approach to finish the exam (although 15 mins late 🙂 )
I am wondering why using MultipleInputs gives a TaggedInputSplit object and not a FileSplit.
Created 03-21-2016 12:40 AM
There's a discussion here that answers your question.
http://stackoverflow.com/questions/11130145/hadoop-multipleinputs-fails-with-classcastexception
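In short, MultipleInputs wraps each split in a package-private TaggedInputSplit so it can track the per-path mapper and input format, which is why the direct cast to FileSplit fails. A rough sketch of the reflection workaround discussed in that thread, for use inside setup(), looks like this (untested here; the class name is taken from the exception above):

// Requires: import java.lang.reflect.Method;
InputSplit split = context.getInputSplit();
if (split.getClass().getName().equals(
        "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
    try {
        // TaggedInputSplit is not public, so unwrap it via reflection
        Method getInputSplit = split.getClass().getDeclaredMethod("getInputSplit");
        getInputSplit.setAccessible(true);
        split = (InputSplit) getInputSplit.invoke(split);
    } catch (ReflectiveOperationException e) {
        throw new IOException("Could not unwrap TaggedInputSplit", e);
    }
}
Path path = ((FileSplit) split).getPath();
String filename = path.getName();

Relying on reflection against a private class is brittle across Hadoop versions, so the header-value check suggested earlier is generally the safer option.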
