I've spent a lot of time over the past week trying to get regex tables to work in Hive. The table would create fine, but whenever I ran a query that required MapReduce, the job would blow up. The issue is that, for some reason, the core and task nodes don't have the jar file that contains the org.apache.hadoop.hive.contrib.serde2.RegexSerDe class on their classpath. The same basic issue exists on AWS EMR. I'm not sure whether there is a problem with the way I'm using the class or whether this is a bug in the configuration. My solution on CDH5 is below. On EMR, I had to create a script, run as a bootstrap action, that copies the jar to the appropriate path (I needed to copy other jars as well, so I decided to go this way).
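For anyone who wants to try the EMR bootstrap approach, here is a minimal sketch of the kind of copy script described above. This is not the exact script I ran. The function name copy_jars and the command-line layout are made up for illustration, and the destination directory and jar locations are placeholders you would need to fill in for your own cluster.

```python
#!/usr/bin/env python3
"""Sketch: copy extra jars into a lib directory during node bootstrap.

All paths are supplied by the caller; nothing here is specific to a
particular Hive or EMR release.
"""
import shutil
import sys
from pathlib import Path


def copy_jars(jar_paths, dest_dir):
    """Copy each jar into dest_dir, skipping jars that already exist there.

    Returns the list of destination paths that were newly copied.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for jar in map(Path, jar_paths):
        if not jar.is_file():
            raise FileNotFoundError(f"jar not found: {jar}")
        target = dest / jar.name
        if target.exists():
            # Already present on this node; leave it untouched.
            continue
        shutil.copy2(jar, target)
        copied.append(target)
    return copied


# Hypothetical usage: copy_jars.py DEST_DIR JAR [JAR ...]
if __name__ == "__main__" and len(sys.argv) >= 3:
    for path in copy_jars(sys.argv[2:], sys.argv[1]):
        print(f"copied {path}")
```

Because a bootstrap action runs on every node as it comes up, the core and task nodes end up with the same jars as the master. Skipping files that already exist keeps the script safe to run more than once.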
Please comment if you have any suggestions or other solutions.
Thanks for the suggestion. I thought about ADD JAR, but reminding every user to add the jar was problematic, and getting that solution to work over ODBC had me stumped.
I guess my main question is: why do we have to do this at all? Since this is fundamental Hive functionality, I would expect the jar to be available to all nodes. Further, making it available only to the master is a curious configuration. Is there some reason why I would not want this jar in the lib directory all the time?
brock, thanks again for the reply. I admit to not being entirely convinced: I don't see why I should have to fiddle with jars to finish configuring a cluster in order to use basic functionality. BUT, I do appreciate your time, and the fact that you validated the feature.