I am trying to get Pig set up on our cluster to process data in Avro files. I've been trying to follow the instructions here:
Out of the box, the REGISTER statements all fail because Pig cannot find the specified jars. On the command line, I've found that I can either:
- change the REGISTER commands to something of the form REGISTER
- add /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/pig and
/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/pig/lib to the PIG_CLASSPATH environment variable
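For reference, the second workaround amounts to appending those two directories to PIG_CLASSPATH before launching `pig`. The sketch below only builds that value and prints an `export` line for a shell. It assumes a Linux host, so entries are colon-separated, and it hard-codes the parcel paths from this post. The helper name `extend_pig_classpath` is hypothetical and is not part of Pig.

```python
import os

# Parcel directories from the post (CDH 5.2.0 layout on this particular cluster).
PIG_DIRS = [
    "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/pig",
    "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/pig/lib",
]


def extend_pig_classpath(existing, extra_dirs):
    """Hypothetical helper: append extra_dirs to a colon-separated PIG_CLASSPATH."""
    parts = [p for p in (existing or "").split(":") if p]  # drop empty entries
    for d in extra_dirs:
        if d not in parts:  # skip directories that are already present
            parts.append(d)
    return ":".join(parts)


if __name__ == "__main__":
    value = extend_pig_classpath(os.environ.get("PIG_CLASSPATH"), PIG_DIRS)
    # Print a line that can be pasted into a shell before running `pig`.
    print("export PIG_CLASSPATH=" + value)
```

This only reproduces the command-line setup. It does nothing for jobs launched through Hue/Oozie, which is the actual problem.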
If I do the latter, I don't even seem to need the REGISTER statements anymore.
However, I can't seem to find any way to get the same effect when running through Hue and Oozie. The only way to get Pig to find the jars is by specifying the full path.
Do you guys know any way I can either add a bunch of jars to the classpath when running through Hue/Oozie, or alternatively set a list of search paths for the REGISTER statement?
Full paths work OK, I guess, but I'm trying to get a bunch of non-programmer data scientists using this thing, and long, complex paths are going to be very error-prone.
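To illustrate the "search path" idea, here is one possible workaround. This is a hypothetical preprocessing step, not a Pig or Oozie feature. Users would write bare jar names such as `REGISTER something.jar;`, and a small script would expand each name to a full path, searching a fixed list of directories before the script is uploaded to Hue. The function name `resolve_registers` and the `SEARCH_PATH` list are assumptions. Statements that already contain a slash are left untouched.

```python
import os
import re

# Hypothetical search path: directories searched, in order, for bare jar names.
SEARCH_PATH = [
    "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/pig",
    "/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/pig/lib",
]

# Matches "REGISTER name.jar;" only when the jar name contains no slash.
_BARE_REGISTER = re.compile(r"^(\s*REGISTER\s+)([^\s/;]+\.jar)(\s*;)", re.IGNORECASE)


def resolve_registers(script, search_path, exists=os.path.isfile):
    """Rewrite bare-name REGISTER statements to full paths (hypothetical helper)."""
    out = []
    for line in script.splitlines():
        m = _BARE_REGISTER.match(line)
        if m:
            for d in search_path:
                candidate = os.path.join(d, m.group(2))
                if exists(candidate):
                    # Keep the original keyword/spacing and anything after the ';'.
                    line = m.group(1) + candidate + m.group(3) + line[m.end():]
                    break
            else:
                raise FileNotFoundError(m.group(2) + " not found in search path")
        out.append(line)
    return "\n".join(out)
```

The `exists` parameter is there so the lookup can be checked without the real parcel directories. It would also work against a copy of the jars rather than the cluster filesystem.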