12-10-2014 07:22 PM
I am having trouble registering a UDF jar and I am hoping someone can walk me through the process. I'm a total newb with the Hadoop software stack and Cloudera.
I have the Cloudera 5.2 Express VM running in VirtualBox.
I have the jars in HDFS: /tmp/elephant-bird-core.4.5.jar and /tmp/elephant-bird-pig.4.5.jar
I have a Pig script with two file resources specified, one for each of the jars above.
My Pig script looks like this:
register /tmp/elephant-bird-core-4.5.jar;
register /tmp/elephant-bird-pig-4.5.jar;
A = LOAD '/tmp/test.json' USING com.twitter.elephantbird.pig.jsonloader('-nestedLoad');
describe A;
My error looks like this:
ERROR org.apache.pig.tools.grunt.Grunt - Error 101: file '/tmp/elephant-bird-core-4.5.jar' does not exist
There are also a bunch of warnings about deprecated stuff. Note: I have not done any configuration of the system other than the screens which appear when the VM first boots, asking to install all the software and create a user.
Any help would be greatly appreciated... I must admit I didn't think I'd get stuck at step #1 of my exploration of Cloudera and Hadoop.
12-11-2014 02:59 AM
All I can say is programming really gives me a headache sometimes. For anyone who runs into the same problem, the solution is to type the class name with the same capitalisation as in the source code of the UDF. The correction that gets it all working is JsonLoader instead of what I originally typed: jsonloader. The fully qualified name also includes the load package, which my original script left out.
The correct code is:
A = LOAD '/tmp/test.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map);
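For anyone who wants the whole thing in one place, here is a sketch of the full script with the fix applied. The register lines and the describe statement are copied from my original script and the LOAD statement is the corrected one above. I'm assuming the jar paths match the file names that are actually in HDFS, so adjust them if yours are named differently.

```pig
-- Jar paths as in the original script; they must match the file names in HDFS.
register /tmp/elephant-bird-core-4.5.jar;
register /tmp/elephant-bird-pig-4.5.jar;

-- UDF class names are case sensitive: JsonLoader, in the pig.load package.
A = LOAD '/tmp/test.json'
    USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
    AS (json:map);

-- Print the schema of A to confirm the loader was resolved.
describe A;
```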