I have created a .jar file that uses many functions from the com.google.i18n.phonenumbers Java package. The jar correctly loads the required external libraries from:
com.googlecode.libphonenumber.libphonenumber,
com.googlecode.libphonenumber.geocoder,
com.googlecode.libphonenumber.carrier
The jar is used to create a Hive UDF with the following command (run from the Hue editor on Cloudera 6.x):
create function func_name as 'util.ParserUtil';
For the command above to execute successfully, the jar is placed in the directory set by HIVE_AUX_JARS_PATH=/usr/lib/hive/.
The UDF func_name is then used in a SQL query file stored on HDFS. Here is the problem:
- When the SQL query is executed through an Oozie workflow (spawning an execute.sql file), Oozie throws the exception below:
Error: java.lang.RuntimeException: java.lang.NoSuchMethodError: com.google.i18n.phonenumbers.PhoneNumberUtil.parse(Ljava/lang/String;Ljava/lang/String;)Lcom/google/i18n/phonenumbers/Phonenumber$PhoneNumber;
This is odd, since the jar contains both the com.google.i18n.phonenumbers.PhoneNumberUtil.parse() method and the com.google.i18n.phonenumbers.Phonenumber$PhoneNumber class (github).
- When the same SQL query is executed through the Hive editor in Hue or through beeline, it completes successfully.
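As far as I understand, a NoSuchMethodError means the class itself was found, but the copy that got loaded has no method with that exact signature. So one thing I could check is which jar PhoneNumberUtil is actually loaded from in each environment. Below is a minimal sketch of such a check. The class name ClassOriginCheck and its two helper methods are my own hypothetical names, and I have not yet worked out how to run it on the same classpath as the Oozie-launched job:

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (not part of my UDF jar): reports where a class
// was loaded from and which public overloads of a given method it exposes.
public class ClassOriginCheck {

    // Returns the jar or directory the class was loaded from,
    // or a marker string when the JVM does not expose a code source.
    public static String originOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "<bootstrap/unknown>";
        }
        return src.getLocation().toString();
    }

    // Lists every public method with the given name, with parameter and return types.
    public static List<String> signaturesOf(Class<?> cls, String methodName) {
        List<String> out = new ArrayList<>();
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(methodName)) {
                out.add(m.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults match the class and method named in the exception above.
        String className = args.length > 0 ? args[0] : "com.google.i18n.phonenumbers.PhoneNumberUtil";
        String methodName = args.length > 1 ? args[1] : "parse";

        Class<?> cls = Class.forName(className);
        System.out.println("Loaded from: " + originOf(cls));
        for (String sig : signaturesOf(cls, methodName)) {
            System.out.println("  " + sig);
        }
    }
}
```

If the location printed under Oozie differs from the one printed under beeline, or the listed parse overloads do not include (String, String), that would suggest a second copy of libphonenumber on the Oozie classpath rather than a missing jar.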
Thus, my guess is that I must apply one of the following three solutions:
1. (This is my current configuration.) Keep the jar only at HIVE_AUX_JARS_PATH=/usr/lib/hive/. Does Oozie take a jar in that location into account when running the workflow?
2. Upload the jar (which creates the UDF and includes the necessary Java classes from google.i18n) to the HDFS path /user/oozie/share/lib/hive/.
3. Upload the jar to a lib folder on HDFS, such as /user/src/myFile.jar, and provide that path in the Oozie workflow .xml file, like <file>hdfs://user/src/myFile.jar</file>, so that the workflow can correctly execute the Hive UDF from myFile.jar.
Does point (1) suffice, or should I apply point (2) or (3)?