Sharing the steps to make Hive UDF/UDAF/UDTF work natively with SparkSQL
1- Open spark-shell with hive udf jar as parameter:
spark-shell --jars path-to-your-hive-udf.jar
2- From spark-shell, declare a Hive context and create your functions:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
sqlContext.sql("""create temporary function balance as 'com.github.gbraccialli.hive.udf.BalanceFromRechargesAndOrders'""");
3- From spark-shell, use your UDFs directly in SparkSQL:
create table recharges_with_balance_array as
balance(orders,'date_order', 'order_value', reseller_id, date_recharge, phone_credit_value) as balance
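The UDTF's source isn't shown here, so as a rough illustration only, here is a plain-Scala sketch (no Hive or Spark dependencies) of the kind of running-balance calculation a function like BalanceFromRechargesAndOrders might perform; the Event type, field names, and credit/debit logic are my assumptions, not the actual implementation:

```scala
// Hypothetical sketch only: recharges add credit, orders consume it,
// and we emit the running balance after each event in date order.
case class Event(date: String, amount: Double) // "yyyy-MM-dd" sorts lexicographically

def balances(recharges: Seq[Event], orders: Seq[Event]): Seq[(String, Double)] = {
  // merge both streams: recharges as positive amounts, orders as negative
  val merged = (recharges.map(e => (e.date, e.amount)) ++
                orders.map(e => (e.date, -e.amount))).sortBy(_._1)
  // running sum: one (date, balance) row per event, like a UDTF emitting rows
  merged.scanLeft(("", 0.0)) { case ((_, acc), (d, amt)) => (d, acc + amt) }.tail
}

val recharges = Seq(Event("2015-01-01", 100.0), Event("2015-01-10", 50.0))
val orders    = Seq(Event("2015-01-05", 30.0))
println(balances(recharges, orders))
// List((2015-01-01,100.0), (2015-01-05,70.0), (2015-01-10,120.0))
```

In the real setup the equivalent logic lives inside the Hive UDTF class packaged in the jar, and SparkSQL invokes it through the temporary function registered in step 2.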
PS: I found some issues using UDTFs with Spark 1.3, which were fixed in Spark 1.4.1. I tested UDF, UDAF, and UDTF; all of them worked properly, with the same SQL statements and the same results.
@Guilherme Braccialli thanks for trying this. I am just starting my Spark journey, but it seems that any time I try this in Zeppelin or Jupyter I keep hitting different issues, so I guess I should just stick with the CLI for now. I will give demos in Zeppelin at customer sites, but now I know the limitations of the product.
@Guilherme Braccialli Just tried your posted steps, and everything worked great. I had problems doing a mvn build using your repo, but that's not an issue for me. I now have a template for how to interact with Hive and Spark. Thanks for the post, very useful!
Well, I tried CREATE TEMPORARY FUNCTION in beeline and it failed. With CREATE FUNCTION, the function was created and DESC FUNCTION showed it, but when I tried to call it, it said the function could not be found. Did you run into the same problem? I'm trying to let everyone connected to my thriftserver have access to the UDF that I deployed. Do you have any suggestions?
Hi @Guilherme Braccialli, so you did not run into this issue?