Sharing the steps to make Hive UDFs, UDAFs, and UDTFs work natively with SparkSQL.

1- Open spark-shell with the Hive UDF jar as a parameter:

spark-shell --jars path-to-your-hive-udf.jar

2- From spark-shell, declare a Hive context and create the functions:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);

sqlContext.sql("""create temporary function balance as 'com.github.gbraccialli.hive.udf.BalanceFromRechargesAndOrders'""");
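
For readers who have not written one before, the class named in the create temporary function statement is an ordinary Hive function class packaged in the jar. Below is a minimal sketch of a simple UDF, assuming the classic org.apache.hadoop.hive.ql.exec.UDF base class from hive-exec is on the compile classpath. The package and class name com.example.hive.udf.UpperCaseUdf are hypothetical and have nothing to do with the balance function used in this article.

```scala
// Hypothetical package and class name, chosen only for illustration.
package com.example.hive.udf

import org.apache.hadoop.hive.ql.exec.UDF

// A simple Hive UDF: Hive (and SparkSQL through HiveContext) looks up
// a method named `evaluate` by reflection and calls it once per row.
class UpperCaseUdf extends UDF {
  // Returns the upper-cased input, passing SQL NULL through unchanged.
  def evaluate(input: String): String =
    if (input == null) null else input.toUpperCase
}
```

Once such a class is built into the jar passed with --jars, it could be registered the same way as above, for example under a hypothetical function name like to_upper.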


3- From spark-shell, use your UDFs directly in SparkSQL:

sqlContext.sql("""
create table recharges_with_balance_array as 
select 
  reseller_id,
  phone_number,
  phone_credit_id,
  date_recharge,
  phone_credit_value,
  balance(orders,'date_order', 'order_value', reseller_id, date_recharge, phone_credit_value) as balance
from orders
""");

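To check that the function really ran, the new table can be read back from the same shell. This is only a sketch: it assumes the spark-shell session from the steps above (where sc is provided by the shell) and that the create table statement in step 3 succeeded.

```scala
// Assumes the spark-shell session from the steps above, where `sc` is
// provided by the shell and the table from step 3 already exists.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Read back a few rows, including the column produced by the UDF.
val preview = hiveContext.sql("""
select reseller_id, phone_number, balance
from recharges_with_balance_array
limit 10
""")

// Show the column types, then print the sample rows on the driver.
preview.printSchema()
preview.collect().foreach(println)
```
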
PS: I ran into some issues using UDTFs with Spark 1.3, which were fixed in Spark 1.4.1. I tested all three kinds (UDF, UDAF, and UDTF) and all of them worked properly, with the same SQL statements and the same results.
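
Because the behavior differs between Spark versions, it may be worth confirming which version the shell is running before testing UDTFs. A minimal sketch, assuming it is typed into spark-shell, where sc already exists:

```scala
// `sc` is the SparkContext that spark-shell creates automatically.
val sparkVersion: String = sc.version
println(s"Running Spark $sparkVersion")
```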

Comments

@Guilherme Braccialli thanks for trying this. I am just starting my Spark journey, but it seems that any time I try something in Zeppelin or Jupyter I keep hitting different issues, so I guess I should just stick with the CLI for now. I will still give demos in Zeppelin at customer sites, but I will keep the product's current limitations in mind.

@Guilherme Braccialli I just tried the steps you posted and everything worked great. I had problems running the mvn build with your repo, but that's not an issue for me. I now have the template for how to make Hive and Spark interact. Thanks for the post, very useful!

Well, I tried create temporary function in beeline and it failed. With create function, the function was created and I can run desc function on it, but when I try to use it, it says the function cannot be found. Do you have the same problem? I'm trying to give everyone connected to my thriftserver access to the UDF that I deployed. Do you have any suggestions?

Hi @Guilherme Braccialli, so you did not run into this issue?

https://issues.apache.org/jira/browse/SPARK-20033

Thank you.