
How to use Analytical/Window functions (last_value) in Spark Java?

I'm trying to use analytical/window function last_value in Spark Java.

 

Netezza Query:

select sno, name, addr1, addr2, run_dt,
       last_value(addr1 ignore nulls) over (
           partition by sno, name, addr1, addr2, run_dt
           order by beg_ts, end_ts
           rows between unbounded preceding and unbounded following
       ) as last_addr1
from daily

 

We want to implement this query in Spark Java (without using HiveContext):

SparkConf conf = new SparkConf().setMaster("local").setAppName("Agg");
JavaSparkContext sc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(sc);

JavaRDD<Stgdailydtl> daily = sc.textFile("C:\\Testing.txt").map(
    new Function<String, Stgdailydtl>() {
        private static final long serialVersionUID = 1L;
        public Stgdailydtl call(String line) throws Exception {
            String[] parts = line.split(",");
            Stgdailydtl daily = new Stgdailydtl();
            daily.setSno(Integer.parseInt(parts[0].trim()));
            .....
            return daily;
        }
    });

DataFrame schemaDailydtl = sqlContext.createDataFrame(daily, Stgdailydtl.class);
schemaDailydtl.registerTempTable("daily");

WindowSpec ws = Window.partitionBy("sno, name, addr1, addr2, run_dt")
    .orderBy("beg_ts , end_ts")
    .rowsBetween(0, 100000);

DataFrame df = sqlContext.sql("select sno, name, addr1, addr2, run_dt "
    + "row_number() over(partition by mach_id, msrmt_gbl_id, msrmt_dsc, elmt_dsc, end_cptr_dt order by beg_cptr_ts, end_cptr_ts) from daily ");
    }
}
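For reference, the closest DataFrame-API sketch I could come up with is below (column names taken from the Netezza query above). Please treat this as an untested sketch: as far as I understand, window functions in Spark 1.x still require a HiveContext even through the DataFrame API, and `last_value(... ignore nulls)` may have no direct equivalent before Spark 2.0.

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import static org.apache.spark.sql.functions.*;

// Untested sketch: same partitioning and ordering as the Netezza query,
// with an unbounded frame on both sides.
WindowSpec ws = Window
        .partitionBy("sno", "name", "addr1", "addr2", "run_dt") // one argument per column
        .orderBy("beg_ts", "end_ts")
        .rowsBetween(Long.MIN_VALUE, Long.MAX_VALUE); // unbounded preceding/following

// last(...) over this frame approximates last_value(addr1);
// "ignore nulls" behavior is the part I am unsure about.
DataFrame df = schemaDailydtl.withColumn("last_addr1", last(col("addr1")).over(ws));
```

Is this the right direction, or is there another way without a HiveContext?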

Error:

Exception in thread "main" java.lang.RuntimeException: [1.110] failure: ``union'' expected but `(' found
select stg.mach_id, stg.msrmt_gbl_id, stg.msrmt_dsc, stg.elmt_dsc, stg.elmt_dsc_grp_concat, row_number() over(partition by mach_id, msrmt_gbl_id, msrmt_dsc, elmt_dsc, end_cptr_dt order by beg_cptr_ts, end_cptr_ts) from stgdailydtl stg
^
at scala.sys.package$.error(package.scala:27)

 

Please suggest how to implement last_value in Spark Java. Thanks for your help.