About anjibabupalla

anjibabupalla · ‎02-22-2017

I tried this with a UDF. I want to collect the values into a StringBuilder and then explode them in the next step. I am able to register the UDF, but I am unable to get results.

    val myUdf = udf { (col1: Timestamp, col2: Timestamp, col3: Int, sqlContext: SQLContext) =>
      import sqlContext.implicits._
      val sb = StringBuilder.newBuilder
      if (col3 == 0) {
        val dd = dayofmonth(date_add($"col1", 1))
        val mm = month(date_add($"col1", 1))
        val yy = year(date_add($"col1", 1))
        val normalizedDate = concat(dd, mm, yy)
        sb.append(dd).append(",").append(mm).append(",").append(yy).append(",").append(normalizedDate)
      } else {
        for (i <- 2 until col3) {
          val dd = dayofmonth(date_add($"col1", i))
          val mm = month(date_add($"col1", i))
          val yy = year(date_add($"col1", i))
          val normalizedDate = concat(dd, mm, yy)
          sb.append(dd).append(",").append(mm).append(",").append(yy).append(",").append(normalizedDate)
        }
      }
      sb.toString
    }

Calling it fails with:

    java.lang.ClassCastException: $anonfun$1 cannot be cast to scala.Function3
      at org.apache.spark.sql.catalyst.expressions.ScalaUDF.<init>(ScalaUDF.scala:106)
      at org.apache.spark.sql.expressions.UserDefinedFunction.apply(UserDefinedFunction.scala:56)
      ... 52 elided
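A possible reading of the error, offered as a guess rather than a confirmed diagnosis: the lambda declares four parameters, while the trace mentions scala.Function3, which suggests the UDF was applied to three columns. Separately, dayofmonth, month, year, date_add and concat build Column expressions and do not compute per-row values, so they are unlikely to behave as intended inside a UDF body, and a SQLContext cannot be passed in as a column. The sketch below shows the same string-building logic using plain JVM date arithmetic instead. The names DayParts and dayParts are hypothetical, the unused col2 parameter is dropped, and the offsets (1 when the count is 0, otherwise 2 until the count) simply mirror the code in the post.

```scala
import java.sql.Timestamp

object DayParts {
  // Hypothetical helper: returns one "dd,mm,yyyy,ddmmyyyy" string per generated day.
  // Works on plain values, so it needs no SQLContext and no Column functions.
  def dayParts(start: Timestamp, n: Int): Seq[String] = {
    val base = start.toLocalDateTime.toLocalDate
    // Same offsets as the posted UDF: a single day at +1 when n == 0,
    // otherwise days +2 up to (but not including) +n.
    val offsets: Seq[Int] = if (n == 0) Seq(1) else (2 until n)
    offsets.map { i =>
      val d = base.plusDays(i.toLong)
      val dd = d.getDayOfMonth
      val mm = d.getMonthValue
      val yy = d.getYear
      s"$dd,$mm,$yy,$dd$mm$yy"
    }
  }
}
```

Since this returns a Seq[String], it could presumably be wrapped as udf((ts: Timestamp, n: Int) => DayParts.dayParts(ts, n)), applied to two columns, and passed to explode to get one row per element.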

anjibabupalla · ‎02-22-2017

I have a dataframe whose data looks like this:

    Id,startdate,enddate,datediff,did,usage
    1,2015-08-26,2015-09-27,32,326-10,127
    2,2015-09-27,2015-10-20,21,327-99,534
    ..
    ..

My requirement is to get the per-day usage based on datediff. For the first Id the datediff is 32, so the per-day usage will be 127/32. When I collect the result I should get:

    Id,startdate,enddate,datediff,day,month,did,usage,perdayusage
    1,2015-08-26,2015-09-27,32,26,08,326-10,127,3.96
    1,2015-08-26,2015-09-27,32,27,08,326-10,127,3.96
    1,2015-08-26,2015-09-27,32,28,08,326-10,127,3.96
    .
    .
    .
    1,2015-08-26,2015-09-27,32,27,09,326-10,127,3.96

This is what I have tried so far. I am stuck at this first step: the line below turns each input row into a single output row, and I can't work out how to generate multiple rows from a single row based on datediff.

    val df2 = df1.select("Id","startdate","enddate","datediff","did","usage").withColumn("Day",dayofmonth($"startdate")).withColumn("Month",month($"startdate")).withColumn("perdayusage",getperdayusageudf($"usage",$"datediff"))

How can I get these results back as a dataframe?
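One way this might be approached, as a sketch rather than a definitive answer: have a UDF return an array with one date per day, then use explode to turn that array into one row per day. The example assumes Spark 2.x (SparkSession), that startdate is a DateType column and datediff an integer column, and that each input row should yield exactly datediff rows starting at startdate. Whether the end date itself should be included is not clear from the post. The names PerDayUsage, daysFrom and expand are hypothetical, and rounding to two decimals is also an assumption.

```scala
import java.sql.Date
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, dayofmonth, explode, month, round, udf}

object PerDayUsage {
  // Plain Scala: n consecutive dates, starting at `start` (inclusive).
  def daysFrom(start: Date, n: Int): Seq[Date] =
    (0 until n).map(i => Date.valueOf(start.toLocalDate.plusDays(i.toLong)))

  // Adds one output row per generated day, plus day, month and perdayusage columns.
  def expand(df: DataFrame): DataFrame = {
    val daysUdf = udf((start: Date, n: Int) => daysFrom(start, n))
    df.withColumn("date", explode(daysUdf(col("startdate"), col("datediff"))))
      .withColumn("day", dayofmonth(col("date")))
      .withColumn("month", month(col("date")))
      .withColumn("perdayusage", round(col("usage") / col("datediff"), 2))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("per-day-usage")
      .getOrCreate()
    import spark.implicits._

    // The two sample rows from the post.
    val df1 = Seq(
      (1, Date.valueOf("2015-08-26"), Date.valueOf("2015-09-27"), 32, "326-10", 127),
      (2, Date.valueOf("2015-09-27"), Date.valueOf("2015-10-20"), 21, "327-99", 534)
    ).toDF("Id", "startdate", "enddate", "datediff", "did", "usage")

    expand(df1).show(60, false)
    spark.stop()
  }
}
```

The result of expand is an ordinary DataFrame, so it can be selected from, filtered, or collected like df1.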

Last Visited	‎06-23-2017 06:16 AM

Member Since	‎02-20-2017 01:27 PM
Posts	8
Kudos received	1

Cloudera Community

Re: Spark generate multiple rows based on column v...

Spark generate multiple rows based on column value