Support Questions
Find answers, ask questions, and share your expertise

Sql Query to SparkDataframe to get Date_add interval from another table (Dataframe)


New Contributor

I need to build a DataFrame query, but I can't figure out how to use a value from another table as the DATE_ADD interval. I have a query where I need to get data between two dates. The MySQL query looks like:

 SELECT DAY(DATE_ADD(a.DSDate, INTERVAL b.N DAY)),
        MONTH(DATE_ADD(a.DSDate, INTERVAL b.N DAY)),
        YEAR(DATE_ADD(a.DSDate, INTERVAL b.N DAY)),
 STR_TO_DATE(CONCAT(MONTH(DATE_ADD(a.DSDate,
                            INTERVAL b.N DAY)),
                    '/',
                    DAY(DATE_ADD(a.DSDate,
                            INTERVAL b.N DAY)),
                    '/',
                    YEAR(DATE_ADD(a.DSDate,
                            INTERVAL b.N DAY))),
            '%m/%d/%Y') AS NDate 
            FROM
    inch a,
    ttable b
WHERE
    a.id = 1
        AND b.N < (CASE
        WHEN DATEDIFF(a.DEDate, a.DSDate) = '0' THEN '1'
        ELSE DATEDIFF(a.DEDate, a.DSDate) + 1
    END);
My problem is how to use the b.N value as an interval with DataFrames. Can anyone suggest a solution?
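The SQL above joins against a tally table (`ttable b.N`) to produce one row per day between DSDate and DEDate, with the same inclusive count as the CASE expression. As a minimal plain-Scala sketch of that expansion (the helper name `daysBetween` is mine, not from the thread), using java.time:

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Expand [start, end] into one entry per day, mirroring the tally-table join.
// The CASE in the query makes the count inclusive: diff == 0 -> 1, else diff + 1.
def daysBetween(start: LocalDate, end: LocalDate): Seq[LocalDate] = {
  val n = ChronoUnit.DAYS.between(start, end)
  val days = if (n == 0) 1L else n + 1
  (0L until days).map(i => start.plusDays(i))
}

val expanded = daysBetween(LocalDate.parse("2015-08-26"), LocalDate.parse("2015-09-27"))
// 33 dates: 2015-08-26 through 2015-09-27 inclusive
```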



4 REPLIES

Re: Sql Query to SparkDataframe to get Date_add interval from another table (Dataframe)

New Contributor

I am trying it, but I am not able to make INTERVAL work to get the values between these dates:

 val getDateDiff = udf { (col1: Int) =>
   if (col1 == 0) 1 else col1 + 1
 }

 val df = DS.withColumn("tmpDateDiff", getDateDiff(datediff($"DEDate", $"DSDate")))
 // This is the line that fails: an INTERVAL literal will not accept a column name
 val df2 = df.filter($"DSDate" + expr("INTERVAL tmpDateDiff") < $"DEDate")



Re: Sql Query to SparkDataframe to get Date_add interval from another table (Dataframe)

Expert Contributor

Can you elaborate a little more on what exactly you are trying to do? Please explain your query, along with sample input and output datasets. Thanks!

Re: Sql Query to SparkDataframe to get Date_add interval from another table (Dataframe)

New Contributor
Sample Data

 Id,startdate,enddate,datediff,did,usage
 1,2015-08-26,2015-09-27,32,326-10,127
 2,2015-09-27,2015-10-20,21,327-99,534
  ..
  ..
The result must be:

Id,startdate,enddate,datediff,day,month,did,usage,perdayusage
1,2015-08-26,2015-09-27,32,26,08,326-10,127,3.96
1,2015-08-26,2015-09-27,32,27,08,326-10,127,3.96
1,2015-08-26,2015-09-27,32,28,08,326-10,127,3.96
.
.
.
1,2015-08-26,2015-09-27,32,27,09,326-10,127,3.96
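The perdayusage column in the expected output appears to be usage divided by datediff, truncated (not rounded) to two decimals: 127 / 32 = 3.968… → 3.96. A minimal sketch of that calculation (the helper name is hypothetical):

```scala
// Usage spread evenly across the date span, truncated to two decimal places,
// matching the sample output (rounding would give 3.97, the sample shows 3.96).
def perDayUsage(usage: Double, dateDiff: Int): Double =
  math.floor(usage / dateDiff * 100) / 100

val row1 = perDayUsage(127, 32)  // first sample row
val row2 = perDayUsage(534, 21)  // second sample row
```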


Re: Sql Query to SparkDataframe to get Date_add interval from another table (Dataframe)

Expert Contributor

Hi @Anji Palla

I tested the code below with the sample input dataset you have given.

 Id,startdate,enddate,datediff,did,usage
 1,2015-08-26,2015-09-27,32,326-10,127
 2,2015-09-27,2015-10-20,21,327-99,534

Make sure you change the path.

import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.sql.types._
import org.apache.spark.sql._
import org.apache.spark.sql.functions._

import scala.collection.mutable.ListBuffer
import java.util.{GregorianCalendar, Date}
import java.util.Calendar


def generateDates(startdate: Date, enddate: Date): ListBuffer[String] = {
  var dateList = new ListBuffer[String]()
  val calendar = new GregorianCalendar()
  calendar.setTime(startdate)
  // Walk one day at a time, including the end date itself.
  // Calendar.MONTH is zero-based, hence the + 1.
  while (!calendar.getTime().after(enddate)) {
    dateList += calendar.getTime().toString().substring(0, 10) + "," +
      calendar.get(Calendar.DAY_OF_MONTH) + "," + (calendar.get(Calendar.MONTH) + 1)
    calendar.add(Calendar.DATE, 1)
  }
  dateList
}

// Parser for the input dates; this `format` definition was missing from the snippet
val format = new java.text.SimpleDateFormat("yyyy-MM-dd")

def getRddList(a: String): ListBuffer[(String, String, String, String, String)] = {
  var allDates = new ListBuffer[(String, String, String, String, String)]()
  val cols = a.split(",")
  for (x <- generateDates(format.parse(cols(1)), format.parse(cols(2)))) {
    allDates += ((cols(0), x, cols(3), cols(4), cols(5)))
  }
  allDates
}


val fileRdd = sc.textFile("/hadoopPath")
val myRdd = fileRdd.flatMap(x => getRddList(x))
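As a quick cross-check outside Spark, the same row expansion can be reproduced for one input line with java.time instead of the Calendar API (the helper name `expandLine` is mine, and this is only a sketch against the sample data above):

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Expand one CSV line (Id,startdate,enddate,datediff,did,usage) into one
// tuple per calendar day in [startdate, enddate], as the Spark job does.
def expandLine(line: String): Seq[(String, String, Int, Int)] = {
  val f = line.split(",")
  val start = LocalDate.parse(f(1))
  val end   = LocalDate.parse(f(2))
  (0L to ChronoUnit.DAYS.between(start, end)).map { n =>
    val d = start.plusDays(n)
    (f(0), d.toString, d.getDayOfMonth, d.getMonthValue)
  }
}

val rows = expandLine("1,2015-08-26,2015-09-27,32,326-10,127")
// 33 rows, from (1, 2015-08-26, 26, 8) through (1, 2015-09-27, 27, 9)
```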

Special thanks to @Arun A K for his help on a similar problem.
