
Compare 2 dataframes and filter results based on date column in spark


I have 2 dataframes in Spark, as shown below.

val test = hivecontext.sql("select max(test_dt) as test_dt from abc");

test: org.apache.spark.sql.DataFrame = [test_dt: string]

val test1 = hivecontext.table("testing");

where test1 has columns like id, name, age, audit_dt

I want to compare these 2 dataframes and filter rows from test1 where audit_dt > test_dt. Somehow I am not able to do that. I am able to compare audit_dt with a literal date using the lit function, but I am not able to compare it with a column from another dataframe.

I am able to compare against a literal date using the lit function, as shown below:

val output = test1.filter(to_date(test1("audit_dt")).gt(lit("2017-03-23")))

Can anyone suggest a way to compare it with the test_dt column of dataframe test?





Using HDP 2.5 with Spark 2. If you define the code as follows:

val spark = SparkSession.builder.appName("my app").getOrCreate()

import spark.implicits._

val test = spark.sqlContext.sql("select max(test_dt) as test_dt from abc").as[String]

val test1 = spark.sqlContext.table("testing")

Either of the following two statements will compile:

val output2 = test1.filter(test1("audit_dt").gt(test).toString())

val output2 = test1.filter(test1("audit_dt").gt(test))
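If you want a genuine column-to-column comparison that stays on the cluster, one option (a sketch, not from the original answer; the output3 name is mine, and it assumes the audit_dt column from the question and dates stored as yyyy-MM-dd strings, which compare correctly as strings) is to cross-join the single-row aggregate onto test1 and filter:

```scala
import org.apache.spark.sql.functions.col

// test holds exactly one row (the max), so the cross join simply adds a
// single test_dt column to every row of test1.
val output3 = test1.crossJoin(test)
  .where(col("audit_dt") > col("test_dt"))
  .drop("test_dt")
```

Note that the crossJoin method was added in Spark 2.1; on Spark 2.0 the same effect needs a plain join with spark.sql.crossJoin.enabled set to true.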

Of course, you can always collect test into a String and use that variable in the filter clause.
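That collect-to-a-variable approach might look like the sketch below (the maxDt name is mine, and it assumes the abc and testing tables from the question exist in the metastore and that abc is non-empty, since first() on an empty aggregate result would fail):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, to_date}

val spark = SparkSession.builder.appName("compare-max-date").getOrCreate()

// Pull the single max(test_dt) value back to the driver as a String.
val maxDt: String = spark.sql("select max(test_dt) as test_dt from abc")
  .first()
  .getString(0)

val test1 = spark.table("testing")

// Use the collected value as a literal in the filter, exactly as in the
// lit("2017-03-23") example from the question.
val output = test1.filter(to_date(test1("audit_dt")).gt(lit(maxDt)))
```

This keeps the filter simple at the cost of one small driver-side action; for a single scalar that is usually fine.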