Compare 2 dataframes and filter results based on date column in spark
- Labels: Apache Spark
Created 05-08-2017 05:49 AM
I have two DataFrames in Spark, as shown below.

val test = hivecontext.sql("select max(test_dt) as test_dt from abc")
test: org.apache.spark.sql.DataFrame = [test_dt: string]

val test1 = hivecontext.table("testing")

where test1 has the columns id, name, age, audit_dt.

I want to compare these two DataFrames and filter the rows from test1 where audit_dt > test_dt, but I have not been able to do that. I can compare audit_dt with a literal date using the lit function, but I cannot compare it with a column from the other DataFrame.

Comparing against a literal date works, as shown below:

val output = test1.filter(to_date(test1("audit_dt")).gt(lit("2017-03-23")))

Can anyone suggest a way to compare it with the test_dt column of DataFrame test?
Thanks
Rahul
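One common way to handle this kind of single-value comparison (a sketch only, reusing the `test` and `test1` DataFrames and the `audit_dt`/`test_dt` column names from the question; it assumes `test` really returns exactly one row) is to collect the max value to the driver and feed it back through `lit`:

```scala
import org.apache.spark.sql.functions.{lit, to_date}

// test holds exactly one row with one string column, so collect it
// to the driver and extract the scalar value.
val maxTestDt: String = test.collect()(0).getString(0)

// The comparison is now column-vs-literal, which filter handles directly.
val output = test1.filter(to_date(test1("audit_dt")).gt(lit(maxTestDt)))
```

This trades a small driver round-trip for a simple filter; it is fine when the left side is a one-row aggregate like max().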
Created 05-10-2017 03:31 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Using HDP 2.5 with Spark 2. If you define the code as follows:

val spark = SparkSession.builder.appName("my app").getOrCreate()
import spark.implicits._
val test = spark.sqlContext.sql("select max(test_dt) as test_dt from abc").as[String]
val test1 = spark.sqlContext.table("testing")

the following two statements will compile:

val output1 = test1.filter(test1("audit_dt").gt(test).toString())
val output2 = test1.filter(test1("audit_dt").gt(test))

Of course, you can always convert test to a String and use that variable in the filter clause.
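If you would rather keep the comparison entirely inside Spark instead of converting `test` to a String, a join-based sketch is another option (assuming Spark 2.1+, where `Dataset.crossJoin` is available; since `test` has a single row, the cross join is cheap and does not multiply the row count):

```scala
import org.apache.spark.sql.functions.{col, to_date}

// Attach the one-row test DataFrame to every row of test1,
// then compare the two columns directly.
val output = test1
  .crossJoin(test)
  .filter(to_date(col("audit_dt")).gt(col("test_dt")))
  .drop("test_dt")
```

This keeps the whole computation lazy and inside the query plan, which can matter if the max value feeds further transformations.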
