Remove Leading zeros from column in Dataframe Join Expression in Spark-Scala
Labels: Apache Spark
Rising Star
Created ‎11-08-2019 08:51 PM
I need to remove leading zeros in a join expression.
Df1("TradeID") has values like "0000012345",
and Df2("TradeRefNo") has no leading zeros, e.g. "12345".
val resultDf = Df1.join(Df2, Df1("TradeID") === Df2("TradeRefNo"))
What's the best way to remove the leading zeros from the Df1("TradeID") column values so that the comparison works correctly?
2 REPLIES
Super Guru
Created ‎11-20-2019 06:28 PM
@ChineduLB
Have you tried to create another DF and cast the values to integer first before the JOIN?
Cheers
Eric
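For readers who want to try the cast suggestion, here is a minimal sketch. It assumes both key columns hold purely numeric strings; the object name `CastJoinSketch`, the helper `joinOnNumericKey`, and the sample rows are made up for illustration. It casts to `LongType` rather than an integer type because a 10-character ID could exceed the 32-bit range. Depending on the Spark version and ANSI mode, a non-numeric value either casts to null (and so never matches) or raises an error.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.LongType

// Hypothetical helper object, not part of the original thread.
object CastJoinSketch {
  // Compare the keys by numeric value, so "0000012345" matches "12345".
  // Casting inside the join condition leaves the original columns untouched.
  def joinOnNumericKey(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, df1("TradeID").cast(LongType) === df2("TradeRefNo").cast(LongType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data mirroring the shapes described in the question.
    val df1 = Seq("0000012345", "0000000678").toDF("TradeID")
    val df2 = Seq("12345", "999").toDF("TradeRefNo")

    joinOnNumericKey(df1, df2).show()
    spark.stop()
  }
}
```

If the IDs can contain letters or must keep their string form, the cast is not a good fit and a string-based trim is safer.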
Rising Star
Created ‎11-28-2019 07:30 PM
I ended up creating a new column in a new DataFrame via withColumn and used a regex to populate it with the trimmed values.
Thanks.
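The exact code was not posted, so the following is only a sketch of what the withColumn plus regex approach might look like. The column name `TradeIDTrimmed`, the object name `TrimZerosJoinSketch`, the helper names, and the pattern `^0+` are assumptions, not the poster's actual choices.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace}

// Hypothetical helper object, not part of the original thread.
object TrimZerosJoinSketch {
  // Add a helper column holding TradeID without its leading zeros.
  // "^0+" matches one or more zeros anchored at the start of the string.
  def withTrimmedId(df: DataFrame): DataFrame =
    df.withColumn("TradeIDTrimmed", regexp_replace(col("TradeID"), "^0+", ""))

  // Join the trimmed key against TradeRefNo as plain strings.
  def joinOnTrimmedId(df1: DataFrame, df2: DataFrame): DataFrame = {
    val trimmed = withTrimmedId(df1)
    trimmed.join(df2, trimmed("TradeIDTrimmed") === df2("TradeRefNo"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("trim-zeros-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data mirroring the shapes described in the question.
    val df1 = Seq("0000012345", "0000000678").toDF("TradeID")
    val df2 = Seq("12345", "999").toDF("TradeRefNo")

    joinOnTrimmedId(df1, df2).show()
    spark.stop()
  }
}
```

One caveat with this pattern: an ID made up entirely of zeros becomes an empty string. If that case matters, a pattern such as `^0+(?!$)` keeps a single trailing zero.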
