Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Remove Leading zeros from column in Dataframe Join Expression in Spark-Scala

avatar
Contributor

Need to remove leading zeros in a join expression

DF1(TradeID) has values like "0000012345"

and DF2(TradeRefNo) has no leading zeros i.e. "12345"

 

val resultDf = Df1.join(Df2, Df1("TradeID") === Df2(TradeRefNo"))

What's the best way to remove the leading from the first dataframe's DF1("TradeID") column values so the compare works correctly

2 REPLIES 2

avatar
Super Guru
@ChineduLB

Have you tried to create another DF and cast the values to integer first before the JOIN?

Cheers
Eric

avatar
Contributor

I ended up creating a new column in new data frame via withColumn and used regex to populate the new column with the trimmed vals

 

thanks