Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Remove Leading zeros from column in Dataframe Join Expression in Spark-Scala

Highlighted

Remove Leading zeros from column in Dataframe Join Expression in Spark-Scala

Explorer

Need to remove leading zeros in a join expression

DF1(TradeID) has values like "0000012345"

and DF2(TradeRefNo) has no leading zeros i.e. "12345"

 

val resultDf = Df1.join(Df2, Df1("TradeID") === Df2(TradeRefNo"))

What's the best way to remove the leading from the first dataframe's DF1("TradeID") column values so the compare works correctly

2 REPLIES 2

Re: Remove Leading zeros from column in Dataframe Join Expression in Spark-Scala

Guru
@ChineduLB

Have you tried to create another DF and cast the values to integer first before the JOIN?

Cheers
Eric

Re: Remove Leading zeros from column in Dataframe Join Expression in Spark-Scala

Explorer

I ended up creating a new column in new data frame via withColumn and used regex to populate the new column with the trimmed vals

 

thanks

 

Don't have an account?
Coming from Hortonworks? Activate your account here