Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Replacing values in the dataframe keeping original values

Highlighted

Replacing values in the dataframe keeping original values

New Contributor

Hi guys,

I have a dataframe as show below.

+------+----+
| col1|col2|col3
+------+----+
|abc| c|h
|abc| c|h
|abc| c|h
+------+----+

for the col2 I have a value c as shown in the above picture.

Now for all the c values should be replaced with cat and h which is the value of col3 should be replaced with hat.

I can achieve this easily with the below statement

dataframe.withColumn("col2", when(col("col2").equalTo("c"), "cat").otherwise(col("col2") )

but I want to have the original values also . Like say my final dataframe should be looked as shown below

+------+----+----+-----+----+
| col1|col2|col3|col4|col5 |
+------+----+----+-----+----+
|abc| c|cat|h|hat |
|abc| c|cat|h|hat |
|abc| c|cat|h|hat |
+------+----+---+-----------+


How can I achieve this ???

Any help....:)

1 REPLY 1

Re: Replacing values in the dataframe keeping original values

New Contributor

Hi,

you can achieve the output you are looking for by changing the output column name like so:

dataframe.withColumn("col4", when(col("col2").equalTo("c"), "cat").otherwise(col("col2"))

dataframe.withColumn("col5", when(col("col3").equalTo("h"), "hat").otherwise(col("col3"))

If you would like to revise the order of the columns, as by default the new columns will just be appended to the end, after the additional columns are added you can do a .select() afterwards with the order you would like.