Created 10-19-2016 09:22 PM
I am using Spark 1.6.2 and I have a data frame like this:
|Extension|
|gif |
|gif |
|gif |
|gif |
|html |
|gif |
|html |
|html |
| |
| |
|gif |
As you can see, there are some blank rows. They are not null because when I ran isNull() on the data frame, it showed false for all records. Then I thought of replacing those blank values to something like 'None' using regexp_replace. It does not affect the data frame column values. Running the following command right now:
%pyspark
from pyspark.sql.functions import *
extension_df3 = extension_df1.select(regexp_replace('Extension','\\s','None').alias('Extension'))
extension_df3.show(100,truncate=False)
I am replacing on the basis of white space which I guess is wrong. Can somebody please guide me how to do it?
Created 10-19-2016 10:52 PM
It worked, I changed regexp_replace to replace function. Used the following command:
%pyspark
from pyspark.sql.functions import *
extension_df4 = extension_df1.replace('','None','Extension').alias('Extension')
extension_df4.show(100,truncate=False)
It gives me the following output:
|Extension|
|gif |
|gif |
|gif |
|gif |
|html |
|gif |
|html |
|html |
|None |
|None |
|gif | |
Created 10-19-2016 10:30 PM
More information, when I am doing:
%pyspark
from pyspark.sql.functions import *
extension_df3 = extension_df1.select(regexp_replace('Extension','','None').alias('Extension'))
extension_df3.show(100,truncate=False)
It changes the data frame in the state which I do not want:
Extension |
|NonegNoneiNonefNone |
|NonegNoneiNonefNone |
|NonegNoneiNonefNone |
|NonegNoneiNonefNone |
|NonehNonetNonemNonelNone|
|NonegNoneiNonefNone |
|NonehNonetNonemNonelNone| |
NonehNonetNonemNonelNone|
|None |
|None |
|NonegNoneiNonefNone |
Created 10-19-2016 10:52 PM
It worked, I changed regexp_replace to replace function. Used the following command:
%pyspark
from pyspark.sql.functions import *
extension_df4 = extension_df1.replace('','None','Extension').alias('Extension')
extension_df4.show(100,truncate=False)
It gives me the following output:
|Extension|
|gif |
|gif |
|gif |
|gif |
|html |
|gif |
|html |
|html |
|None |
|None |
|gif | |
Created 06-11-2019 01:17 PM
@Mushtaq Rizvi I hope what ever you're doing above is just replacing with "None" which is a string which consumes memory.
Let's I've a scenario. I wanted to replace the blank spaces like below with null values. Can you suggest something on how to do this. Because the whitespaces consume memory where as null values doesn't.
|Extension|
|gif |
| |
|gif |
| |
|html |
I wanted it like this.
|Extension|
|gif |
|null|
|gif |
|null|
|html |