<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to replace blank rows in pyspark Dataframe? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-replace-blank-rows-in-pyspark-Dataframe/m-p/140009#M44017</link>
    <description>&lt;P&gt;More information: when I run the following:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;%pyspark&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;from pyspark.sql.functions import * &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;extension_df3 = extension_df1.select(regexp_replace('Extension','','None').alias('Extension')) &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;extension_df3.show(100,truncate=False)&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;It changes the data frame into a state I do not want:&lt;/P&gt;&lt;P&gt;|Extension               |&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;&lt;P&gt;|NonehNonetNonemNonelNone|&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;&lt;P&gt;|NonehNonetNonemNonelNone|&lt;/P&gt;&lt;P&gt;|NonehNonetNonemNonelNone|&lt;/P&gt;&lt;P&gt;|None                    |&lt;/P&gt;&lt;P&gt;|None                    |&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;</description>
    <pubDate>Thu, 20 Oct 2016 05:30:08 GMT</pubDate>
    <dc:creator>mrizvi</dc:creator>
    <dc:date>2016-10-20T05:30:08Z</dc:date>
    <item>
      <title>How to replace blank rows in pyspark Dataframe?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-replace-blank-rows-in-pyspark-Dataframe/m-p/140008#M44016</link>
      <description>&lt;P&gt;I am using Spark 1.6.2 and I have a data frame like this:&lt;/P&gt;&lt;P&gt;|Extension|&lt;/P&gt;&lt;P&gt;|gif |&lt;/P&gt;&lt;P&gt;|gif |&lt;/P&gt;&lt;P&gt;|gif |&lt;/P&gt;&lt;P&gt;|gif |&lt;/P&gt;&lt;P&gt;|html |&lt;/P&gt;&lt;P&gt;|gif |&lt;/P&gt;&lt;P&gt;|html |&lt;/P&gt;&lt;P&gt;|html |&lt;/P&gt;&lt;P&gt;| |&lt;/P&gt;&lt;P&gt;| |&lt;/P&gt;&lt;P&gt;|gif |&lt;/P&gt;&lt;P&gt;As you can see, some rows are blank. They are not null: when I ran isNull(), it returned false for every record. I then tried replacing the blank values with something like 'None' using regexp_replace, but the column values did not change. This is the command I am running:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;%pyspark &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;from pyspark.sql.functions import * &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;extension_df3 = extension_df1.select(regexp_replace('Extension','\\s','None').alias('Extension')) &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;extension_df3.show(100,truncate=False)&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;I am matching on whitespace, which I suspect is wrong. Can somebody please show me how to do this?&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2016 04:22:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-replace-blank-rows-in-pyspark-Dataframe/m-p/140008#M44016</guid>
      <dc:creator>mrizvi</dc:creator>
      <dc:date>2016-10-20T04:22:38Z</dc:date>
    </item>
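A minimal sketch of why the whitespace pattern in the question leaves the column unchanged, assuming the blank rows are empty strings (the later replies in this thread suggest they are). Python's `re` module stands in for Spark's regex engine here, and the sample list `values` is hypothetical. The second pattern, which matches only when the whole value is empty or whitespace, is one possible alternative and not something the original poster tried.

```python
import re

# Sample values modelled on the question; "" stands for a blank row.
values = ["gif", "html", ""]

# \s needs at least one whitespace character to match, so an empty
# string (and a value such as "gif") passes through unchanged.
by_whitespace = [re.sub(r"\s", "None", v) for v in values]

# ^\s*$ matches only when the whole value is empty or all whitespace.
by_whole_value = [re.sub(r"^\s*$", "None", v) for v in values]

print(by_whitespace)
print(by_whole_value)
```

The first list comes back identical to the input, which matches the behaviour described in the question; only the anchored pattern changes the blank entry.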
    <item>
      <title>Re: How to replace blank rows in pyspark Dataframe?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-replace-blank-rows-in-pyspark-Dataframe/m-p/140009#M44017</link>
      <description>&lt;P&gt;More information: when I run the following:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;%pyspark&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;from pyspark.sql.functions import * &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;extension_df3 = extension_df1.select(regexp_replace('Extension','','None').alias('Extension')) &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;extension_df3.show(100,truncate=False)&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;It changes the data frame into a state I do not want:&lt;/P&gt;&lt;P&gt;|Extension               |&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;&lt;P&gt;|NonehNonetNonemNonelNone|&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;&lt;P&gt;|NonehNonetNonemNonelNone|&lt;/P&gt;&lt;P&gt;|NonehNonetNonemNonelNone|&lt;/P&gt;&lt;P&gt;|None                    |&lt;/P&gt;&lt;P&gt;|None                    |&lt;/P&gt;&lt;P&gt;|NonegNoneiNonefNone     |&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2016 05:30:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-replace-blank-rows-in-pyspark-Dataframe/m-p/140009#M44017</guid>
      <dc:creator>mrizvi</dc:creator>
      <dc:date>2016-10-20T05:30:08Z</dc:date>
    </item>
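The output in the post above follows from how an empty regex pattern behaves: it matches at every position in the string, so the replacement is inserted before each character and once more at the end. A small sketch using Python's `re` module (Python 3.7 or later) as a stand-in for the engine Spark uses; the variable names are illustrative only.

```python
import re

# An empty pattern matches before every character and at the end of
# the string, so "None" is inserted at each of those positions.
gif_result = re.sub("", "None", "gif")
html_result = re.sub("", "None", "html")

# An empty input has exactly one position, so it becomes a single "None".
blank_result = re.sub("", "None", "")

print(gif_result)
print(html_result)
print(blank_result)
```

The three printed values line up with the rows shown in the post: the mangled `gif` and `html` entries and a plain `None` for the blank rows.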
    <item>
      <title>Re: How to replace blank rows in pyspark Dataframe?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-replace-blank-rows-in-pyspark-Dataframe/m-p/140010#M44018</link>
      <description>&lt;P&gt;It worked once I switched from regexp_replace to the data frame's replace function. I used the following command:&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;%pyspark&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;from pyspark.sql.functions import * &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;extension_df4 = extension_df1.replace('','None','Extension').alias('Extension') &lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;extension_df4.show(100,truncate=False)&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;It gives me the following output:&lt;/P&gt;&lt;P&gt;|Extension|&lt;/P&gt;&lt;P&gt;|gif      |&lt;/P&gt;&lt;P&gt;|gif      |&lt;/P&gt;&lt;P&gt;|gif      |&lt;/P&gt;&lt;P&gt;|gif      |&lt;/P&gt;&lt;P&gt;|html     |&lt;/P&gt;&lt;P&gt;|gif      |&lt;/P&gt;&lt;P&gt;|html     |&lt;/P&gt;&lt;P&gt;|html     |&lt;/P&gt;&lt;P&gt;|None     |&lt;/P&gt;&lt;P&gt;|None     |&lt;/P&gt;&lt;P&gt;|gif      |&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2016 05:52:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-replace-blank-rows-in-pyspark-Dataframe/m-p/140010#M44018</guid>
      <dc:creator>mrizvi</dc:creator>
      <dc:date>2016-10-20T05:52:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to replace blank rows in pyspark Dataframe?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-replace-blank-rows-in-pyspark-Dataframe/m-p/140011#M44019</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10331/mrizvi.html" nodeid="10331"&gt;@Mushtaq Rizvi&lt;/A&gt; As far as I can tell, what you're doing above just replaces the blanks with "None", which is a string and still consumes memory.&lt;/P&gt;&lt;P&gt;Let's say I have a scenario where I want to replace the blank values below with null values. Can you suggest how to do this? My reasoning is that whitespace consumes memory whereas null values don't.&lt;/P&gt;&lt;P&gt;|Extension|&lt;/P&gt;&lt;P&gt;|gif |&lt;/P&gt;&lt;P&gt;| |&lt;/P&gt;&lt;P&gt;|gif |&lt;/P&gt;&lt;P&gt;| |&lt;/P&gt;&lt;P&gt;|html |&lt;/P&gt;&lt;P&gt;I want it to look like this:&lt;/P&gt;&lt;P&gt;|Extension|&lt;/P&gt;&lt;P&gt;|gif |&lt;/P&gt;&lt;P&gt;|null|&lt;/P&gt;&lt;P&gt;|gif |&lt;/P&gt;&lt;P&gt;|null|&lt;/P&gt;&lt;P&gt;|html |&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jun 2019 20:17:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-replace-blank-rows-in-pyspark-Dataframe/m-p/140011#M44019</guid>
      <dc:creator>shashankkumar_m</dc:creator>
      <dc:date>2019-06-11T20:17:09Z</dc:date>
    </item>
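The blank-to-null rule asked about above can be sketched per value in plain Python. This is not Spark code and not an answer given in the thread: the function name `blank_to_none` and the sample list `rows` are hypothetical, and applying the rule to a DataFrame column (for example through a UDF or built-in column functions) is deliberately left out.

```python
def blank_to_none(value):
    """Return None for empty or whitespace-only strings, else the value."""
    if value is not None and value.strip() == "":
        return None
    return value


# Sample column values modelled on the post; "" and " " are blank rows.
rows = ["gif", "", "gif", " ", "html"]
cleaned = [blank_to_none(v) for v in rows]

print(cleaned)
```

Stripping before the comparison treats whitespace-only values the same as empty strings; drop the `strip()` call if only exact empty strings should become null.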
  </channel>
</rss>

