<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark SQL in-memory space management in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-SQL-in-memory-space-managment/m-p/136984#M52037</link>
    <description>&lt;P&gt;Thanks @&lt;A href="https://community.hortonworks.com/users/3486/cstanca.html"&gt;Constantin Stanca&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Sat, 22 Apr 2017 01:12:35 GMT</pubDate>
    <dc:creator>mothi86</dc:creator>
    <dc:date>2017-04-22T01:12:35Z</dc:date>
    <item>
      <title>Spark SQL in-memory space management</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-SQL-in-memory-space-managment/m-p/136982#M52035</link>
      <description>&lt;P&gt;Hi. Consider a Spark SQL DataFrame or Dataset with 400 columns and 1 million rows. Not all rows have all 400 columns populated, and the columns cannot be declared NOT NULL. I need to understand whether a null value consumes space in memory and, if so, how much. Is there a fact sheet or article listing the sizes of all data types in bytes or bits?&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jan 2017 00:38:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-SQL-in-memory-space-managment/m-p/136982#M52035</guid>
      <dc:creator>mothi86</dc:creator>
      <dc:date>2017-01-20T00:38:24Z</dc:date>
    </item>
    <item>
      <title>Re: Spark SQL in-memory space management</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-SQL-in-memory-space-managment/m-p/136983#M52036</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/13790/mothi86.html"&gt;Mothilal marimuthu&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You did not specify whether you are talking about RDDs, Datasets, or DataFrames.&lt;/P&gt;&lt;P&gt;Anyhow, let's assume an RDD. It is not like a columnar database, where you account only for the populated key-value pairs; an RDD uses a row-based format, so there is a cost associated with empty values. I cannot tell you the exact cost because it depends on your data types, but there is a cost.&lt;/P&gt;&lt;P&gt;Why don't you run a test yourself? Persist a small test RDD with all values populated, then one with partial values, some of them null, and compare the sizes reported on the Storage tab of the Spark UI. Again, the data type matters: you can experiment with null values on columns of one type, then build another RDD with a different type, and so on.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.storage.StorageLevel
rdd.persist(StorageLevel.MEMORY_AND_DISK)&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 22 Feb 2017 06:21:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-SQL-in-memory-space-managment/m-p/136983#M52036</guid>
      <dc:creator>cstanca</dc:creator>
      <dc:date>2017-02-22T06:21:20Z</dc:date>
    </item>
    <item>
      <title>Re: Spark SQL in-memory space management</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-SQL-in-memory-space-managment/m-p/136984#M52037</link>
      <description>&lt;P&gt;Thanks @&lt;A href="https://community.hortonworks.com/users/3486/cstanca.html"&gt;Constantin Stanca&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 22 Apr 2017 01:12:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-SQL-in-memory-space-managment/m-p/136984#M52037</guid>
      <dc:creator>mothi86</dc:creator>
      <dc:date>2017-04-22T01:12:35Z</dc:date>
    </item>
  </channel>
</rss>

