<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Amount of data storage : HDFS vs NoSQL in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103212#M15477</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/1227/gtmehdi.html" nodeid="1227"&gt;@Mehdi TAZI&lt;/A&gt;&lt;P&gt;Why NoSQL Solutions cassandra for example can't handle the same amount of data like HDFS ?&lt;/P&gt;&lt;P&gt;You can find good explanation &lt;A target="_blank" href="http://stackoverflow.com/questions/13350293/cassandra-and-hadoop-realtime-vs-batch"&gt;here&lt;/A&gt; &lt;/P&gt;</description>
    <pubDate>Tue, 19 Jan 2016 21:16:15 GMT</pubDate>
    <dc:creator>nsabharwal</dc:creator>
    <dc:date>2016-01-19T21:16:15Z</dc:date>
    <item>
      <title>Amount of data storage : HDFS vs NoSQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103211#M15476</link>
      <description>&lt;P&gt;Several sources on the internet explain that HDFS is built to handle larger amounts of data than NoSQL solutions (Cassandra, for example). In general, once we go beyond 1 TB we should start thinking about Hadoop (HDFS) rather than NoSQL.&lt;/P&gt;&lt;P&gt;Besides the architecture, the fact that HDFS is designed for batch processing while most NoSQL stores (e.g. Cassandra) are designed for random I/O, and the differences in schema design, why can't NoSQL solutions such as Cassandra handle the same amount of data as HDFS?&lt;/P&gt;&lt;P&gt;Why can't we use those solutions as a data lake? Why do we only use them as hot storage in a big data architecture?&lt;/P&gt;&lt;P&gt;Thanks a lot.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 21:02:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103211#M15476</guid>
      <dc:creator>TAZIMehdi</dc:creator>
      <dc:date>2016-01-19T21:02:45Z</dc:date>
    </item>
    <item>
      <title>Re: Amount of data storage : HDFS vs NoSQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103212#M15477</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/1227/gtmehdi.html" nodeid="1227"&gt;@Mehdi TAZI&lt;/A&gt;&lt;P&gt;Why NoSQL Solutions cassandra for example can't handle the same amount of data like HDFS ?&lt;/P&gt;&lt;P&gt;You can find good explanation &lt;A target="_blank" href="http://stackoverflow.com/questions/13350293/cassandra-and-hadoop-realtime-vs-batch"&gt;here&lt;/A&gt; &lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 21:16:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103212#M15477</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-01-19T21:16:15Z</dc:date>
    </item>
    <item>
      <title>Re: Amount of data storage : HDFS vs NoSQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103213#M15478</link>
      <description>&lt;P&gt; &lt;A rel="user" href="https://community.cloudera.com/users/1227/gtmehdi.html" nodeid="1227"&gt;@Mehdi TAZI&lt;/A&gt; HDFS is not NoSQL.  NoSQL solutions place schemas (albeit flexible and loose schemas) on the data and are considered alternatives to traditional relational systems. HDFS is scalable, redundant storage and assumes no structure on the data. &lt;/P&gt;&lt;P&gt;Many NoSQL solutions in fact use HDFS for their storage. The point is that when you land your data (pdf, txt, json, xml...) in HDFS you have the flexibility to operate on that data with any tool you choose. In many cases the tools you can use to analyze data structured in a NoSQL solution is limited.  &lt;/P&gt;&lt;P&gt;If you want to dig further, I suggest reading up on the &lt;A href="https://en.wikipedia.org/wiki/CAP_theorem"&gt;CAP Theorem&lt;/A&gt;. All database systems must adhere to the CAP Theorem. Because HDFS is storage, it doesn't have this limitation. &lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 21:16:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103213#M15478</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2016-01-19T21:16:25Z</dc:date>
    </item>
    <item>
      <title>Re: Amount of data storage : HDFS vs NoSQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103214#M15479</link>
      <description>&lt;P&gt;Hello! Thanks a lot for your answer. I did read about the CAP theorem, but I still can't see why Cassandra can't handle the same amount of data as Hadoop does.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 21:39:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103214#M15479</guid>
      <dc:creator>TAZIMehdi</dc:creator>
      <dc:date>2016-01-19T21:39:00Z</dc:date>
    </item>
    <item>
      <title>Re: Amount of data storage : HDFS vs NoSQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103215#M15480</link>
      <description>&lt;P&gt;Thanks a lot. I had already seen this post, and it still doesn't answer why Cassandra can't manage the same amount of data as Hadoop does.&lt;/P&gt;&lt;P&gt;NB: the accepted answer is not completely accurate; Cassandra doesn't run on top of HDFS.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 21:41:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103215#M15480</guid>
      <dc:creator>TAZIMehdi</dc:creator>
      <dc:date>2016-01-19T21:41:04Z</dc:date>
    </item>
    <item>
      <title>Re: Amount of data storage : HDFS vs NoSQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103216#M15481</link>
      <description>&lt;P&gt;Cassandra uses a file system similar to HDFS, so yes, Cassandra, like HBase, can scale. The difference is that Cassandra is a solution, while Hadoop, and HDFS in particular, is a platform. Use Cassandra for specific use cases and access patterns, but use HDFS as your data lake.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 22:09:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103216#M15481</guid>
      <dc:creator>SQLShaw</dc:creator>
      <dc:date>2016-01-19T22:09:31Z</dc:date>
    </item>
    <item>
      <title>Re: Amount of data storage : HDFS vs NoSQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103217#M15482</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1227/gtmehdi.html" nodeid="1227"&gt;@Mehdi TAZI&lt;/A&gt;  Agree on Cassandra file system. It's CFS&lt;/P&gt;&lt;P&gt;I won't compare Cassandra with HDFS. HDFS is storage layer and Cassandra is nosql database. &lt;/P&gt;</description>
      <pubDate>Tue, 19 Jan 2016 22:19:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103217#M15482</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-01-19T22:19:55Z</dc:date>
    </item>
    <item>
      <title>Re: Amount of data storage : HDFS vs NoSQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103218#M15483</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1227/gtmehdi.html" nodeid="1227"&gt;@Mehdi TAZI&lt;/A&gt; Hope it was helpful. &lt;/P&gt;</description>
      <pubDate>Wed, 20 Jan 2016 01:44:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103218#M15483</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2016-01-20T01:44:27Z</dc:date>
    </item>
    <item>
      <title>Re: Amount of data storage : HDFS vs NoSQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103219#M15484</link>
      <description>&lt;P&gt;That sounds wrong. The CAP theorem is an assertion about tradeoffs in all distributed systems and is equally applicable to HDFS. We do make tradeoffs within HDFS to prioritize consistency.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jan 2016 07:53:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103219#M15484</guid>
      <dc:creator>ArpitAgarwal</dc:creator>
      <dc:date>2016-01-21T07:53:59Z</dc:date>
    </item>
    <item>
      <title>Re: Amount of data storage : HDFS vs NoSQL</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103220#M15485</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1227/gtmehdi.html" nodeid="1227"&gt;@Mehdi TAZI&lt;/A&gt; &lt;/P&gt;&lt;P&gt;As &lt;A rel="user" href="https://community.cloudera.com/users/126/aagarwal.html" nodeid="126"&gt;@Arpit Agarwal&lt;/A&gt; mentioned this is not related to CAP theorem. HDFS and Cassandra exposes different kind of interfaces so an apple to apple comparison is not possible. From the papers and benchmarking results that I have seen Cassandra is often restricted to sub-1000 nodes. &lt;/P&gt;&lt;P&gt;References : Planet Cassandra &lt;A href="http://www.planetcassandra.org/nosql-performance-benchmarks/" target="_blank"&gt;http://www.planetcassandra.org/nosql-performance-benchmarks/&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Netflix Engineering Blog :&lt;/P&gt;&lt;P&gt;&lt;A href="http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html" target="_blank"&gt;http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html" target="_blank"&gt;http://techblog.netflix.com/2014/07/revisiting-1-million-writes-per-second.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;It is typical to see HDFS clusters with sizes far more than 1000's of nodes, so the scale at which HDFS operates is very different from Cassandra. Please keep in mind that a 300 odd nodes dedicated to No-SQL storage can store large amounts of data. However where HDFS shines is the diverse set of applications that you can run on it. Cassandra addresses a very focused scenario, where as HDFS is very general purpose. You can run a set of application including HBase which provides functionality that Cassandra provides. &lt;/P&gt;&lt;P&gt;So if you are an enterprise it is often the case that you have needs that can only be addressed by different tools, and HDFS will provide access to set of tools that operate upon your data. 
&lt;/P&gt;&lt;P&gt;At this point of time, we have no data that says Cassandra can or cannot handle same amount of data as HDFS, I think the only data point is typically Cassandra benchmarks are run with with much smaller number of nodes.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jan 2016 08:20:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Amount-of-data-storage-HDFS-vs-NoSQL/m-p/103220#M15485</guid>
      <dc:creator>aengineer</dc:creator>
      <dc:date>2016-01-21T08:20:49Z</dc:date>
    </item>
  </channel>
</rss>

