<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to process a large volume of data (e.g., 100 GB) in Apache Hadoop? - Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-process-large-volume-of-data-e-g-100-GB-in-Apache/m-p/121068#M34278</link>
    <description>&lt;P&gt;I want to process 100 GB of RFID data on Apache Hadoop. Can anyone explain how to do it using the Hortonworks Sandbox? Thanks in advance.&lt;/P&gt;</description>
    <pubDate>Tue, 21 Apr 2026 13:30:01 GMT</pubDate>
    <dc:creator>santoshdash_uu</dc:creator>
    <dc:date>2026-04-21T13:30:01Z</dc:date>
    <item>
      <title>How to process a large volume of data (e.g., 100 GB) in Apache Hadoop?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-process-large-volume-of-data-e-g-100-GB-in-Apache/m-p/121068#M34278</link>
      <description>&lt;P&gt;I want to process 100 GB of RFID data on Apache Hadoop. Can anyone explain how to do it using the Hortonworks Sandbox? Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2026 13:30:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-process-large-volume-of-data-e-g-100-GB-in-Apache/m-p/121068#M34278</guid>
      <dc:creator>santoshdash_uu</dc:creator>
      <dc:date>2026-04-21T13:30:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to process a large volume of data (e.g., 100 GB) in Apache Hadoop?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-process-large-volume-of-data-e-g-100-GB-in-Apache/m-p/121069#M34279</link>
      <description>&lt;P style="margin-left: 40px;"&gt; &lt;A rel="user" href="https://community.cloudera.com/users/11743/santoshdashuu.html" nodeid="11743"&gt;@SANTOSH DASH&lt;/A&gt; You can process data in Hadoop using many different services. If your data has a schema, you can start by processing it with Hive. Full tutorial &lt;A href="http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/"&gt;here&lt;/A&gt;. My preference is to do ELT logic with Pig. Full tutorial &lt;A href="http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/"&gt;here&lt;/A&gt;. There are many ways to skin a cat here. A full list of tutorials is &lt;A href="http://hortonworks.com/tutorials/"&gt;here&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Jul 2016 10:50:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-process-large-volume-of-data-e-g-100-GB-in-Apache/m-p/121069#M34279</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-07-11T10:50:01Z</dc:date>
    </item>
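The reply above points to Hive and Pig; a third common route on Hadoop is Hadoop Streaming, which lets you express the same map/reduce logic in plain Python. Below is a minimal sketch of a per-tag read count. The tab-delimited layout and the tag id being the first field are assumptions for illustration, not details from the thread.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit (tag_id, 1) for each RFID read line.
    Assumes tab-delimited input with the tag id in the first field."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            yield fields[0], 1

def reducer(pairs):
    """Sum counts per tag id. Hadoop Streaming delivers mapper output
    to the reducer already sorted by key, which groupby relies on."""
    for tag, group in groupby(pairs, key=lambda kv: kv[0]):
        yield tag, sum(count for _, count in group)

if __name__ == "__main__":
    # Run as a Streaming mapper or reducer, e.g. (paths are illustrative):
    #   hadoop jar hadoop-streaming.jar -input /rfid -output /counts \
    #     -mapper "python3 this.py map" -reducer "python3 this.py reduce"
    if (sys.argv[1] if len(sys.argv) > 1 else "map") == "map":
        for tag, count in mapper(sys.stdin):
            print(f"{tag}\t{count}")
    else:
        pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
        for tag, total in reducer((k, int(v)) for k, v in pairs):
            print(f"{tag}\t{total}")
```

The same mapper/reducer pair also runs locally for testing by piping a sample file through `sort` between the two stages, which mimics the shuffle phase.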
    <item>
      <title>Re: How to process a large volume of data (e.g., 100 GB) in Apache Hadoop?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-process-large-volume-of-data-e-g-100-GB-in-Apache/m-p/121070#M34280</link>
      <description>&lt;P&gt;Regarding how, refer to Sunile. Pig is nice and flexible, Hive is good if you know SQL and your RFID data is already basically in a flat table format, and Spark also works well ... &lt;/P&gt;&lt;P&gt;But the question is whether you really want to process 100 GB of data on the sandbox. The memory settings are tiny, there is a single drive, and data is not replicated ... If you do it like this, you can just use Python on a local machine. If you want a decent environment, you might want to set up 3-4 nodes on a VMware server, perhaps with 32 GB of RAM each. That would give you a nice little environment and you could actually do some fast processing.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Jul 2016 17:31:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-process-large-volume-of-data-e-g-100-GB-in-Apache/m-p/121070#M34280</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-07-11T17:31:35Z</dc:date>
    </item>
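The last reply suggests that 100 GB can be handled with plain Python on a single machine if you stream the file instead of loading it. A minimal sketch of that idea, aggregating reads per tag with constant memory; the CSV layout and the `tag_id` column name are assumptions for illustration.

```python
import csv
from collections import Counter

def count_reads_per_tag(path, tag_column="tag_id"):
    """Stream a large RFID CSV once, keeping only per-tag counters
    in memory. The file is never fully loaded, so memory use depends
    on the number of distinct tags, not the file size."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[tag_column]] += 1
    return counts
```

Because only the `Counter` lives in memory, this approach scales to files far larger than RAM, which is the reply's point about skipping the sandbox for a one-off job of this size.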
  </channel>
</rss>

