<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Can you please advise about how best to use this SSD storage to boost performance in HDP on Azure? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-you-please-advise-about-how-best-to-use-this-SSD-storage/m-p/95378#M8751</link>
    <description>&lt;P&gt;You could use those disks for the temporary data of your MapReduce/Tez processes (intermediate data during the shuffle and sort phase). That should boost your performance considerably.&lt;/P&gt;&lt;P&gt;See the benchmarks in this paper (figure 17, tmpSSD vs HD):&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://peerj.com/preprints/1320.pdf"&gt;https://peerj.com/preprints/1320.pdf&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 14 Oct 2015 14:36:09 GMT</pubDate>
    <dc:creator>sluangsay</dc:creator>
    <dc:date>2015-10-14T14:36:09Z</dc:date>
    <item>
      <title>Can you please advise about how best to use this SSD storage to boost performance in HDP on Azure?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-you-please-advise-about-how-best-to-use-this-SSD-storage/m-p/95377#M8750</link>
      <description>&lt;P&gt;We're running HDP 2.3 on Windows Azure virtual machines. These machines come with a 400 GB temporary SSD disk (which gets wiped after a restart). How can we best use this SSD storage to boost performance? For example, which config params should we change to point to locations on the SSD to boost HDFS / Tez / YARN / Hive performance?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 09:44:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-you-please-advise-about-how-best-to-use-this-SSD-storage/m-p/95377#M8750</guid>
      <dc:creator>cliu</dc:creator>
      <dc:date>2022-09-16T09:44:03Z</dc:date>
    </item>
    <item>
      <title>Re: Can you please advise about how best to use this SSD storage to boost performance in HDP on Azure?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-you-please-advise-about-how-best-to-use-this-SSD-storage/m-p/95378#M8751</link>
      <description>&lt;P&gt;You could use those disks for the temporary data of your MapReduce/Tez processes (intermediate data during the shuffle and sort phase). That should boost your performance considerably.&lt;/P&gt;&lt;P&gt;See the benchmarks in this paper (figure 17, tmpSSD vs HD):&lt;/P&gt;&lt;P&gt;&lt;A target="_blank" href="https://peerj.com/preprints/1320.pdf"&gt;https://peerj.com/preprints/1320.pdf&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 14 Oct 2015 14:36:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-you-please-advise-about-how-best-to-use-this-SSD-storage/m-p/95378#M8751</guid>
      <dc:creator>sluangsay</dc:creator>
      <dc:date>2015-10-14T14:36:09Z</dc:date>
    </item>
    <item>
      <title>Re: Can you please advise about how best to use this SSD storage to boost performance in HDP on Azure?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-you-please-advise-about-how-best-to-use-this-SSD-storage/m-p/95379#M8752</link>
      <description>&lt;P&gt;SSDs are best suited to shuffle intermediate data storage and on-disk logging.&lt;/P&gt;
&lt;P&gt;For Tez/YARN/Hive, the main parameters to modify are in yarn-site.xml (Tez uses sub-dirs of the YARN local dirs):&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;yarn.nodemanager.local-dirs&lt;/EM&gt;=file://d:/yarn/data&lt;/P&gt;&lt;P&gt;&lt;EM&gt;yarn.nodemanager.log-dirs&lt;/EM&gt;=file://d:/yarn/logs&lt;/P&gt;
&lt;P&gt;Also, check that TRIM is enabled for the SSD with the fsutil behavior command (for example, fsutil behavior query DisableDeleteNotify).&lt;/P&gt;
&lt;P&gt;You can also use SSD acceleration for temporary tables in Hive by exposing the SSD via HDFS.&lt;/P&gt;
&lt;P&gt;The &lt;EM&gt;dfs.datanode.data.dir&lt;/EM&gt; setting needs an entry like "[SSD]file://d:/hdfs/data" (to store the SSD data on d:\hdfs\data).&lt;/P&gt;
&lt;P&gt;And hive-site.xml needs &lt;EM&gt;hive.exec.temporary.table.storage&lt;/EM&gt;=SSD.&lt;/P&gt;&lt;P&gt;Then you can use

"create temporary table xyz stored as orc as select.... from table where ...;"

to create temporary tables cached on SSDs.
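
As a rough sketch, the settings above can be rendered as Hadoop property entries with a small Python script. The values are copied from this post; placing dfs.datanode.data.dir in hdfs-site.xml, the d: drive letter, and the helper name render_properties are assumptions for illustration. The output is meant to be merged into your existing config files, not to replace them.

```python
import xml.etree.ElementTree as ET

# Values copied from the post above. The d: paths assume the temporary SSD
# is mounted as drive D; adjust them for your VMs.
SETTINGS = {
    "yarn-site.xml": {
        "yarn.nodemanager.local-dirs": "file://d:/yarn/data",
        "yarn.nodemanager.log-dirs": "file://d:/yarn/logs",
    },
    "hdfs-site.xml": {
        "dfs.datanode.data.dir": "[SSD]file://d:/hdfs/data",
    },
    "hive-site.xml": {
        "hive.exec.temporary.table.storage": "SSD",
    },
}


def render_properties(props):
    """Return a Hadoop-style configuration document for name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, props in SETTINGS.items():
        print("# merge into", filename)
        print(render_properties(props))
```

Note that dfs.datanode.data.dir is a comma-separated list, so the [SSD] entry would normally sit alongside your existing data directories.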
&lt;/P&gt;</description>
      <pubDate>Wed, 21 Oct 2015 11:40:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-you-please-advise-about-how-best-to-use-this-SSD-storage/m-p/95379#M8752</guid>
      <dc:creator>gopalv</dc:creator>
      <dc:date>2015-10-21T11:40:16Z</dc:date>
    </item>
    <item>
      <title>Re: Can you please advise about how best to use this SSD storage to boost performance in HDP on Azure?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-you-please-advise-about-how-best-to-use-this-SSD-storage/m-p/95380#M8753</link>
      <description>&lt;P&gt;@&lt;A href="http://community.hortonworks.com/users/383/cliu.html"&gt;cliu@hortonworks.com&lt;/A&gt;&lt;/P&gt;&lt;P&gt;These are very helpful benchmarks posted by Amplab. &lt;A target="_blank" href="http://www.sandisk.com/assets/docs/sandisk-solid-state-drives-ssds-for-big-data-analytics-using-hadoop-and-hive.pdf"&gt;Click&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 21 Oct 2015 16:47:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Can-you-please-advise-about-how-best-to-use-this-SSD-storage/m-p/95380#M8753</guid>
      <dc:creator>nsabharwal</dc:creator>
      <dc:date>2015-10-21T16:47:27Z</dc:date>
    </item>
  </channel>
</rss>

