<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Few Questions on Hadoop 2.x architecture in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Few-Questions-on-Hadoop-2-x-architecture/m-p/108001#M42465</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/12833/gobisubramani.html" nodeid="12833"&gt;@Gobi Subramani&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Please read these links for a clear explanation:&lt;/P&gt;&lt;P&gt;&lt;A href="https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/YARN.html" target="_blank"&gt;https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/YARN.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html" target="_blank"&gt;https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Every Hadoop command internally calls a Java utility for the underlying operation. &lt;CODE&gt;org.apache.hadoop.fs.FsShell&lt;/CODE&gt; provides command-line access to a FileSystem. &lt;CODE&gt;hadoop fs -put&lt;/CODE&gt; internally calls the corresponding method of that class.&lt;/P&gt;&lt;P&gt;To understand the FsShell code, please go through this link:&lt;/P&gt;&lt;P&gt;&lt;A href="http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.1.0/org/apache/hadoop/fs/FsShell.java" target="_blank"&gt;http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.1.0/org/apache/hadoop/fs/FsShell.java&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Sat, 01 Oct 2016 17:42:30 GMT</pubDate>
    <dc:creator>ashneesharma88</dc:creator>
    <dc:date>2016-10-01T17:42:30Z</dc:date>
    <item>
      <title>Few Questions on Hadoop 2.x architecture</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Few-Questions-on-Hadoop-2-x-architecture/m-p/107999#M42463</link>
      <description>&lt;P&gt;I was going through the 2.x architecture and have a few questions about the NameNode and the Resource Manager.&lt;/P&gt;&lt;P&gt;To resolve the single point of failure of the NameNode in the 1.x architecture, Hadoop 2.x has a standby NameNode.&lt;/P&gt;&lt;P&gt;To reduce the load on the JobTracker, in 2.x we have the Resource Manager.&lt;/P&gt;&lt;P&gt;I wanted to know:&lt;/P&gt;&lt;P&gt;  1. What are the roles of the NameNode and the Resource Manager?&lt;/P&gt;&lt;P&gt;   2. As only one Resource Manager is available per cluster, couldn't it be a single point of failure?&lt;/P&gt;&lt;P&gt;   3. If the NameNode stores metadata about the blocks (as in Hadoop 1.x), then which service is responsible for getting data block information after a job is submitted?&lt;/P&gt;&lt;P&gt;      See the image:  &lt;A href="http://hortonworks.com/wp-content/uploads/2014/04/YARN_distributed_arch.png" target="_blank"&gt;http://hortonworks.com/wp-content/uploads/2014/04/YARN_distributed_arch.png&lt;/A&gt;  &lt;/P&gt;&lt;P&gt;   The Resource Manager interacts directly with the nodes.&lt;/P&gt;&lt;P&gt;4. Can anyone tell me how the flow goes for the commands below?&lt;/P&gt;&lt;P&gt;       $ hadoop fs -put  &amp;lt;source&amp;gt; &amp;lt;dest&amp;gt;&lt;/P&gt;&lt;P&gt;       $ hadoop jar app.jar  &amp;lt;app&amp;gt;  &amp;lt;inputfilepath&amp;gt; &amp;lt;outputpath&amp;gt;&lt;/P&gt;</description>
      <pubDate>Sat, 01 Oct 2016 13:32:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Few-Questions-on-Hadoop-2-x-architecture/m-p/107999#M42463</guid>
      <dc:creator>gobi_subramani</dc:creator>
      <dc:date>2016-10-01T13:32:49Z</dc:date>
    </item>
    <item>
      <title>Re: Few Questions on Hadoop 2.x architecture</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Few-Questions-on-Hadoop-2-x-architecture/m-p/108000#M42464</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/12833/gobisubramani.html" nodeid="12833" target="_blank"&gt;@Gobi Subramani&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;Below are answers to your questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;The role of the NameNode is to manage the HDFS file system. The role of the Resource Manager is to manage the cluster's resources (CPU, RAM, etc.) in collaboration with the Node Managers. I won't write too much on these aspects as a lot of documentation is already available. You can read about the &lt;A href="http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html" rel="nofollow noopener noreferrer" target="_blank"&gt;HDFS&lt;/A&gt; and &lt;A href="http://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/YARN.html" rel="nofollow noopener noreferrer" target="_blank"&gt;YARN&lt;/A&gt; architectures in the official documentation. &lt;/LI&gt;&lt;LI&gt;You can have high availability in YARN by having active and standby Resource Managers (more information &lt;A href="http://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html" rel="nofollow noopener noreferrer" target="_blank"&gt;here&lt;/A&gt;).&lt;/LI&gt;&lt;LI&gt;If you have a distributed application (Spark, Tez, etc.) that needs data from HDFS, it will use both YARN and HDFS. YARN enables the application to request containers (which contain the required resources: CPU, RAM, etc.) on different nodes. The application is deployed and runs inside these containers. The application is then responsible for getting data from HDFS by exchanging with the NameNode and DataNodes. &lt;/LI&gt;&lt;LI&gt;For the put command, only HDFS is involved. Without going into details: the client asks the NameNode to create a new file in the namespace. The NameNode does some checks (the file doesn't already exist, the user has the right to write in the directory, etc.) and allows the client to write data. At this point, the new file has no data blocks. 
Then the client starts writing data in blocks. For each block, the HDFS API exchanges with the NameNode to get a list of DataNodes on which it can write that block. The number of DataNodes depends on the replication factor, and the list is ordered by distance from the client. Once the NameNode returns the DataNode list, the API writes the data block to the first node, which replicates the same block to the next one, and so on. Here's a picture from The Hadoop Definitive Guide that explains this process.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;        &lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="8181-hdfs-write-operation.png" style="width: 920px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/23440i9C7A52A7314D6C81/image-size/medium?v=v2&amp;amp;px=400" role="button" title="8181-hdfs-write-operation.png" alt="8181-hdfs-write-operation.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2019 11:42:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Few-Questions-on-Hadoop-2-x-architecture/m-p/108000#M42464</guid>
      <dc:creator>ahadjidj</dc:creator>
      <dc:date>2019-08-19T11:42:41Z</dc:date>
    </item>
    <item>
      <title>Re: Few Questions on Hadoop 2.x architecture</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Few-Questions-on-Hadoop-2-x-architecture/m-p/108001#M42465</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/12833/gobisubramani.html" nodeid="12833"&gt;@Gobi Subramani&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Please read these links for a clear explanation:&lt;/P&gt;&lt;P&gt;&lt;A href="https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/YARN.html" target="_blank"&gt;https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/YARN.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html" target="_blank"&gt;https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Every Hadoop command internally calls a Java utility for the underlying operation. &lt;CODE&gt;org.apache.hadoop.fs.FsShell&lt;/CODE&gt; provides command-line access to a FileSystem. &lt;CODE&gt;hadoop fs -put&lt;/CODE&gt; internally calls the corresponding method of that class.&lt;/P&gt;&lt;P&gt;To understand the FsShell code, please go through this link:&lt;/P&gt;&lt;P&gt;&lt;A href="http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.1.0/org/apache/hadoop/fs/FsShell.java" target="_blank"&gt;http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.collector/1.1.0/org/apache/hadoop/fs/FsShell.java&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 01 Oct 2016 17:42:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Few-Questions-on-Hadoop-2-x-architecture/m-p/108001#M42465</guid>
      <dc:creator>ashneesharma88</dc:creator>
      <dc:date>2016-10-01T17:42:30Z</dc:date>
    </item>
    <item>
      <title>Re: Few Questions on Hadoop 2.x architecture</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Few-Questions-on-Hadoop-2-x-architecture/m-p/108002#M42466</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Thanks for your quick reply. A couple of points I need to clarify:&lt;/P&gt;&lt;P&gt;
1. The Application Master is responsible for getting data block info from the NameNode and for creating containers on the respective DataNodes to process the data.&lt;/P&gt;&lt;P&gt;
2. It is also responsible for monitoring the tasks; if a task fails, the Application Master will start the container on a different DataNode.&lt;/P&gt;</description>
      <pubDate>Sun, 02 Oct 2016 09:23:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Few-Questions-on-Hadoop-2-x-architecture/m-p/108002#M42466</guid>
      <dc:creator>gobi_subramani</dc:creator>
      <dc:date>2016-10-02T09:23:34Z</dc:date>
    </item>
  </channel>
</rss>