Member since: 05-26-2018 · Posts: 34 · Kudos Received: 2 · Solutions: 0
12-04-2019
10:17 AM
Do I need to put the NameNode in safe mode to execute this command, or can I run it on a live cluster? hadoop fs -setrep -w 3 -R /
10-31-2018
11:25 AM
We can restart the NameNode in the following ways:
1. Stop the NameNode individually with the sbin/hadoop-daemon.sh stop namenode command, then start it again with sbin/hadoop-daemon.sh start namenode.
2. Use sbin/stop-all.sh followed by sbin/start-all.sh, which stops all the daemons first and then starts them all again.
These script files are stored in the sbin directory inside the Hadoop installation directory.
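For reference, the full command sequence looks like this (a minimal sketch assuming a stock Apache Hadoop layout, run from the Hadoop installation directory):

# Restart only the NameNode daemon
sbin/hadoop-daemon.sh stop namenode
sbin/hadoop-daemon.sh start namenode

# Or stop and then start every daemon on the cluster
sbin/stop-all.sh
sbin/start-all.sh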
10-15-2018
11:48 AM
If we have a small data set, the Uber configuration can be used for MapReduce. In uber mode, the ApplicationMaster runs the map and reduce tasks within its own JVM process, avoiding the overhead of launching and communicating with containers on remote nodes.
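Uber mode is controlled through mapred-site.xml. A sketch of the relevant properties (the thresholds shown are the usual defaults; a job is only uberized if it fits under all of them):

<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value> <!-- allow small jobs to run inside the ApplicationMaster JVM -->
</property>
<property>
  <name>mapreduce.job.ubertask.maxmaps</name>
  <value>9</value> <!-- jobs with more map tasks than this are not uberized -->
</property>
<property>
  <name>mapreduce.job.ubertask.maxreduces</name>
  <value>1</value> <!-- uber jobs support at most one reduce task -->
</property>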
07-20-2018
04:52 PM
The TaskTracker and JobTracker don't exist in YARN; their responsibilities are split between the ResourceManager, the NodeManagers, and per-application ApplicationMasters. The default replication factor is 3.
06-14-2018
12:13 PM
A file that is loaded into HDFS has a default replication factor of 3, which is set in the hdfs-site.xml file. The replication of that particular file would therefore be 3, meaning 3 copies of each of its blocks exist on HDFS. To change the replication factor, open the hdfs-site.xml file. This file is usually found in the conf/ folder of the Hadoop installation directory. Change or add the following property to hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>4</value> <!-- to change the replication factor to 4 -->
  <description>Block Replication</description>
</property>

hdfs-site.xml is used to configure HDFS. Changing the dfs.replication property in hdfs-site.xml will change the default replication for all files placed in HDFS. You can also change the replication factor on a per-file basis using the Hadoop FS shell:

[training@localhost ~]$ hadoop fs -setrep -w 4 /my/file

Alternatively, you can change the replication factor of all the files under a directory:

[training@localhost ~]$ hadoop fs -setrep -w 4 -R /my/dir
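To confirm that the change took effect, you can print a file's current replication factor with the stat command (shown here against the same example path):

[training@localhost ~]$ hadoop fs -stat %r /my/file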
06-01-2018
12:07 PM
Each file to be stored in HDFS is split into a number of blocks, the default block size being 128 MB. Each of these blocks is replicated on different DataNodes, the default replication factor being 3. Every DataNode continuously sends heartbeats to the NameNode. When the NameNode stops receiving heartbeats from a DataNode, it concludes that that DataNode is down. Using the metadata in its memory, the NameNode identifies which blocks were stored on that DataNode and which other DataNodes hold replicas of those blocks. It then copies these blocks onto other DataNodes to re-establish the replication factor. This is how the NameNode handles DataNode failure.
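You can watch this happen from the command line (assuming an HDFS client is on the PATH):

hdfs dfsadmin -report   # lists live and dead DataNodes
hdfs fsck /             # reports under-replicated and missing blocks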
05-30-2018
11:17 AM
HDFS follows a write-once-read-many model, so we cannot edit files already stored in HDFS, but we can append data by reopening the file. In a read/write operation the client first interacts with the NameNode. The NameNode grants the permissions, so the client can read and write data blocks to/from the respective DataNodes.

To write a file in HDFS, a client needs to interact with the master, i.e. the NameNode. The NameNode provides the addresses of the DataNodes (slaves) on which the client will start writing the data. The client writes data directly to the DataNodes, and the DataNodes form a data write pipeline: the first DataNode copies the block to a second DataNode, which in turn copies it to a third DataNode. Once the replicas of the block are created, the acknowledgment is sent back.

HDFS Data Write Pipeline Workflow
a. The HDFS client sends a create request on the DistributedFileSystem APIs.
b. DistributedFileSystem makes an RPC call to the NameNode to create a new file in the filesystem's namespace. The NameNode performs various checks to make sure that the file doesn't already exist and that the client has permission to create it. Only when these checks pass does the NameNode make a record of the new file; otherwise, file creation fails and the client is thrown an IOException.
c. DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to. As the client writes data, DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the NameNode to allocate new blocks by picking a list of suitable DataNodes to store the replicas.
d. The list of DataNodes forms a pipeline; with the usual replication factor of 3, there are three nodes in the pipeline. The DataStreamer streams the packets to the first DataNode in the pipeline, which stores each packet and forwards it to the second DataNode in the pipeline. Similarly, the second DataNode stores the packet and forwards it to the third (and last) DataNode in the pipeline.
e. DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by DataNodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the DataNodes in the pipeline; a DataNode sends the acknowledgment once the required replicas are created (3 by default). All the blocks are stored and replicated on the different DataNodes in this way, and the data blocks are copied in parallel.
f. When the client has finished writing data, it calls close() on the stream.
g. This action flushes all the remaining packets to the DataNode pipeline and waits for acknowledgments before contacting the NameNode to signal that the file is complete. The NameNode already knows which blocks the file is made up of, so it only has to wait for the blocks to be minimally replicated before returning successfully.

Read more on HDFS data read/write operations.
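From the client's point of view, all of these steps hide behind a few API calls. Here is a minimal Java sketch against the Hadoop FileSystem API (the path and payload are made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();     // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);         // a DistributedFileSystem when fs.defaultFS is hdfs://
        Path file = new Path("/my/dir/example.txt");  // hypothetical target path

        // create() performs the RPC to the NameNode described in step b
        FSDataOutputStream out = fs.create(file);
        out.writeUTF("hello HDFS");                   // data is packetized and queued (step c)
        out.close();                                  // flushes, waits for acks, finalizes the file (steps f and g)
    }
}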