Member since: 05-18-2018
Posts: 50
Kudos Received: 3
Solutions: 0
03-28-2019
12:30 PM
Block: A block is the physical representation of data in HDFS and the minimum amount of data that can be read or written. The default HDFS block size is 128 MB, which we can configure as per our requirement. All blocks of a file are the same size except the last block, which can be the same size or smaller. Files are split into 128 MB blocks and then stored in the Hadoop filesystem.
InputSplit: An InputSplit is the logical representation of the data present in a block. It is used during data processing in a MapReduce program or other processing technique. An InputSplit does not contain the actual data, only a reference to it. By default, the split size is approximately equal to the block size, but the split size is user-defined and can be controlled in the MapReduce program based on the size of the data.
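As a small illustration, here is a minimal sketch (class name and input path are placeholders, not from the original post) of how a job written with the new MapReduce API could override the split size instead of relying on the block-size default:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-size-demo");

        // Input path is a placeholder for illustration only.
        FileInputFormat.addInputPath(job, new Path("/data/input"));

        // Ask the framework for ~64 MB splits instead of the default of
        // roughly one split per 128 MB block.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
    }
}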
03-05-2019
12:22 PM
Hadoop MapReduce processes data as key-value pairs, which makes processing efficient. The MapReduce concept is derived from Google's white papers, which use the same model. Key-value pairs are not part of the input data itself; rather, the input data is split into keys and values to be processed by the mapper (for example, with TextInputFormat the key is the byte offset of a line and the value is the line's contents).
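As an illustration, here is a minimal word-count style mapper sketch (the class name is a placeholder) showing how each input record arrives as a (key, value) pair and how the mapper emits new (key, value) pairs:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The framework hands the mapper one (key, value) pair per record; with
// TextInputFormat the key is the line's byte offset and the value is the
// line text. The mapper emits (word, 1) pairs for the reducer to sum.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}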
02-08-2019
12:42 PM
Once a MapReduce program is written, a driver class has to be created and submitted to the cluster. For this, we create an object of the JobConf class (the old mapred API). One of its methods is setMapperClass: conf.setMapperClass(...) registers the mapper class with the job, so the framework knows which class will read the input records and generate the key-value pairs in the map phase.
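For illustration, a minimal driver sketch using the old JobConf API could look like this (MyMapper and MyReducer are hypothetical classes implementing the old org.apache.hadoop.mapred.Mapper and Reducer interfaces; the paths come from the command line):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyDriver.class);
        conf.setJobName("example-job");

        // Register the (hypothetical) mapper and reducer with the job.
        conf.setMapperClass(MyMapper.class);
        conf.setReducerClass(MyReducer.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submit the configured job to the cluster and wait for completion.
        JobClient.runJob(conf);
    }
}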
02-04-2019
10:10 AM
The NameNode stores only the metadata of the blocks held on the DataNodes. The NameNode uses roughly 150 bytes of memory per block.
Generally, it is recommended to allocate 1 GB of memory (RAM) for every 1 million blocks.
Based on the above recommendation, we can estimate the NameNode's memory requirement when installing the Hadoop system by considering the expected size of the cluster. Since the NameNode stores only metadata, the need to upgrade it rarely arises; when it does, the NameNode can be scaled vertically (by adding more RAM).
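A quick back-of-the-envelope sketch of this sizing rule (the block count and the ~150 bytes/block figure are the assumptions quoted above, not measured values):

public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        long blocks = 10_000_000L;    // assumed number of blocks in the cluster
        long bytesPerBlock = 150L;    // metadata footprint per block (rule of thumb)

        double metadataGb = blocks * bytesPerBlock / (1024.0 * 1024 * 1024);
        double recommendedGb = blocks / 1_000_000.0;  // ~1 GB of heap per 1 million blocks

        System.out.printf("Raw block metadata: ~%.2f GB, recommended NameNode heap: ~%.0f GB%n",
                metadataGb, recommendedGb);
    }
}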
01-30-2019
12:12 PM
How can I change / configure the number of mappers?
Labels:
- Apache Hadoop
- Apache Hive
01-23-2019
12:20 PM
Small files are those that are significantly smaller than the default HDFS block size (64 MB in older releases, 128 MB today). HDFS cannot handle these small files efficiently: if we store 1 million small files on HDFS, a large amount of NameNode memory is used just to hold their metadata, and processing becomes very slow because each file occupies at least one block and typically one map task.
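A rough sketch of that overhead (the file sizes and the ~150 bytes per metadata object are assumptions based on the common rule of thumb, not figures from the post):

public class SmallFilesOverhead {
    public static void main(String[] args) {
        long bytesPerObject = 150L;           // per file/block metadata object (rule of thumb)

        long smallFiles = 1_000_000L;         // 1 million files of ~1 MB each
        long smallObjects = smallFiles * 2;   // one file object + one block object apiece

        long largeFiles = 1_000_000L / 128;   // the same data packed into 128 MB files
        long largeObjects = largeFiles * 2;   // one file object + one block object apiece

        System.out.printf("1M small files: ~%d MB of NameNode heap%n",
                smallObjects * bytesPerObject / (1024 * 1024));
        System.out.printf("Same data in 128 MB files: ~%d MB of NameNode heap%n",
                largeObjects * bytesPerObject / (1024 * 1024));
    }
}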
01-16-2019
12:10 PM
If I create a folder, will metadata be created for it in Hadoop?
Labels:
- Apache Hadoop
- Apache Hive
01-03-2019
09:42 AM
1 Kudo
The identity mapper and identity reducer are the default mapper and reducer picked up by the MapReduce framework when no mapper or reducer class is defined in the driver class. They do not perform any processing on the data; they simply write to the output the same key-value pairs they receive from the input.
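As a sketch (paths and the job name are placeholders), a driver that never calls setMapperClass or setReducerClass falls back to these identity defaults; with the new API the base Mapper and Reducer classes themselves act as the identity implementations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IdentityPassThrough {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "identity-passthrough");
        job.setJarByClass(IdentityPassThrough.class);

        // No setMapperClass()/setReducerClass() calls: the framework uses the
        // base Mapper and Reducer classes, which pass every (key, value)
        // pair through unchanged.
        job.setOutputKeyClass(LongWritable.class);  // key type produced by TextInputFormat
        job.setOutputValueClass(Text.class);        // value type produced by TextInputFormat

        // Paths are placeholders for illustration.
        FileInputFormat.addInputPath(job, new Path("/data/in"));
        FileOutputFormat.setOutputPath(job, new Path("/data/out"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}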
12-26-2018
11:20 AM
1 Kudo
How to sort intermediate output based on values in MapReduce?
Labels:
- Apache Hadoop
- Apache Hive
12-14-2018
09:09 AM
How many combiners are used for a MapReduce job?
Labels:
- Apache Hadoop
- Apache Hive
12-07-2018
08:33 AM
How to reduce the data volume during shuffling between the mapper and reducer nodes?
Labels:
- Apache Hadoop
- Apache Hive
12-03-2018
09:05 AM
What is the process of spilling in Hadoop’s map reduce program?
Labels:
- Apache Hadoop
- Apache Hive
11-28-2018
09:11 AM
Difference between a MapReduce InputSplit and HDFS block?
Labels:
- Apache Hadoop
- Apache Hive
11-20-2018
10:34 AM
To obtain maximum performance from a Hadoop cluster, it needs to be configured correctly. However, finding the ideal configuration for a Hadoop cluster is not easy. The best way to decide on the ideal configuration is to run the Hadoop jobs with the default configuration first to establish a baseline. After that, the job history log files can be analyzed to see whether any resource is a bottleneck or whether the jobs take longer than expected. Repeating this process helps fine-tune the Hadoop cluster configuration so that it best fits the business requirements.
11-15-2018
11:25 AM
How does the NameNode determine which DataNode to write to in HDFS?
Labels:
11-01-2018
12:09 PM
Metrics are statistical information exposed by the Hadoop daemons. The Hadoop framework uses them for monitoring, performance tuning, and debugging, and many metrics are available by default, which makes them very useful for troubleshooting. The Hadoop framework uses the hadoop-metrics.properties file (located under /etc/hadoop) for performance reporting; it controls how Hadoop reports its metrics. The metrics API provides an abstraction, so it can be implemented on top of a variety of metrics client libraries. The choice of client library is a configuration option, and different modules within the same application can use different metrics implementation libraries.
10-30-2018
12:07 PM
How can one copy a file into HDFS with a block size different from the configured default block size?
Labels:
- Apache Hadoop
- Apache Hive
10-27-2018
11:14 AM
hdfs-site.xml – This file contains the configuration settings for the HDFS daemons. hdfs-site.xml also specifies the default block replication and permission checking on HDFS. The three main hdfs-site.xml properties are:
- dfs.name.dir gives the location where the NameNode stores its metadata (FsImage and edit logs), and specifies whether DFS should place it on local disk or in a remote directory.
- dfs.data.dir gives the location(s) where the DataNodes store their block data.
- fs.checkpoint.dir is a directory on the file system where the secondary NameNode stores temporary copies of the edit logs and FsImage; these are then merged to produce a checkpoint (backup) FsImage.
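For illustration, these properties could be inspected programmatically as below (the path to hdfs-site.xml is an assumption and varies by distribution; newer releases use dfs.namenode.name.dir, dfs.datanode.data.dir and dfs.namenode.checkpoint.dir as the replacement property names):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class HdfsSitePeek {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Location of hdfs-site.xml is an assumption; adjust for your install.
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

        System.out.println("dfs.name.dir      = " + conf.get("dfs.name.dir"));
        System.out.println("dfs.data.dir      = " + conf.get("dfs.data.dir"));
        System.out.println("fs.checkpoint.dir = " + conf.get("fs.checkpoint.dir"));
    }
}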
10-24-2018
11:53 AM
HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools, such as Pig and MapReduce, to more easily read and write data. HCatalog's table abstraction presents users with a relational view of data in the Hadoop Distributed File System (HDFS) and ensures that users need not worry about where or in what format their data is stored. HCatalog supports reading and writing files in any format for which a SerDe (serializer-deserializer) can be written. By default, HCatalog supports the RCFile, CSV, JSON, SequenceFile, and ORC file formats. To use a custom format, you must provide the InputFormat, OutputFormat, and SerDe.
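As a rough sketch (the database and table names are placeholders, and this assumes the HCatalog MapReduce integration jars are on the classpath), a job can be pointed at a table by name and let HCatalog resolve the location and format:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.mapreduce.HCatInputFormat;

public class HCatalogReadSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "hcatalog-read");
        job.setJarByClass(HCatalogReadSketch.class);

        // Database and table names are placeholders; HCatalog resolves where
        // the table lives and which SerDe to use, so the job does not need to
        // know the storage format or the HDFS path.
        HCatInputFormat.setInput(job, "default", "my_table");
        job.setInputFormatClass(HCatInputFormat.class);

        // A real job would also set a mapper that consumes HCatRecord values,
        // an output format, and an output location before submitting.
    }
}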
10-16-2018
11:26 AM
What is meant by the Safe mode problem, and how does a user come out of Safe mode in HDFS?
Labels:
10-11-2018
11:37 AM
Earlier, up to Hadoop 1.0, the NameNode was a single point of failure. From Hadoop 2.0 onwards, HDFS federation was introduced, which allows a cluster to scale by adding NameNodes, each of which manages a portion of the filesystem namespace/metadata. The federated NameNodes are independent of each other.
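A minimal sketch of the federation-related settings this describes (the nameservice IDs and host names are placeholders; in practice these values live in hdfs-site.xml rather than being set in code):

import org.apache.hadoop.conf.Configuration;

public class FederationConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Two independent NameNodes, each owning a portion of the namespace.
        conf.set("dfs.nameservices", "ns1,ns2");
        conf.set("dfs.namenode.rpc-address.ns1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.ns2", "namenode2.example.com:8020");

        System.out.println("Configured nameservices: " + conf.get("dfs.nameservices"));
    }
}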
09-17-2018
12:39 PM
HDFS is the storage layer of Hadoop; it stores very large files on a cluster of commodity hardware. It works on the principle of storing a small number of large files rather than a huge number of small files, and it stores data reliably even in the case of hardware failure. In HDFS, files are broken into blocks that are distributed across the cluster according to the replication factor. The default replication factor is 3, so each block is replicated three times: the first replica is stored on one DataNode, the second is stored on another DataNode within the same rack to minimize network traffic, and the third is stored on a DataNode in a different rack, ensuring that the data is not lost even if an entire rack fails. The NameNode keeps all the metadata, such as the number of blocks, their replicas, their DataNode locations, and the replication factor, while the DataNodes store the actual data and perform operations such as block creation, deletion, and replication according to the NameNode's instructions.
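As a small illustration (the file path is a placeholder), the replication factor recorded by the NameNode for a file can be inspected and changed through the FileSystem API; the DataNodes then add or remove replicas as instructed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Path is a placeholder for illustration.
        Path file = new Path("/data/example.txt");

        // Read the replication factor the NameNode has recorded for the file.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Current replication: " + status.getReplication());

        // Ask the NameNode to change the target replication to 2.
        fs.setReplication(file, (short) 2);
    }
}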
09-08-2018
12:25 PM
I need to add one more path in HDFS. How can I do that?
Labels:
- Apache Hadoop
08-23-2018
10:01 AM
1 Kudo
The main reason for having large HDFS blocks is to reduce the cost of disk seek time. Disk seeks are generally expensive operations, and since Hadoop is designed to scan entire datasets, it is best to minimize seeks by using large blocks. In general, the seek time is about 10 ms and the disk transfer rate is about 100 MB/s. To keep the seek time at about 1% of the transfer time, a block should take roughly one second to transfer, which at 100 MB/s means a block size of around 100 MB. Hence, to reduce the cost of disk seeks, the default HDFS block size is 64 MB/128 MB.
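A quick sketch of that arithmetic (the 10 ms seek time and 100 MB/s transfer rate are the assumed figures from above):

public class SeekOverhead {
    public static void main(String[] args) {
        double seekMs = 10.0;            // assumed average seek time
        double transferMbPerSec = 100.0; // assumed sequential transfer rate

        for (long blockMb : new long[] {1, 64, 128}) {
            double transferMs = blockMb / transferMbPerSec * 1000.0;
            double overheadPct = seekMs / (seekMs + transferMs) * 100.0;
            System.out.printf("%3d MB block: seek is %.1f%% of the read time%n",
                    blockMb, overheadPct);
        }
    }
}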
08-21-2018
11:34 AM
What do you understand by the Catalyst query optimizer in Apache Spark?
Labels:
08-17-2018
11:55 AM
Yes, a MapReduce job can be submitted from a slave node. Jobs can be run from any machine in the cluster as long as that node has the proper JobTracker location configured. So Hadoop has to be configured with the proper JobTracker and NameNode addresses: mapred.job.tracker should be set on the slave node to the master's host and port, and connectivity between the slave and the master should be verified, for example by running telnet master.com 8021.
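A minimal sketch of the client-side setting described above (MRv1; the host name and port are the placeholders from the post's example):

import org.apache.hadoop.mapred.JobConf;

public class SubmitFromSlaveSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf();

        // Point the client at the master's JobTracker; with this set, the job
        // can be submitted from any node that can reach the master.
        conf.set("mapred.job.tracker", "master.com:8021");

        System.out.println("JobTracker: " + conf.get("mapred.job.tracker"));
    }
}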
08-13-2018
12:16 PM
Directly we cannot change the number of mappers for a MapReduce job, but by changing the block size we can increase or decrease the number of mappers, because the number of input splits equals the number of mappers.
For example, if we have a 1 GB input file and the HDFS block size is 128 MB, the number of input splits is 1024/128 = 8, so 8 mappers are allotted to the job.
If we reduce the block size from 128 MB to 64 MB, the 1 GB input file is divided into 1024/64 = 16 input splits, and the number of mappers also becomes 16.
The block size can be changed in hdfs-site.xml by changing the value of dfs.block.size:
<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
</property>
08-01-2018
12:12 PM
Since the number of mappers depends on the number of InputSplits, no data means no input splits and hence no mappers; without any mappers, the number of reducers is also effectively 0. If we try to run a MapReduce job on a Hadoop cluster without specifying any input path, it throws the following exception: java.io.IOException: No input paths specified in job.
07-24-2018
11:13 AM
Does Hadoop handle storage exhaustion on one of the DataNodes in the cluster? How?
Labels: