Member since: 05-18-2018
Posts: 50
Kudos Received: 3
Solutions: 0
03-28-2019
12:30 PM
Block: A block is the physical representation of data. It is the minimum amount of data that can be read or written. The default HDFS block size is 128 MB, which we can configure as per our requirement. Files are split into 128 MB blocks and then stored in the Hadoop file system; all blocks of a file are the same size except the last one, which can be the same size or smaller.

InputSplit: An InputSplit is the logical representation of the data in a block. It is used during data processing in a MapReduce program or other processing techniques. An InputSplit doesn't contain the actual data, only a reference to it. By default, the split size is approximately equal to the block size, but it is user-definable: the user can control the split size in the MapReduce program based on the size of the data.
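To make the distinction concrete, here is a minimal sketch (my own, not from the original post) using Hadoop's mapreduce API; the job name and input path are hypothetical. Block size is fixed when data is written, while split size is tuned per job:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Block size is a storage-time property, applied when a file is
        // written to HDFS (128 MB is the default).
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);

        Job job = Job.getInstance(conf, "split-size-demo");
        // Split size is a processing-time property: tightening the min/max
        // bounds changes how many map tasks run, without moving any data.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB
        FileInputFormat.addInputPath(job, new Path("/user/demo/input")); // hypothetical
    }
}
```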
03-05-2019
12:22 PM
Hadoop MapReduce uses key-value pairs to process data efficiently. The MapReduce model is derived from Google's MapReduce white paper, which is built on this concept. Key-value pairs are not part of the input data itself; rather, the framework splits the input data into keys and values before it is processed in the mapper.
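As a minimal illustration (my own sketch, using the standard mapreduce API with TextInputFormat): the framework generates one pair per input line, where the key is the line's byte offset in the file and the value is the line text.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// With TextInputFormat, each line of input arrives as a (key, value) pair:
// key = byte offset of the line in the file, value = the line itself.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE); // emit intermediate (word, 1) pairs
            }
        }
    }
}
```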
02-08-2019
12:42 PM
Once a MapReduce program is built, a driver class has to be created so the job can be submitted to the cluster. For this, we create an object of the JobConf class. One of the methods on this object is setMapperClass: conf.setMapperClass(...) sets the mapper class in the driver. It tells the framework which class should read the input records and generate the intermediate key-value pairs during the map phase.
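A minimal driver sketch using the old mapred API that the post refers to; WordCountMapper and WordCountReducer are hypothetical classes standing in for your own implementations:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCountDriver.class);
        conf.setJobName("wordcount");

        // Tell the framework which classes implement the map and reduce phases.
        conf.setMapperClass(WordCountMapper.class);   // hypothetical mapper class
        conf.setReducerClass(WordCountReducer.class); // hypothetical reducer class

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf); // submit the job to the cluster and wait
    }
}
```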
02-04-2019
10:10 AM
The NameNode stores only the metadata of the blocks held on the DataNodes, using approximately 150 bytes of memory per block.
Generally, it is recommended to allocate 1 GB of memory (RAM) for every 1 million blocks.
Based on this recommendation, we can determine the NameNode's memory requirement while installing the Hadoop system by considering the size of the cluster. Since the NameNode stores only metadata, the need to upgrade it rarely arises; when it does, the NameNode can be scaled vertically by adding more RAM.
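As a rough worked example (hypothetical cluster figures, applying the rule of thumb above):

```java
// Back-of-the-envelope NameNode heap sizing for a hypothetical cluster.
public class NameNodeHeapEstimate {
    public static void main(String[] args) {
        long clusterDataBytes = 100L * 1024 * 1024 * 1024 * 1024; // 100 TB of data
        long blockSizeBytes   = 128L * 1024 * 1024;               // 128 MB blocks

        long blocks = clusterDataBytes / blockSizeBytes;          // ~819,200 blocks
        // Rule of thumb from above: ~1 GB of heap per 1 million blocks.
        double heapGb = blocks / 1_000_000.0;

        System.out.printf("Blocks: %,d -> ~%.2f GB NameNode heap%n", blocks, heapGb);
    }
}
```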
01-30-2019
12:12 PM
How can I change/configure the number of mappers?
Labels:
- Apache Hadoop
- Apache Hive
01-23-2019
12:20 PM
Small files are those significantly smaller than the HDFS block size (64 MB in Hadoop 1.x; 128 MB by default in Hadoop 2.x). HDFS can't handle these small files efficiently: if you store 1 million small files on HDFS, each file consumes NameNode memory for its metadata, which makes processing very slow.
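One common workaround is to pack the small files into a container format such as a SequenceFile. A minimal sketch, assuming hypothetical input and output paths:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Pack a directory of small files into one SequenceFile:
// key = original file name, value = file contents. The single large file
// costs the NameNode a few block entries instead of one per small file.
public class SmallFilePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path("/user/demo/small-files"); // hypothetical paths
        Path packed = new Path("/user/demo/packed.seq");

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packed),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus status : fs.listStatus(inputDir)) {
                byte[] contents = new byte[(int) status.getLen()];
                try (FSDataInputStream in = fs.open(status.getPath())) {
                    IOUtils.readFully(in, contents, 0, contents.length);
                }
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(contents));
            }
        }
    }
}
```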
01-16-2019
12:10 PM
If I create a folder, will metadata be created in Hadoop?
Labels:
- Apache Hadoop
- Apache Hive
01-03-2019
09:42 AM
1 Kudo
The identity mapper and identity reducer are the default mapper and reducer picked up by the MapReduce framework when no mapper or reducer class is defined in the driver class. They do not perform any processing on the data; they simply write each key-value pair they receive from the input unchanged to the output.
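In the old mapred API these defaults exist as the explicit IdentityMapper and IdentityReducer classes; in the new mapreduce API, the base Mapper and Reducer classes are themselves the identity. A sketch of what that default behavior amounts to (hypothetical class name):

```java
import java.io.IOException;
import org.apache.hadoop.mapreduce.Mapper;

// The base Mapper is the identity: if a job never calls setMapperClass,
// each input pair is passed through to the output unchanged. Its default
// map() is essentially this:
public class PassThroughMapper<K, V> extends Mapper<K, V, K, V> {
    @Override
    protected void map(K key, V value, Context context)
            throws IOException, InterruptedException {
        context.write(key, value); // no processing, just pass through
    }
}
```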
12-26-2018
11:20 AM
1 Kudo
How do I sort intermediate output based on values in MapReduce?
Labels:
- Apache Hadoop
- Apache Hive
12-14-2018
09:09 AM
How many combiners are used in a MapReduce job?
Labels:
- Apache Hadoop
- Apache Hive