Member since: 05-26-2018
Posts: 34
Kudos Received: 2
Solutions: 0
04-03-2019
12:08 PM
Hadoop supports two kinds of joins for combining two or more data sets on a common column: the map-side join and the reduce-side join. A map-side join is usually used when one data set is large and the other is small, whereas a reduce-side join can join two large data sets. The map-side join is faster because it avoids the shuffle and sort of intermediate data and does not have to wait for all mappers to complete, as the reduce phase does; the reduce-side join is therefore slower.
Map-side join requirements:
- Both inputs must be sorted by the same join key.
- Both inputs must have an equal number of partitions.
- All records with the same key must be in the same partition.
Reduce-side join characteristics:
- Much more flexible to implement.
- Needs a custom WritableComparable with the necessary functions overridden.
- Needs a custom partitioner.
- Needs a custom group comparator.
A sketch of a map-side join is shown below.
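The bullet points above describe the sorted-and-partitioned map-side join; as a simpler illustration, here is a minimal sketch (not from the original answer) of the replicated variant, in which the small data set is shipped to every mapper via the distributed cache and loaded into memory in setup(). The comma-separated key,value layout and the cache usage are assumptions.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> smallTable = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The small data set is distributed with job.addCacheFile(...) and is
        // available in the task's working directory under its file name.
        URI[] cacheFiles = context.getCacheFiles();
        String localName = new Path(cacheFiles[0].getPath()).getName();
        try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",", 2); // assumed layout: key,value
                smallTable.put(parts[0], parts[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] parts = value.toString().split(",", 2); // assumed layout: key,value
        String matched = smallTable.get(parts[0]);
        if (matched != null) {
            // Emit the joined record; no shuffle or reducer is needed.
            context.write(new Text(parts[0]), new Text(parts[1] + "," + matched));
        }
    }
}

With this variant the join completes entirely in the map phase, which is exactly why it outperforms the reduce-side join described above.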
03-19-2019
11:31 AM
Does the Partitioner run in its own JVM, or does it share one with another process?
Labels:
- Apache Hadoop
- Apache Hive
03-11-2019
09:41 AM
A normal SSH gateway requires a password to be entered every time a service tries to connect to a node, which slows things down considerably. That is why passwordless SSH is usually set up in distributed technologies, where node-to-node communication must be fast. Hadoop is a fully distributed technology: all data is stored across multiple commodity machines, so the nodes must be able to communicate with each other quickly. Hadoop works on a master-slave architecture. When a client needs to store or access data in HDFS, it submits the request to the master node, and the master distributes the request across multiple slave nodes. If passwordless SSH were not set up, the master would need to log in to the slaves with credentials for every client request. Is that really feasible for fast data processing? Of course not. That is why we need the passwordless SSH setup in Hadoop: the master does not need an interactive login to the slaves, it can go directly to the slave's address and fetch or store the required data.
03-01-2019
11:29 AM
Why does Hadoop MapReduce use key-value pairs to process data?
Labels:
- Apache Hadoop
- Apache Hive
02-12-2019
12:24 PM
Which systems, OLTP or OLAP, can have a Hadoop architecture?
Labels:
- Apache Hadoop
- Apache Hive
02-02-2019
12:20 PM
If the number of DataNodes increases, do we need to upgrade the NameNode?
Labels:
- Apache Hadoop
- Apache Hive
01-22-2019
12:00 PM
What is the small-file problem? If we store 1 million small files in HDFS, will there be any issue?
Labels:
- Apache Hadoop
- Apache Hive
01-19-2019
10:49 AM
Yes, there will be metadata in Hadoop, as every change we make, such as a file creation or deletion, gets saved in the NameNode.
01-09-2019
12:37 PM
Can anyone explain to me the problem with the following piece of code?
>>> def func(n=[]):  # playing around
...     pass
>>> func([1,2,3])
>>> func()
>>> n
Labels:
01-05-2019
07:23 AM
RAM. The metadata is needed every 3 seconds, after each heartbeat, so it has to be processed very quickly. To keep access to the metadata fast, the NameNode stores it in RAM.
How can we change the replication factor when data is already stored in HDFS? hdfs-site.xml is used to configure HDFS: changing the dfs.replication property in hdfs-site.xml changes the default replication factor for all files subsequently placed in HDFS.
For data that is already stored, use the hadoop fs shell: hadoop fs -setrep -w 3 <path>
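The same change can also be made programmatically. Below is a minimal sketch (not part of the original answer) using the HDFS FileSystem API; the path is a placeholder and the sketch assumes fs.defaultFS points at the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                   // picks up core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/user/example/data.txt");          // placeholder path
            boolean requested = fs.setReplication(file, (short) 3);  // same effect as hadoop fs -setrep 3
            System.out.println("Replication change requested: " + requested);
        }
    }
}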
12-28-2018
11:59 AM
Ideally, how many map tasks should be configured on a slave node in MapReduce?
Labels:
- Apache Hadoop
- Apache Hive
12-24-2018
11:44 AM
Small-file problem: suppose we have 10 small files in HDFS; then 10 mappers are required to run. If we have thousands of small files, thousands of mappers are required, and this degrades performance. Ideally, one mapper should be able to process many of those small files instead.
To overcome the problem of a large number of small files, Hadoop provides an abstract class, CombineFileInputFormat, which packs many files into a single split, so a single mapper can be used to process multiple small files, as in the sketch below.
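Here is a minimal sketch (not part of the original answer) of a job driver that uses CombineTextInputFormat, the Text-based concrete subclass of CombineFileInputFormat. The identity mapper, the 128 MB split limit, and the paths are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallFilesJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combine-small-files");
        job.setJarByClass(SmallFilesJob.class);

        // Pack many small files into splits of at most 128 MB each,
        // so one mapper reads many files instead of one file per mapper.
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);

        job.setMapperClass(Mapper.class);   // identity mapper, just for illustration
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/user/example/small-files"));    // placeholder
        FileOutputFormat.setOutputPath(job, new Path("/user/example/combined-out")); // placeholder
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}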
12-12-2018
06:36 AM
Can you explain to me what bucketing is in Hive?
Labels:
- Apache Hadoop
- Apache Hive
12-05-2018
09:24 AM
Can you explain to me how to configure Hadoop to reuse the JVM for mappers?
Labels:
- Apache Hadoop
- Apache Hive
11-30-2018
11:05 AM
There are normally two phases in a MapReduce job: the map phase and the reduce phase. As the name suggests, a map-only job contains just one phase, the map phase. Hence there is no sorting and shuffling of intermediate key-value pairs, no need for a partitioner or combiner, and no aggregation or summation of key-value pairs, so the output of the mapper is written directly to HDFS. Not every job can be run as a map-only job, but jobs such as data parsing can be. As a result, map-only jobs perform better than full MapReduce jobs. A map-only job is configured simply by setting the number of reducers to zero, as in the sketch below.
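A minimal sketch (not part of the original answer) of a map-only job driver; the identity mapper and paths are placeholders, and the only essential line is setNumReduceTasks(0).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only-example");
        job.setJarByClass(MapOnlyJob.class);

        job.setMapperClass(Mapper.class);  // identity mapper standing in for a real parsing mapper
        job.setNumReduceTasks(0);          // zero reducers: no shuffle, output goes straight to HDFS
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/user/example/input"));     // placeholder
        FileOutputFormat.setOutputPath(job, new Path("/user/example/output"));  // placeholder
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}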
11-24-2018
11:35 AM
A Backup node acts as a checkpoint node. It keeps an up-to-date copy of the NameNode metadata (FsImage and edit log) in memory, saves it to an FsImage file on its local filesystem, and resets the edits, keeping itself synchronized with the active NameNode. Whenever the NameNode starts up, it uses the FsImage file (the copy backed up on the local filesystem) to learn the latest saved state and then applies the edits to catch up to the current state. One Backup node is managed by one NameNode, and if a Backup node is present there is no need for a Checkpoint node.
11-22-2018
07:18 AM
In Hadoop, how can one increase the replication factor to a desired value?
Labels:
- Apache Hadoop
- Apache Hive
11-15-2018
11:42 AM
The NameNode holds the metadata, i.e., the number of blocks, their replicas and locations, and other details. This metadata is kept in memory on the master for faster retrieval of data. The NameNode also maintains and manages the DataNodes and assigns tasks to them.
10-31-2018
10:40 AM
In Hadoop, how do you restart the NameNode or all of the daemons?
Labels:
- Apache Hadoop
- Apache Hive
10-26-2018
09:14 AM
Data integrity refers to the correctness of the data. It is very important to have assurance that the data stored in HDFS is correct; however, there is always a slight chance that data will get corrupted during I/O operations on the disk. HDFS therefore creates a checksum for all the data written to it and, by default, verifies the data against that checksum during read operations. In addition, each DataNode periodically runs a block scanner, which verifies the correctness of the data blocks stored in HDFS.
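As an illustration only (not part of the original answer), the checksum machinery is also visible through the FileSystem API; the path below is a placeholder and getFileChecksum() may return null on filesystems without checksum support.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumCheck {
    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(new Configuration())) {
            Path file = new Path("/user/example/data.txt"); // placeholder
            fs.setVerifyChecksum(true);                     // verification on read is on by default
            FileChecksum checksum = fs.getFileChecksum(file);
            if (checksum != null) {
                System.out.println(checksum.getAlgorithmName() + " -> " + checksum);
            }
        }
    }
}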
09-13-2018
10:19 AM
What should the replication factor be in a Hadoop cluster?
Labels:
07-21-2018
12:13 PM
Both Flume & Kafka are used for real-time event processing but they are quite different from each other as per below mentioned points: 1. Kafka is a general purpose publish-subscribe model messaging system. It is not specifically designed for Hadoop as hadoop ecosystem just acts as one of its possible consumer. On the other hand flume is a part of Hadoop ecosystem , which is used for efficiently collecting, aggregating, and moving large amounts of data from many different sources to a centralized data store, such as HDFS or HBase. It is more tightly integrated with Hadoop ecosystem. Ex, the flume HDFS sink integrates with the HDFS security very well. So its common use case is to act as a data pipeline to ingest data into Hadoop. 2. It is very easy to increase the number of consumers in kafka without affecting its performance & without any downtime. Also it does not keep any track of messages in the topic delivered to consumers. Although it is the consumer’s responsibility to do the tracking of data through offset. Hence it is very scalable contrary to flume as adding more consumers in the flume means changing the topology of Flume pipeline design, which requires some downtime also. 3. Kafka is basically working as a pull model. kafka different consumers can pull data from their respective topic at same time as consumer can process their data in real-time as well as batch mode. On the contrary flume supports push model as there may be a chances of getting data loss if consumer does not recover their data expeditly. 4. Kafka supports both synchronous and asynchronous replication based on your durability requirement and it uses commodity hard drive. Flume supports both ephemeral memory-based channel and durable file-based channel. Even when you use a durable file-based channel, any event stored in a channel not yet written to a sink will be unavailable until the agent is recovered. Moreover, the file-based channel does not replicate event data to a different node. It totally depends on the durability of the storage it writes upon. 5. For Kafka we need to write our own producer and consumer but in case of flume, it uses built-in sources and sinks, which can be used out of box. That’s why if flume agent failure occurs then we lose events in the channel. 6. Kafka always needs to integrate with other event processing framework, that’s why it does not provide native support for message processing In contrast, Flume supports different data flow models and interceptors chaining, which makes event filtering and transforming very easy. For example, you can filter out messages that you are not interested in the pipeline first before sending it through the network for obvious performance reason. However, It is not suitable for complex event processing.
07-19-2018
10:33 AM
1 Kudo
What do you mean by cluster, single node cluster, and node?
Labels:
07-05-2018
10:45 AM
How did Spark come into the picture, and why?
Labels:
06-18-2018
11:24 AM
1 Kudo
HDFS block: Hadoop HDFS stores each file as blocks and distributes them across the nodes of the cluster. The default HDFS block size is 128 MB, which we can configure as per our requirements. All blocks of a file are the same size except the last one, which can be the same size or smaller. Files are split into 128 MB blocks and then stored in the Hadoop file system. A block is the physical representation of data and holds the minimum amount of data that can be read or written.
InputSplit: The data to be processed by a mapper is represented by an InputSplit. Initially, the data for a MapReduce task sits in input files in HDFS. The InputFormat defines how those input files are split and read, and it is responsible for creating the InputSplits. By default, the split size is approximately equal to the block size. The InputSplit is user-controlled: the user can adjust the split size based on the size of the data in the MapReduce program (see the sketch below). It is the logical representation of the data in the blocks and is used during data processing in a MapReduce program or other processing techniques. An InputSplit does not contain the actual data, only a reference to it, i.e., the addresses or locations of the blocks.
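A minimal sketch (not part of the original answer) of controlling the split size independently of the block size through FileInputFormat; the 64 MB and 256 MB limits are placeholders. The framework picks max(minSize, min(maxSize, blockSize)) as the split size.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-size-example");
        // Lower and upper bounds on the logical split size, independent of the 128 MB block.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);
        System.out.println("min=" + FileInputFormat.getMinSplitSize(job)
                + " max=" + FileInputFormat.getMaxSplitSize(job));
    }
}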
06-14-2018
11:11 AM
How do you change the replication factor of data that is already stored in HDFS?
Labels:
- Apache Hadoop
06-01-2018
11:19 AM
How is DataNode failure handled in Hadoop?
Tags:
- hadoop
- Hadoop Core
Labels:
- Apache Hadoop
05-30-2018
06:59 AM
How can we write or store a data file in Hadoop HDFS?
Labels:
- Apache Hadoop