Member since: 05-26-2018
Posts: 34
Kudos Received: 2
Solutions: 0
04-03-2019
12:08 PM
Hadoop supports two kinds of joins for combining two or more data sets on a common column: the map-side join and the reduce-side join. A map-side join is usually used when one data set is large and the other is small (a sketch of this case follows below), whereas a reduce-side join can join two large data sets. The map-side join is faster because, unlike the reduce phase, it does not have to wait for all mappers to complete; hence the reduce-side join is slower.
Map-side join requirements:
- Both data sets must be sorted by the same key.
- Both data sets must have an equal number of partitions.
- All records with the same key must be in the same partition.
Reduce-side join:
- More flexible to implement.
- Requires a custom WritableComparable with the necessary functions overridden.
- Requires a custom partitioner.
- Requires a custom group comparator.
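Below is a minimal sketch of the replicated (distributed-cache) variant of a map-side join, where the small data set is loaded into each mapper's memory. The record layouts (a small departments file with dept_id,dept_name lines and a large employees input with emp_id,name,dept_id lines), the class names, and the argument order are illustrative assumptions, not taken from the post above.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapSideJoin {

    public static class JoinMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        private final Map<String, String> deptById = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // Load the small (departments) data set into memory once per
            // mapper; YARN symlinks each cache file into the task's working
            // directory under its base file name.
            URI[] cacheFiles = context.getCacheFiles();
            String localName = new Path(cacheFiles[0].getPath()).getName();
            try (BufferedReader reader =
                     new BufferedReader(new FileReader(localName))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] f = line.split(",");
                    deptById.put(f[0], f[1]); // dept_id -> dept_name
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Join each large-side (employee) record against the in-memory
            // map; no shuffle or reduce phase is involved.
            String[] f = value.toString().split(",");
            String deptName = deptById.get(f[2]);
            if (deptName != null) {
                context.write(new Text(value + "," + deptName),
                              NullWritable.get());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // args: <small departments file> <employees input dir> <output dir>
        Job job = Job.getInstance(new Configuration(), "map-side join");
        job.setJarByClass(MapSideJoin.class);
        job.setMapperClass(JoinMapper.class);
        job.setNumReduceTasks(0); // map-only job: the join happens in map()
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        job.addCacheFile(new URI(args[0]));
        FileInputFormat.addInputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}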
01-05-2019
07:23 AM
RAM. The metadata is needed constantly: every DataNode sends a heartbeat roughly every 3 seconds, and each heartbeat, as well as every client request, has to be processed against that metadata. To keep this processing fast, the NameNode stores the metadata in RAM.
How can we change the replication factor when data is already stored in HDFS? HDFS is configured through hdfs-site.xml; changing the dfs.replication property there changes the default replication factor for all files subsequently placed in HDFS. For data that is already stored, use the hadoop fs shell:
hadoop fs -setrep -w 3 <path>
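As a complement to the shell command above, a minimal sketch that does the same for a single path through the HDFS Java API; the path /data/input and the target factor of 3 are illustrative assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        // Connects to the default file system named in the loaded
        // configuration (core-site.xml / hdfs-site.xml on the classpath).
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            // Equivalent of "hadoop fs -setrep 3 /data/input" for one file.
            boolean ok = fs.setReplication(new Path("/data/input"), (short) 3);
            System.out.println(ok ? "Replication factor set to 3" : "Failed");
        }
    }
}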
11-22-2018
07:18 AM
In Hadoop, how can one increase the replication factor to a desired value?
Labels:
- Apache Hadoop
- Apache Hive
10-31-2018
10:40 AM
In Hadoop, how do you restart the NameNode or all of the daemons?
Labels:
- Apache Hadoop
- Apache Hive
07-19-2018
10:33 AM
1 Kudo
What do you mean by a cluster, a single-node cluster, and a node?
06-14-2018
11:11 AM
How do you change the replication factor of data that is already stored in HDFS?
Labels:
- Apache Hadoop
06-01-2018
11:19 AM
How is DataNode failure tackled in Hadoop?
Labels:
- Apache Hadoop
05-30-2018
06:59 AM
How can we write or store data/files in Hadoop HDFS?
Labels:
- Apache Hadoop
05-26-2018
10:46 PM
The following formula is used to estimate the storage size of a Hadoop cluster:
H = (c * r * S) / (1 - i)
where:
c = average compression ratio. This depends on the type of compression used and the size of the data; when no compression is used, c is 1.
r = replication factor. It is set to 3 by default in a production cluster.
S = size of the data to be moved to Hadoop. This can be a combination of historical data and incremental data; the incremental data can be daily, for example, and projected over a period of time (3 years, for example).
i = intermediate factor, usually 1/3 or 1/4. This is the working space Hadoop dedicates to storing the intermediate results of the map phase.
Example (a code sketch follows below): with no compression (c = 1), a replication factor of 3, and an intermediate factor of 1/4:
H = (1 * 3 * S) / (1 - 1/4) = 3S / (3/4) = 4S
With these assumptions, the Hadoop storage needed is estimated to be 4 times the size of the initial data.
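For completeness, a tiny sketch that evaluates the formula with the example values above (c = 1, r = 3, i = 1/4); the 10 TB data size is an assumed figure for illustration only.

public class ClusterSize {
    // H = (c * r * S) / (1 - i), as defined in the post above.
    static double storageNeeded(double c, double r, double s, double i) {
        return (c * r * s) / (1 - i);
    }

    public static void main(String[] args) {
        double s = 10.0; // TB of raw data to move into Hadoop (assumed)
        double h = storageNeeded(1.0, 3.0, s, 0.25);
        System.out.printf("Required cluster storage: %.1f TB (4 x S)%n", h);
    }
}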