Member since
05-18-2018
43
Posts
3
Kudos Received
0
Solutions
12-04-2019
10:17 AM
Do I need to put the NameNode in safe mode to execute this command, or can I execute it on a live cluster? hadoop fs -setrep -w 3 -R /
... View more
04-03-2019
12:08 PM
Hadoop supports two kinds of joins for combining two or more data sets on a common column: the map-side join and the reduce-side join. A map-side join is usually used when one data set is large and the other is small, whereas a reduce-side join can join two large data sets. The map-side join is faster because it avoids the shuffle and reduce phases entirely; the reduce-side join, which must wait for all mappers to finish and for the intermediate data to be shuffled, is slower.
Map-side join requirements:
· Both inputs must be sorted by the same join key.
· Both inputs must have an equal number of partitions.
· All records for the same key must be in the same partition.
Reduce-side join characteristics:
· More flexible to implement.
· Needs a custom WritableComparable with the necessary methods overridden.
· Needs a custom partitioner.
· Needs a custom group comparator.
A sketch of a map-side join mapper is shown below.
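The sorted/equally-partitioned requirements above apply to the CompositeInputFormat style of map-side merge join. A simpler and very common variant for the "one small data set" case is sketched below, as a hedged illustration only: the small data set is shipped to every mapper with job.addCacheFile(...) and loaded into memory, and each record of the large data set is joined in map(). The class names, CSV layout and join-key position are assumptions for illustration.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-side (broadcast) join: the small data set comes from the distributed cache,
    // the large data set flows through map() record by record.
    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles != null && cacheFiles.length > 0) {
                // The cached file is symlinked into the task's working directory under its base name.
                String localName = new Path(cacheFiles[0].getPath()).getName();
                try (BufferedReader reader = new BufferedReader(new FileReader(localName))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String[] parts = line.split(",", 2);   // assumed layout: key,value
                        if (parts.length == 2) {
                            lookup.put(parts[0], parts[1]);
                        }
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", 2); // large record, join key assumed in first column
            String match = lookup.get(fields[0]);
            if (match != null) {                              // inner join: emit only matching keys
                context.write(new Text(fields[0]), new Text(fields[1] + "," + match));
            }
        }
    }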
... View more
03-11-2019
09:41 AM
A normal SSH login requires a password to be entered every time one node connects to another, which would slow administration down considerably. Passwordless SSH is therefore commonly set up in distributed systems where node-to-node access must be quick and unattended, and Hadoop is a fully distributed system: data is stored across many commodity machines that have to be managed together. Hadoop follows a master-slave architecture, and the master needs to reach every slave node. In practice, passwordless SSH in Hadoop is used mainly by the cluster management scripts (for example start-dfs.sh and start-yarn.sh), which log in to each slave node to start or stop the daemons. Without passwordless SSH, the administrator would have to type a password for every single node each time the cluster is started or stopped. Is that really feasible on a large cluster? Of course not. That is why we set up passwordless SSH in Hadoop. Note that the actual storing and fetching of data between clients, the NameNode and the DataNodes happens over Hadoop's own RPC and data-transfer protocols, not over SSH.
... View more
02-27-2019
12:23 PM
Users can be created using the steps below:
a) Find out from the user which machine they will be working from.
b) Create the user in the OS first.
c) Create the user in Hadoop by creating their home directory /user/username in HDFS.
d) Make sure the temp directory in HDFS has 777 permissions.
e) Using the chown command, change ownership of the home directory from hadoop to the new user, so that the user can write only into their own directory and not into other users' directories.
f) Refresh the user-to-group mappings on the NameNode: hdfs dfsadmin -refreshUserToGroupsMappings
g) If needed, set a space quota to limit the amount of data the user can store: hdfs dfsadmin -setSpaceQuota 50g /user/username
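For completeness, here is a minimal, hedged sketch of doing steps c, e and g programmatically through the HDFS Java API; the user name, group, permission and quota below are illustrative assumptions, and the command-line steps above remain the usual way to do this.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class CreateHdfsUser {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();            // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            Path home = new Path("/user/username");              // hypothetical user name
            fs.mkdirs(home);                                     // step c: create the home directory
            fs.setOwner(home, "username", "username");           // step e: hand ownership to the user
            fs.setPermission(home, new FsPermission("750"));     // keep other users out

            if (fs instanceof DistributedFileSystem) {           // step g: optional 50 GB space quota
                ((DistributedFileSystem) fs).setQuota(home,
                        org.apache.hadoop.hdfs.protocol.HdfsConstants.QUOTA_DONT_SET,
                        50L * 1024 * 1024 * 1024);
            }
            fs.close();
        }
    }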
... View more
02-08-2019
12:42 PM
Once a MapReduce program is written, a driver class has to be created that submits the job to the cluster. For this we create an object of the JobConf class (or, in the newer MapReduce API, a Job object). One of the methods on this object is setMapperClass: conf.setMapperClass(...) tells the framework which mapper class to use, i.e. which class will read the input records and generate the intermediate key-value pairs. A minimal driver sketch is shown below.
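A hedged sketch of such a driver using the old mapred API's JobConf; MyMapper, MyReducer and the word-count-style output types are hypothetical placeholders, not from the original post.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MyDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MyDriver.class);
            conf.setJobName("example-job");

            conf.setMapperClass(MyMapper.class);      // hypothetical Mapper implementation
            conf.setReducerClass(MyReducer.class);    // hypothetical Reducer implementation

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);                   // submit the job and wait for completion
        }
    }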
... View more
02-10-2019
10:43 PM
@Dukool SHarma Any updates?
... View more
01-05-2019
07:23 AM
RAM. The metadata has to be served constantly and quickly, for example for every client request and every DataNode heartbeat (sent every 3 seconds), so to keep that access fast the NameNode stores the entire metadata in RAM.
How can we change the replication factor when data is already stored in HDFS? hdfs-site.xml is used to configure HDFS. Changing the dfs.replication property in hdfs-site.xml changes the default replication factor for files written to HDFS from then on; it does not change files that are already stored.
For existing files, use the hadoop fs shell: hadoop fs -setrep -w 3 (followed by the path). A programmatic sketch is shown below.
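The same can be done programmatically; a small hedged sketch using the HDFS Java API, where the path is an illustrative assumption.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChangeReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Change the replication factor of an existing file to 3
            // (equivalent to: hadoop fs -setrep 3 /data/example.txt).
            boolean ok = fs.setReplication(new Path("/data/example.txt"), (short) 3);
            System.out.println("Replication changed: " + ok);
            fs.close();
        }
    }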
... View more
12-27-2018
12:16 PM
Sorting itself is carried out on the map side; each map output is already sorted by key when the reducer copies it. When all the map outputs have been copied, the reduce task moves into the sort phase, i.e. the merge phase, which merges the map outputs while maintaining their sort ordering. This is done in rounds. For example, if there were 60 map outputs and the merge factor was 15 (controlled by the mapreduce.task.io.sort.factor property, whose default is 10, just as in the map-side merge), there would be four rounds: each round would merge 15 files into 1, so at the end there would be 4 intermediate files to be processed. The merged, sorted key-value pairs are then fed to the reduce function.
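If you want to raise the merge factor for a particular job, here is a hedged sketch of setting it on the job configuration; the value 15 simply matches the example above and is not a recommendation.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MergeFactorExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Allow up to 15 streams to be merged at once on both the map and reduce side.
            conf.setInt("mapreduce.task.io.sort.factor", 15);
            Job job = Job.getInstance(conf, "merge-factor-example");
            // ... set mapper, reducer and input/output paths as usual, then submit.
        }
    }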
... View more
12-17-2018
12:05 PM
There is no fixed rule in Hadoop for how many times a combiner will be called. Sometimes it may not be called at all, and sometimes it may be called once, twice or more, depending on the number and size of the spill files generated by the mapper. The combiner is therefore only an optimization: the correctness of the job must not depend on it running.
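A minimal hedged sketch of wiring a combiner into a job with the new API; MyMapper and MyReducer are hypothetical classes, and the reducer is assumed to perform a commutative, associative operation (e.g. a sum) so it can double as the combiner.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class CombinerExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "combiner-example");
            job.setJarByClass(CombinerExample.class);
            job.setMapperClass(MyMapper.class);        // hypothetical mapper
            // The framework may invoke the combiner zero, one or several times per map task.
            job.setCombinerClass(MyReducer.class);     // hypothetical reducer reused as combiner
            job.setReducerClass(MyReducer.class);
            // ... set output types and paths as usual.
        }
    }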
... View more
12-03-2018
11:47 AM
1 Kudo
When the mapper starts producing intermediate output, it does not write the data directly to the local disk. Rather, it writes the data to memory, where some sorting of the data (quicksort) happens for performance reasons.
Each map task has a circular memory buffer to which it writes its output. By default this buffer is 100 MB in size; it can be changed with the mapreduce.task.io.sort.mb parameter.
When the contents of the buffer reach a certain threshold (mapreduce.map.sort.spill.percent, which has a default value of 0.80, or 80%), a background thread starts to spill the contents to disk. Map outputs continue to be written to the buffer while the spill takes place, but if the buffer fills up during this time, the map blocks until the spill is complete.
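A hedged sketch of tuning these two knobs for a single job; the values are illustrative, not recommendations.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SpillTuningExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setInt("mapreduce.task.io.sort.mb", 200);            // 200 MB circular buffer instead of 100 MB
            conf.setFloat("mapreduce.map.sort.spill.percent", 0.90f); // start spilling at 90% instead of 80%
            Job job = Job.getInstance(conf, "spill-tuning-example");
            // ... configure mapper, reducer and paths as usual.
        }
    }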
... View more
11-30-2018
11:05 AM
There are normally two phases in a MapReduce job: the map phase and the reduce phase. As the name suggests, a map-only job contains just the map phase. Hence there is no sorting and shuffling of intermediate key-value pairs, no partitioner or combiner is needed, and no aggregation or summation of key-value pairs is required; the output of the mapper is written directly to HDFS. Not every job can be expressed as a map-only job, but jobs such as data parsing can be. Because the shuffle is skipped, a map-only job usually performs better than a full MapReduce job. It is enabled simply by setting the number of reduce tasks to zero, as in the sketch below.
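A minimal hedged sketch of a map-only job driver; MyMapper and the output types are illustrative assumptions.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MapOnlyJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "map-only-example");
            job.setJarByClass(MapOnlyJob.class);
            job.setMapperClass(MyMapper.class);   // hypothetical mapper that parses/filters records
            job.setNumReduceTasks(0);             // zero reducers = map-only job, output goes straight to HDFS
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }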
... View more
11-27-2018
01:57 AM
Yes, if you want to build a Hadoop cluster quickly, you can use Docker.
... View more
10-29-2018
12:26 PM
Apache Hadoop achieves security by using Kerberos. At a high level, there are three steps that a client must take to access a service when using Kerberos, each of which involves a message exchange with a server:
1. Authentication – The client authenticates itself to the Authentication Server and receives a timestamped Ticket-Granting Ticket (TGT).
2. Authorization – The client uses the TGT to request a service ticket from the Ticket-Granting Server.
3. Service Request – The client uses the service ticket to authenticate itself to the server that provides the service it wants to use.
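On the Hadoop client side, here is a hedged sketch of what this looks like in code once a keytab is available; the principal name and keytab path are illustrative assumptions.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosClientExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // Obtains Kerberos credentials (steps 1 and 2 happen under the hood via the KDC).
            UserGroupInformation.loginUserFromKeytab(
                    "hdfs-user@EXAMPLE.COM",                        // hypothetical principal
                    "/etc/security/keytabs/hdfs-user.keytab");      // hypothetical keytab path
            // Step 3: the service ticket is used transparently when talking to the NameNode.
            FileSystem fs = FileSystem.get(conf);
            System.out.println(fs.exists(new Path("/")));
            fs.close();
        }
    }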
... View more
10-27-2018
11:14 AM
hdfs-site.xml – This file contains the configuration settings for the HDFS daemons. hdfs-site.xml also specifies the default block replication and whether permission checking is enabled on HDFS. The three main hdfs-site.xml properties are:
dfs.name.dir (dfs.namenode.name.dir in newer releases) gives the location where the NameNode stores its metadata (FsImage and edit logs), and whether that location is on local disk or on a remote directory.
dfs.data.dir (dfs.datanode.data.dir in newer releases) gives the location(s) where the DataNodes store the block data.
fs.checkpoint.dir (dfs.namenode.checkpoint.dir in newer releases) is the directory on the local file system where the Secondary NameNode stores temporary copies of the edit logs and FsImage; it merges them there to produce a checkpoint that serves as a backup.
... View more
12-10-2018
01:58 AM
2 Kudos
Let's start with Hive and then HCatalog.
Hive
⇢ Layer for analyzing, querying and managing large datasets that reside in the various Hadoop file systems
⇢ uses HiveQL (HQL) as its query language
⇢ uses SerDes for serialization and deserialization
⇢ works best with huge volumes of data
HCatalog
⇢ Table and storage management layer for Hadoop; basically, the EDW-style metadata layer for Hadoop (it supports several file formats such as RCFile, CSV, JSON, SequenceFile and ORC)
⇢ is a sub-component of Hive, which enables ETL processes
⇢ tool for accessing the metadata that resides in the Hive Metastore
⇢ acts as an API that exposes the metastore as a REST interface to external tools such as Pig
⇢ uses WebHCat, a web server for engaging with the Hive Metastore
I think the focus should be on how they complement each other rather than on their differences.
Documentation:
- This answer from @Scott Shaw is worth checking
- This slideshare presents the use cases and features of Hive and HCatalog
- This graph from IBM shows how they use both layers in a batch job
I hope this helps! 🙂
... View more
10-15-2018
11:48 AM
If we have a small data set, uber mode can be used for MapReduce. In uber mode, the ApplicationMaster runs the map and reduce tasks within its own JVM, avoiding the overhead of launching containers on remote nodes and communicating with them. A small configuration sketch is shown below.
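A hedged sketch of enabling uber mode for a job; the thresholds shown are the usual knobs and the values are illustrative.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class UberModeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setBoolean("mapreduce.job.ubertask.enable", true); // run the whole job inside the AM's JVM
            conf.setInt("mapreduce.job.ubertask.maxmaps", 9);       // job qualifies only if small enough
            conf.setInt("mapreduce.job.ubertask.maxreduces", 1);
            Job job = Job.getInstance(conf, "uber-example");
            // ... configure mapper, reducer and paths as usual.
        }
    }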
... View more
10-10-2018
12:40 PM
The client can interact with Hive in the three ways below:
· Hive Thrift Client: The Hive server is exposed as a Thrift service, so it is possible to interact with Hive from any programming language that supports Thrift.
· JDBC Driver: Hive provides a pure Type 4 JDBC driver, defined in the org.apache.hadoop.hive.jdbc.HiveDriver class (org.apache.hive.jdbc.HiveDriver for HiveServer2). Pure Java applications can use this driver to connect to Hive using a host and port.
The Beeline CLI uses the JDBC driver to connect to the Hive server.
· ODBC Driver: An ODBC driver allows applications that support ODBC to connect to the Hive server. Apache does not ship an ODBC driver by default, but one is freely available from several vendors.
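A minimal hedged sketch of the JDBC route against HiveServer2; the host, port, database, user and query are illustrative assumptions.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcExample {
        public static void main(String[] args) throws Exception {
            // HiveServer2 driver class; for the old HiveServer it was org.apache.hadoop.hive.jdbc.HiveDriver.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection con = DriverManager.getConnection(
                         "jdbc:hive2://hiveserver.example.com:10000/default", "user", "");
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM sample_table LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }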
... View more
10-06-2018
06:41 PM
It is a Java class for reading Hadoop SequenceFiles: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/SequenceFileInputFormat.html
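A hedged sketch of using it in a job driver; the mapper, reducer and paths would be set as in any other job and are omitted here.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

    public class SequenceFileJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "read-sequencefile");
            job.setJarByClass(SequenceFileJob.class);
            // Keys and values are read with the types they were originally written into the SequenceFile.
            job.setInputFormatClass(SequenceFileInputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // ... set mapper, reducer and output as usual.
        }
    }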
... View more
10-04-2018
03:24 PM
There are many use cases to leverage DS. Typically, when you require reference/lookup data to be available across the NiFi cluster, DS is a good fit, e.g. for enriching a dataflow.
... View more
09-20-2018
10:54 AM
Btw, they spam us with granular & top-notch resources. I think it's worth the spam. ^.^
... View more
08-17-2018
05:25 PM
Yes, as long as the appropriate clients are installed on the slave node. If you also have the /etc/config populated with the correct details for connecting to your instance, then no connection parameters need to be specified for the clients (this is populated automatically if the slave node is deployed/configured by Ambari). In that case you submit the job exactly as you would on any other node.
... View more
07-28-2018
12:01 PM
The number of RecordReader instances will be equal to the number of input splits (which is also the number of mappers).
During parallel processing the mappers run in parallel, and each map task creates its own RecordReader instance for its split.
... View more
07-24-2018
06:26 PM
Contrary to the answer by @Harshali Patel, exhaustion is not defined as an uneven distribution; rather, it is a cause of it. A DataNode has a property you can set (dfs.datanode.du.reserved) that defines how much disk space must be kept reserved for the OS on that server. Once that limit is exceeded, the DataNode process will stop and log an error telling you to delete some files from it. HDFS will continue to function with the other DataNodes. The balancer can be run to keep storage space healthy and evenly distributed.
... View more
07-21-2018
12:13 PM
Both Flume and Kafka are used for real-time event processing, but they are quite different from each other on the following points:
1. Kafka is a general-purpose publish-subscribe messaging system. It is not designed specifically for Hadoop; the Hadoop ecosystem is just one of its possible consumers. Flume, on the other hand, is part of the Hadoop ecosystem and is used for efficiently collecting, aggregating and moving large amounts of data from many different sources to a centralized data store such as HDFS or HBase. It is more tightly integrated with the Hadoop ecosystem; for example, the Flume HDFS sink integrates very well with HDFS security. Its common use case is therefore to act as a data pipeline for ingesting data into Hadoop.
2. It is very easy to add more consumers in Kafka without affecting performance and without any downtime. Kafka also does not track which messages in a topic have been delivered to which consumer; it is each consumer's responsibility to track its own position through offsets. This makes Kafka very scalable, in contrast to Flume, where adding more consumers means changing the topology of the Flume pipeline, which usually requires some downtime as well.
3. Kafka works on a pull model: different consumers can pull data from their respective topics at their own pace and process it in real time or in batch mode. Flume, by contrast, uses a push model, so there is a chance of data loss if a consumer cannot keep up or does not recover quickly.
4. Kafka supports both synchronous and asynchronous replication, depending on your durability requirements, and it uses commodity hard drives. Flume supports both an ephemeral memory-based channel and a durable file-based channel. Even with a durable file-based channel, any event stored in a channel that has not yet been written to a sink is unavailable until the agent recovers. Moreover, the file-based channel does not replicate event data to a different node; it depends entirely on the durability of the storage it writes to.
5. With Kafka we need to write our own producers and consumers (a minimal producer sketch is shown below), whereas Flume ships with built-in sources and sinks that can be used out of the box. On the other hand, if a Flume agent fails, events still sitting in a memory channel are lost.
6. Kafka does not provide native support for message processing, so it usually has to be integrated with another event-processing framework. In contrast, Flume supports different data-flow models and interceptor chaining, which makes event filtering and transformation easy; for example, you can filter out messages you are not interested in early in the pipeline, before sending them over the network, for obvious performance reasons. However, Flume is not suitable for complex event processing.
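To illustrate point 5, here is a minimal hedged sketch of a hand-written Kafka producer in Java; the broker address, topic, key and value are illustrative assumptions.
    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimpleKafkaProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1.example.com:9092");  // hypothetical broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("acks", "all");   // wait for full replication; "1" or "0" trades safety for speed

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Publish one event to a topic; consumers pull it at their own pace.
                producer.send(new ProducerRecord<>("events", "event-key", "event-value"));
            }
        }
    }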
... View more
07-18-2018
11:58 AM
HDFS Block – A block is a contiguous location on the hard drive where data is stored. In general, a file system stores data as a collection of blocks, and in the same way HDFS stores each file as blocks. HDFS is responsible for distributing the blocks of a file across multiple nodes.
Input Split in Hadoop – The data to be processed by an individual mapper is represented by an InputSplit. The split is divided into records, and each record (a key-value pair) is processed by the map function. The number of map tasks is equal to the number of InputSplits. The data for a MapReduce job is initially stored in input files, which typically reside in HDFS. The InputFormat defines how these input files are split and read, and is responsible for creating the InputSplits.
InputSplit vs Block in Hadoop –
• Block – The default size of an HDFS block is 128 MB, which we can configure as per our requirement. All blocks of a file are the same size except the last one, which can be the same size or smaller. Files are split into 128 MB blocks and then stored in the Hadoop file system.
• InputSplit – By default, the split size is approximately equal to the block size. The split size is user-definable, and the user can control it in the MapReduce program based on the size of the data (a small sketch is shown below).
Data representation: Block vs InputSplit –
• Block – It is the physical representation of data. It contains the minimum amount of data that can be read or written.
• InputSplit – It is the logical representation of the data present in the blocks. It is used during data processing in a MapReduce program or other processing technique. An InputSplit doesn't contain the actual data, only a reference to the data.
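As a hedged illustration of the "user can control the split size" point, split sizes can be bounded per job through FileInputFormat; the path and byte values below are assumptions for illustration.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "split-size-example");
            FileInputFormat.addInputPath(job, new Path("/data/input"));   // hypothetical input path
            // Force splits between 64 MB and 128 MB regardless of the HDFS block size.
            FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
            FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
            // ... set mapper, reducer and output as usual.
        }
    }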
... View more
07-13-2018
08:01 AM
@Dukool SHarma Rack Awareness Article Part-2. https://community.hortonworks.com/articles/43164/rack-awareness-series-2.html
... View more
02-05-2019
03:58 PM
@Harshali, did you consider RAID levels? Since the replication factor is 3, do you think a RAID level should be considered at all?
... View more
07-06-2018
04:43 AM
From a Hadoop perspective, a small file is a file considerably smaller than the block size (64 MB or 128 MB). Since Hadoop is used for processing huge amounts of data, if we use small files the number of files will obviously be large. Hadoop is really designed for a small number of large files. The issues with small files are:
1. Each file, directory and block in HDFS is represented as an object in the NameNode's memory (i.e. metadata), and each object occupies approximately 150 bytes. Holding that much metadata in the NameNode's memory for a huge number of objects does not scale: as the number of files grows, so does the memory required to store the metadata.
2. HDFS is not designed for efficient access to small files. Handling a large number of small files causes a lot of seeks and a lot of hopping from DataNode to DataNode to retrieve them, which is an inefficient data-access pattern.
3. A mapper usually takes one block of input at a time. If the files are very small (i.e. less than the typical block size), the number of map tasks increases and each task processes very little input. This creates a long queue of tasks with high overhead and decreases the overall speed and efficiency of map jobs.
Solutions:
1. Hadoop Archive files (HAR): the hadoop archive command runs a MapReduce job to pack many small HDFS files into a single HAR file. HAR keeps file sizes large and file counts low.
2. Sequence files: with this approach, data is stored so that the file name becomes the key and the file contents become the value. A MapReduce program can be written to pack a lot of small files into a single sequence file (a small sketch is shown below); sequence files are splittable, so MapReduce can divide them into parts and work on each part independently.
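A hedged, stand-alone sketch of packing small local files into a single sequence file with the SequenceFile API; the output path is a hypothetical example, and a production version would typically do this inside a MapReduce job as described above.
    import java.io.File;
    import java.nio.file.Files;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SmallFilesToSequenceFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path output = new Path("/data/small-files.seq");   // hypothetical output path
            SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(output),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class));
            try {
                // args are local small files; file name becomes the key, file contents the value.
                for (String name : args) {
                    byte[] contents = Files.readAllBytes(new File(name).toPath());
                    writer.append(new Text(name), new BytesWritable(contents));
                }
            } finally {
                IOUtils.closeStream(writer);
            }
        }
    }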
... View more
06-22-2018
08:38 AM
1) Volume of data: For lower volumes of data, such as a few GBs, an RDBMS is the best choice if it fulfils your requirements; when the data size exceeds that, an RDBMS becomes very slow. In contrast, the Hadoop framework's processing power comes into its own when file sizes are very large and streaming reads and batch processing are what the situation demands.
2) Latency: An RDBMS can respond very quickly when the data size is within its processing limits. Hadoop is very different: it is efficient at batch processing, so results are only available after a large amount of data has been processed. Hadoop is therefore not the ideal platform when immediate results are expected.
3) Throughput: Throughput refers to the amount of data processed in a given period of time, and Hadoop's throughput is higher than that of an RDBMS.
4) ACID properties: ACID properties apply to transaction-based systems, whereas Hadoop has nothing like ACID. In the context of distributed databases, the relevant notion is the BASE model (Basically Available, Soft state, Eventually consistent). You can dig into it for more info, or we can discuss it in a separate thread.
5) Schema: An RDBMS is used to store structured data, or semi-structured data with null values in certain columns of the tables. Hadoop is used to store semi-structured and unstructured data in files. In Hadoop, all processing algorithms are implemented against the files stored in HDFS, whereas in an RDBMS, query languages such as SQL are used to fetch data from tables.
6) Variety of data handled: In an RDBMS, only data that can be represented in a fixed format as rows and columns of a table can be stored. In Hadoop, any kind of data can be stored, but it is only productive if you can process it, for example with a MapReduce job. Two terms are worth mentioning: schema-on-write, used by traditional RDBMSs, where data must be in a specific format before it is written to a table; and schema-on-read, used by Hadoop, where you store data in raw form and impose structure at processing time based on the requirements of the processing application.
7) Response time: Response time for an RDBMS is very low as long as the data is within its processing limits, whereas Hadoop is very fast at processing very large files but executes its jobs in batches from time to time.
... View more