Member since: 08-24-2017
Posts: 24
Kudos Received: 2
Solutions: 0
02-17-2020
07:28 AM
1 Kudo
Hi,
I just want to know whether we can integrate Hive tables with Delta Lake.
If yes, then how?
Do Delta tables support all the features of Hive?
Does Cloudera support Delta Lake?
Regards,
Satya
... View more
Tags: delta lake, Hive
02-10-2020
07:29 AM
Hi,
Can anyone advise whether implementing Hadoop across two different data centers on the same network will impact performance?
We are distributing the master nodes and data nodes across two data centers to reduce downtime.
Since both data centers are on the same network, will this affect performance or not?
Satya
02-10-2020
03:33 AM
Hi Team,
We are using HDP 3.1.0 and Spark 2.
Is there any way to identify, on the cluster side, whether a particular Spark job is using Datasets or DataFrames?
Regards,
Satya
11-23-2018
06:48 AM
Hi, We are trying to install R in our production cluster, which has no internet access. R is working locally: we installed it on one of our nodes using Anaconda and it works fine, but we don't know how to install SparkR, and we could not find a package for it. We downloaded some basic R packages from https://repo.continuum.io/pkgs/r/linux-64/. Could you please point me to an installation guide for SparkR on an offline cluster, and explain how to integrate it with Zeppelin? Regards, Satya
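A minimal offline sketch, assuming SparkR is bundled with an HDP-style Spark 2 client install (the paths here, including the Anaconda location, are illustrative and need adjusting for your layout):

export SPARK_HOME=/usr/hdp/current/spark2-client
# Check whether the SparkR R package already ships with the Spark client
ls $SPARK_HOME/R/lib/SparkR
# Put the offline Anaconda R on the PATH so Spark can find R and Rscript
export PATH=/opt/anaconda2/bin:$PATH
# Start an interactive SparkR shell on YARN to verify the setup
$SPARK_HOME/bin/sparkR --master yarn

If that works, Zeppelin's Spark interpreter should be able to use the same SPARK_HOME; the R executable just has to be visible on the Zeppelin host and on the NodeManager hosts.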
03-02-2018
04:16 AM
@Pranay Vyas So if we have 400 GB of YARN memory but only 10 CPU cores, can we run 400 containers simultaneously in the cluster with yarn.scheduler.minimum-allocation-mb = 1 GB?
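As a rough worked example with the numbers above (which limit wins depends on the resource calculator the scheduler is configured with):
- Memory-only DefaultResourceCalculator: 400 GB / 1 GB minimum allocation = up to 400 containers.
- DominantResourceCalculator (memory and vcores): min(400 GB / 1 GB, 10 vcores / 1 vcore per container) = 10 containers.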
02-17-2018
09:38 AM
Can we run more containers than the number of available CPU cores? If yes, how can we achieve it? As per my understanding, the maximum number of containers is bounded by the values configured in YARN, i.e. yarn.scheduler.minimum-allocation and yarn.scheduler.maximum-allocation. But suppose my cluster has 100 CPU cores (vcores) in total and I want to run 500 containers simultaneously. How can I achieve that?
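One common way, as a sketch (assuming the Capacity Scheduler and no CPU cgroup enforcement; the values are illustrative):

# yarn-site.xml: advertise more vcores per NodeManager than the node physically has
yarn.nodemanager.resource.cpu-vcores = 50
# capacity-scheduler.xml: schedule on memory only, so vcores never become the bottleneck
yarn.scheduler.capacity.resource-calculator = org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator

With the memory-only calculator the container count is bounded by total memory / minimum-allocation-mb, so 500 containers can run on 100 physical cores; CPU-heavy containers will simply time-slice.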
11-28-2017
05:43 AM
@aengineer Hi, Thanks for your response, but I don't have a GUI (Ambari). However, this is an HA cluster, so please let me know the proper steps. Can we do it as a rolling restart, logging in to each data node one by one? Regards, Satya Gaurav
11-27-2017
08:15 AM
Hi Team, I got the below error in the GC log:
Full GC (Allocation Failure)
CMS-concurrent-mark-start
CMS-concurrent-abortable-preclean-start
Current heap utilization is below.
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 12884901888 (12288.0MB)
NewSize = 1610612736 (1536.0MB)
MaxNewSize = 1610612736 (1536.0MB)
OldSize = 11274289152 (10752.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 1449590784 (1382.4375MB)
used = 1449590784 (1382.4375MB)
free = 0 (0.0MB)
100.0% used
Eden Space: (new object space)
capacity = 1288568832 (1228.875MB)
used = 1288568832 (1228.875MB)
free = 0 (0.0MB)
100.0% used
From Space:
capacity = 161021952 (153.5625MB)
used = 161021952 (153.5625MB)
free = 0 (0.0MB)
100.0% used
To Space:
capacity = 161021952 (153.5625MB)
used = 0 (0.0MB)
free = 161021952 (153.5625MB)
0.0% used
concurrent mark-sweep generation:
capacity = 11274289152 (10752.0MB)
used = 11274289152 (10752.0MB)
free = 0 (0.0MB)
100.0% used

So my questions are:
1. We need to increase the heap size for the data nodes from the command line (there is no GUI); our cluster is running Hadoop version hadoop-common 2.4.0.2.1.2.0-402.
2. If we change the parameter in hadoop-env.sh on the name node, will it propagate to all nodes, or do we have to change it manually on every data node?
3. Do we need downtime, or can we change it without stopping any services, just by running sh hadoop-env.sh?
4. If we need to stop services, please let us know which services have to be stopped.
Kindly give me the proper steps to do this from the command line, and also how to verify it afterwards. Regards, Satya Gaurav
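A command-line sketch, assuming hadoop-env.sh lives under /etc/hadoop/conf and the same edit is pushed to every data node (the 16 GB value and the script path are illustrative):

# In /etc/hadoop/conf/hadoop-env.sh on each data node; the file is read only at daemon start-up,
# so running "sh hadoop-env.sh" by itself changes nothing
export HADOOP_DATANODE_OPTS="-Xms16384m -Xmx16384m ${HADOOP_DATANODE_OPTS}"

# Then restart the DataNodes one at a time (rolling), as the hdfs user
/usr/lib/hadoop/sbin/hadoop-daemon.sh stop datanode
/usr/lib/hadoop/sbin/hadoop-daemon.sh start datanode

# Verify the new heap on the restarted process
jmap -heap $(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)

Editing hadoop-env.sh on the name node does not propagate anywhere by itself; the change has to reach every data node, either manually or via whatever config-distribution mechanism you use.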
08-24-2017
06:15 AM
Hi Team, We are enabling Kerberos for the Lily HBase indexer in Cloudera 5.9, but we are confused about one of the steps, the jaas.conf file: do we need to change the parameters in this file or not? The hbase.keytab has already been generated by the GUI, and Cloudera uses the keytabs under /var/run/cloudera-scm-agent/process, so do we need to give the keytab file location or not? If yes, how do we edit this file, and on which server (the HBase Master)? Can we edit it through the GUI instead of the command line? If so, please give the steps.

Current configuration:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=false        <-- after enabling Kerberos, should we give the path of the keytab, or should it work with the ticket cache?
useTicketCache=true;
};

As per the Cloudera doc:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
useTicketCache=false
keyTab="/etc/hbase/conf/hbase.keytab"
principal="hbase/fully.qualified.domain.name@<YOUR-REALM>";
}
03-28-2017
02:24 PM
@Benjamin Leonhardi Why is sorting written before shuffling? I think sorting always happens after the shuffle. There is already a combiner to combine (sort) the output on a single node; I think once all the intermediate data has been collected by the shuffle, sorting is used to merge it into one single input, which is then consumed by the reducer.
03-02-2017
03:46 PM
Hi, I want to know whether Hive has any option for incremental backup, similar to MySQL replication, so that we don't have to repeat the manual steps (creating a base table and an incremental table and then reconciling them) again and again.
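On the ingest side, a saved Sqoop job is one way to avoid repeating the manual steps; a minimal sketch (the connection string, table, directory and column names are all illustrative):

# Create a saved job; it remembers its own --last-value between runs
sqoop job --create emp_incr -- import \
  --connect jdbc:mysql://dbhost/sales --table emp \
  --target-dir /user/hive/warehouse/emp_stage \
  --incremental lastmodified --check-column modified_date --merge-key id

# Each execution pulls only rows changed since the previous run and merges them into the target directory
sqoop job --exec emp_incr

An external Hive table defined over the target directory should then always see the merged result.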
02-28-2017
06:35 AM
@Artem Ervits Hi Artem, Thanks for your reply. I did the same thing and I was able to get the data back. The most surprising thing: I created 2-3 tables, even with different schemas, and they all show the same data that was in the old table; for the extra columns they show NULL. So every time we create an external table, should we give it a different directory path?
02-27-2017
01:43 PM
Suppose I have dropped an external table (EMP) that was stored at /user/hive/satya/. As we know, when we drop an external table the metadata is deleted but the actual data remains. So my question is: how can we restore the external table (EMP) and get the data back? Could anyone give me the steps needed to recover it?
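A minimal sketch, assuming you still know (or can recover) the original DDL; the column list and row format here are illustrative:

# Re-create the external table over the surviving directory; the existing files are picked up as-is
hive -e "
CREATE EXTERNAL TABLE EMP (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/satya/';
"

# Only needed if the table was partitioned: re-register the partitions
hive -e "MSCK REPAIR TABLE EMP;"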
Tags: Data Processing, Hive
02-22-2017
01:47 PM
Suppose we want to run 1000 map tasks. Do we need 1000 containers, or can the map tasks run with fewer than 1000 containers?
02-22-2017
01:45 PM
1. How do multiple reducers write their output? Can multiple reducers write to a single output file, or do we have to write an intermediate reducer to do so? I just want to know how we can get a single output file from multiple reducers.
2. Can a map task and a reduce task run in the same container, or can more than one map or reduce task run in the same container? If yes, how? How are containers assigned for map and reduce tasks?
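For question 1, each reducer always writes its own part-r-NNNNN file; a common way to end up with one file is to merge them after the job (paths are illustrative):

# Merge all part files from the job's output directory into a single local file
hadoop fs -getmerge /user/satya/output merged.txt
# Optionally push the merged file back into HDFS
hadoop fs -put merged.txt /user/satya/output-merged/

The alternative is to configure the job with a single reducer, at the cost of losing parallelism in the reduce phase.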
02-21-2017
07:11 AM
@mqureshi, Thanks a lot for your explanation. I am a little bit confused about the logic for the number of map tasks and reduce tasks, and about resource management in Hadoop. As you wrote, the number of reducers can be set by mapreduce.job.reduces; but if we ask for a larger number of reducers, the job will still run, and the ResourceManager will check resource availability; if the resources are available, the job will run with the requested number of reducers. Am I correct? So is this configuration parameter just a recommendation for YARN, with the ResourceManager making the final decision based on the available resources? The thing that worries me most is how programmers decide how many reducers they need to process a file. Do they have to calculate it every time before submitting a job?
02-21-2017
05:33 AM
Hi, I know the number of map tasks is basically determined by the number of input files and the number of input splits of those files. So if we want to process 200 files of the same block size (or larger), we need 200 map tasks, and for 1000 files we need 1000 map tasks. How do we set the number of reducers for these files, apart from setNumReduceTasks() or the mapreduce.job.reduces configuration? Is there any algorithm or logic, like a hash key, to derive the number of reducers? Secondly, I want to know how the number of containers and the required resources are requested by the ApplicationMaster from the ResourceManager. Suppose a NodeManager has 2 GB of RAM available and we submit a job that needs 3 GB; how will the job run, or will it not run at all? If you can give the exact flow, with the logic from map task to reduce task through to container assignment, it would be really helpful.
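There is no automatic formula in the framework itself; the number of reducers is whatever the job asks for. A sketch of setting it per job (assuming the driver uses ToolRunner so -D properties are picked up; jar, class and paths are illustrative):

hadoop jar myjob.jar com.example.MyDriver -Dmapreduce.job.reduces=20 /input /output

A common rule of thumb from the MapReduce tutorial is roughly 0.95x or 1.75x (number of nodes x containers per node); higher-level engines such as Hive instead derive the count from data volume (e.g. hive.exec.reducers.bytes.per.reducer).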
02-17-2017
02:18 PM
Hi Artem, Thanks a lot for your explanation. So ideally, when we want to run Spark on a YARN cluster, we don't need to configure a Spark master?
02-17-2017
09:37 AM
I want to know whether we can run a job without a Spark master server or not. Since we can integrate Spark with the YARN ResourceManager, what is the use of the Spark master in that case? Could you give the exact flow of job submission with a Spark master versus with YARN?
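A sketch of the two submission modes (jar, class and host names are illustrative):

# Standalone mode: the Spark Master/Worker daemons do the scheduling
spark-submit --master spark://master-host:7077 --class com.example.MyApp myapp.jar

# YARN mode: no Spark Master is involved; the ResourceManager launches an ApplicationMaster,
# which then requests executor containers
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp myapp.jar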
02-10-2017
11:19 AM
@abokor, Hi, Thanks! My question was: suppose there is a file named A.txt that was created 10 years ago and has not changed in those 10 years. Now a client wants to access that file. Where will it get the metadata (block locations, permissions, name, etc.) from? Only from RAM (since the NN stores all metadata in RAM), or from somewhere else?
02-10-2017
06:17 AM
@abokor, Q1: I have seen that 2 fsimages always exist on the SN; it stores two old fsimages for backup and rollback purposes. But I don't know why it stores so many edits logs. If you check the edits log location you will find a large number of edits log files, and I don't know what all of them are used for.
02-10-2017
06:11 AM
Hi Abokor, Thanks a lot for your reply, but I still have one query. As per your explanation, the edits log and fsimage store only the changes since the last checkpoint. So if a user wants information about a file or data that is 10 years old, how will it get the metadata for that particular file? Will it read the metadata (block locations, etc.) only from RAM, or does it use some other process?
02-08-2017
01:21 PM
1 Kudo
I am confused about the actual use of the fsimage; can anyone explain the queries below? It would be really helpful. In many articles it is written that the fsimage and edits log are used only at NameNode restart. But suppose the NameNode has been running for the last 10 years: does it keep all the metadata changes (including block locations) of those 10 years in RAM only? Won't that be a size or performance issue for the NameNode? Or does the NameNode store the metadata according to the new fsimage? How does the NameNode use the new fsimage it receives from the SN?
02-08-2017
01:07 PM
I am confused about the actual use of the fsimage and edits log; can anyone explain the queries below? It would be really helpful.
Q1. Why are there so many fsimage and edits log files on the NameNode? What are they all used for, if checkpointing is already scheduled?
Q2. What happens to the old fsimage and edits logs after checkpointing? When the SN (secondary NameNode) sends a new fsimage to the NameNode, what is that new fsimage used for? Does the NameNode store the block locations according to the new fsimage (whatever the block names in the new fsimage are)?
Q3. If the NameNode has been running for the last 4 years, does it store all the changes and metadata in RAM, or is there some logic to store old block locations somewhere else?
Q4. Does the NameNode update the metadata in RAM every second, or is there a fixed time period?
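For looking at these files directly, the offline viewers help; a sketch (the directory and file names are illustrative):

# Dump a checkpoint image and an edits segment to readable XML
hdfs oiv -p XML -i /hadoop/hdfs/namenode/current/fsimage_0000000000000042 -o fsimage.xml
hdfs oev -p XML -i /hadoop/hdfs/namenode/current/edits_0000000000000001-0000000000000042 -o edits.xml

How often a new fsimage is produced, and how many old images and edits segments are retained, is governed by dfs.namenode.checkpoint.period, dfs.namenode.checkpoint.txns, dfs.namenode.num.checkpoints.retained and dfs.namenode.num.extra.edits.retained.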