02-12-2020
01:20 PM
2 Kudos
Hi @satyap,

The Cloudera Distribution of Hadoop (CDH) can be deployed across data centers. Please take a look at the "Appendix A: Spanning Multiple Data Centers" section of the Cloudera Bare Metal Reference Architecture, and pay close attention to the networking section. I generally do not recommend this topology because of the latency challenges introduced by going across data centers.

Regards,
Steve
02-28-2017
11:06 AM
@satya gaurav
The directory path is where your Hive data resides. If you point the table at an empty folder, the table will be empty, so the location must point to the directory that already holds the data if you want that data to appear in the table. The columns that come back as NULL appear because the schema defines more columns than the delimited data actually contains.
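As a minimal sketch of where both behaviors show up (the table name, columns, delimiter, and path below are all hypothetical):

    -- Hypothetical example: an external table pointing at an existing data directory.
    -- If /user/hive/demo/orders is empty, the table returns no rows.
    -- If the files there have fewer fields per line than the three columns
    -- declared below, the trailing columns read back as NULL.
    CREATE EXTERNAL TABLE orders (
      order_id   INT,
      customer   STRING,
      amount     DOUBLE
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LOCATION '/user/hive/demo/orders';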
02-22-2017
03:23 PM
@satya gaurav Please find the details below:

1. How do multiple reducers write their output? Can multiple reducers write to a single output file, or do we have to write an intermediate reducer to do so? I just want to know how we can get a single output file from multiple reducers.

-> By default each reducer generates a separate output file, such as part-r-00000, and this output is stored in HDFS. If we want to merge all of the reducers' output into a single file, we have to do so explicitly, either in our own code (for example with MultipleOutputs) or with the hadoop fs -getmerge command. See the sketch below.

2. Can a map task and a reduce task run in the same container? Can more than one map task or reduce task run in the same container? If yes, then how? How are containers assigned for map and reduce tasks?

-> Yes, a map task and a reduce task can run in the same container, but not in parallel. In MapReduce v1 we had fixed mapper and reducer slots: map tasks could only run in mapper slots and reduce tasks only in reducer slots. In MapReduce v2, a container can run either a map task or a reduce task.
-> We cannot run more than one map/reduce task at a time in the same container.
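To make the first point concrete, here is a minimal word-count-style sketch (the class names are made up for illustration, not taken from any particular codebase). With three reduce tasks it writes part-r-00000 through part-r-00002 into the output directory:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ReducerOutputDemo {

        // Minimal mapper: emits (word, 1) for every whitespace-separated token.
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }

        // Minimal reducer: sums the counts for each word.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "reducer-output-demo");
            job.setJarByClass(ReducerOutputDemo.class);
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Three reducers produce three files in the output directory:
            // part-r-00000, part-r-00001, part-r-00002. Setting this to 1
            // is the simplest way to end up with a single output file.
            job.setNumReduceTasks(3);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Alternatively, the per-reducer files can be collapsed after the job finishes with hadoop fs -getmerge <hdfs-output-dir> <local-file>.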
02-21-2017
02:50 PM
1 Kudo
@satya gaurav The number of reducers is determined exactly by mapreduce.job.reduces; this is not just a recommendation. If you specify a higher number of reducers, container allocation is still done based on the queue capacity available to that application, which is determined by your scheduler. Just because you request more than you should doesn't mean those resources will be allocated: your reducers will wait in the queue until others complete. To get more details, you need to understand schedulers (the Capacity Scheduler, to be precise).

The minimum container size is given by yarn.scheduler.minimum-allocation-mb: a request for less than this value still results in a container of this minimum size, not the smaller value you specified. Similarly, there is an upper limit given by yarn.scheduler.maximum-allocation-mb. Guess what happens if you request more than this? You don't get it; you are assigned this maximum value instead. There are similar settings for cores. These limits apply at the cluster level.

On each node, containers are allocated by the NodeManager, which of course works with the ResourceManager to do its job. yarn.nodemanager.resource.memory-mb is the total memory a node makes available for containers, and yarn.nodemanager.resource.cpu-vcores is the equivalent for CPU.
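As an illustration of how these settings interact, here is a yarn-site.xml fragment; the values are made-up examples, not recommendations:

    <!-- yarn-site.xml: illustrative values only -->
    <configuration>
      <!-- Cluster-wide floor and ceiling for a single container request.
           A request below 1024 MB is rounded up to 1024 MB; no container
           larger than 8192 MB will be granted. -->
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8192</value>
      </property>

      <!-- Per-node totals the NodeManager offers for all containers combined. -->
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>16384</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>8</value>
      </property>
    </configuration>

With these example values, a job that sets mapreduce.reduce.memory.mb=512 still gets 1024 MB containers, and each node can host at most 16 such containers (16384 / 1024) no matter how many reducers mapreduce.job.reduces asks for.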
04-23-2018
11:54 PM
https://community.hortonworks.com/questions/82148/actual-use-of-fsimage-and-edits-log.html

Now suppose a.txt was changed to b.txt, and around the same time a checkpoint occurred, so a new fsimage containing the information for b.txt exists and the edit logs have been truncated; after that, no further changes are made. This would mean the new fsimage knows about b.txt, but the NameNode does not as of now (it still has a.txt in the fsimage it loaded at startup), and the edit log no longer has any information about the change. So how would the NameNode recognize this file?