Member since 07-31-2016
4 Posts
0 Kudos Received
0 Solutions
09-19-2016
11:06 PM
Could you paste the log messages here? It looks like the port may already be in use. You can check with: netstat -tulpn | grep <the port you are using for Thrift, default: 10000>
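If netstat is not available on the box, a rough equivalent check can be scripted. The sketch below is only an illustration: the function name port_in_use is made up for this example, and it assumes the Thrift service would be reachable on 127.0.0.1. It only tells you whether something accepts TCP connections on that port, not which process owns it.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper for illustration; host and timeout are assumptions.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 10000  # default Thrift port mentioned above
    state = "already in use" if port_in_use(port) else "free"
    print(f"Port {port} looks {state} on 127.0.0.1")
```

A listener bound only to another interface would not be detected by this check, so netstat remains the more reliable option.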
08-11-2016
04:26 PM
Thanks Ben. I agree with your point on the performance aspects: there are other things we consider when calculating this, and it is only a rough estimate. In production, other factors also dictate the NameNode heap, such as the handler count, how many DataNodes can send block reports in a given time frame (the initDelay parameter), and a few others. Yes, I have tested that blocks, files, and directories all count. I brought this up for the sake of Hadoop learners, because the statement "replication does not impact memory on Namenode" is incorrect. Thanks and regards
08-02-2016
06:09 PM
Can someone please look into this?
07-31-2016
07:27 PM
I was referring to the documentation at: http://www.cloudera.com/documentation/enterprise/latest/topics/admin_nn_memory_config.html

It says that the replication factor does not impact heap size on the NameNode, which does not seem correct, as the data structure uses about 16 bytes for each replica. The class BlockInfo has a Datanode Descriptor that maps the nodes on which the replicas of the block (or, to be more precise, the block files of a block) are present.

So, if I have 1 million files, with each file having 2 blocks, the calculation would be approximately as below.

Memory requirements:
1 million files = 150 bytes x 1 million = 143.05 MB
6 million blocks (replication 3) = (150 bytes per block + 16 bytes per additional replica = 150 bytes + 2 x 16 bytes = 182 bytes) x 6 million = 1041.41 MB
Total = 1184.46 MB = 1.157 GB

Am I correct, or am I missing something?
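For anyone who wants to replay the arithmetic, here is a small sketch that simply mirrors the estimate in this post. The constants (150 bytes per file, 150 bytes per block, 16 bytes per additional replica) are the rough figures used above, and the function name is made up for illustration. It is not an authoritative sizing formula.

```python
# Rough per-object figures taken from the post above (assumptions, not exact).
BYTES_PER_FILE = 150
BYTES_PER_BLOCK = 150
BYTES_PER_EXTRA_REPLICA = 16


def estimate_heap_bytes(files, blocks_per_file, replication):
    """Replay the post's estimate: file objects plus replicated block entries."""
    file_bytes = files * BYTES_PER_FILE
    # The post counts every replica as a block entry (2 million x 3 = 6 million).
    block_entries = files * blocks_per_file * replication
    # Each entry: base block cost plus 16 bytes per additional replica.
    per_entry = BYTES_PER_BLOCK + (replication - 1) * BYTES_PER_EXTRA_REPLICA
    return file_bytes + block_entries * per_entry


if __name__ == "__main__":
    total = estimate_heap_bytes(files=1_000_000, blocks_per_file=2, replication=3)
    print(f"{total / 1024**2:.2f} MB, {total / 1024**3:.4f} GB")
```

Changing the replication argument shows how much the estimate moves with the replication factor under these assumptions.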