Member since: 12-15-2015
Posts: 6
Kudos Received: 1
Solutions: 0
11-03-2016 12:48 PM
1 Kudo
When running a MapReduce job over a file in HDFS, the number of mappers is based on the input split size, and with a text input file the split size usually matches the HDFS block size (I understand that is not always the case, but it is most of the time). What happens when I run a MapReduce job with an HBase table as the input? How does the input split size map onto the HBase table, and how can I control the number of mappers when running MapReduce on top of an HBase table? Can someone guide me here? I am using HBase 0.98.4 and Hadoop 2.6.
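For reference, here is a minimal sketch of a MapReduce job that reads an HBase table as input (the table name "mytable" and the RowCountMapper are placeholders for illustration, not anything from this thread). With TableInputFormat, which TableMapReduceUtil wires up for you, each region of the table becomes one input split, so the number of mappers equals the number of regions rather than anything derived from the HDFS block size:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class HBaseRowCount {

  // Hypothetical mapper: emits a count of 1 for every row it scans.
  static class RowCountMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private static final Text KEY = new Text("rows");

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
        throws java.io.IOException, InterruptedException {
      context.write(KEY, ONE);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-row-count");
    job.setJarByClass(HBaseRowCount.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching cuts RPC round trips
    scan.setCacheBlocks(false);  // do not pollute the block cache from MR scans

    // TableInputFormat (set up by this helper) creates one input split per
    // region of "mytable", so the number of mappers == number of regions.
    TableMapReduceUtil.initTableMapperJob(
        "mytable",              // assumed table name
        scan,
        RowCountMapper.class,
        Text.class,
        IntWritable.class,
        job);

    job.setReducerClass(IntSumReducer.class);
    job.setNumReduceTasks(1);
    FileOutputFormat.setOutputPath(job, new Path(args[0])); // output dir from the command line

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

In other words, to change the mapper count you change the region count: pre-split the table when you create it, or let regions split naturally by tuning hbase.hregion.max.filesize. A mapreduce.job.maps hint is effectively ignored by TableInputFormat.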
Labels:
- Apache Hadoop
- Apache HBase
- Apache YARN
08-03-2016 06:34 AM
Please find my cluster details in my first post; I am also using 2 disks per node. The documentation recommends the configuration below, which is what I applied in my newer setup:

yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=4096
yarn.nodemanager.resource.memory-mb=4096
mapreduce.map.memory.mb=512
mapreduce.map.java.opts=-Xmx409m
mapreduce.reduce.memory.mb=1024
mapreduce.reduce.java.opts=-Xmx819m
yarn.app.mapreduce.am.resource.mb=512
yarn.app.mapreduce.am.command-opts=-Xmx409m
mapreduce.task.io.sort.mb=204

Thanks
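As a side note, those java.opts values follow the usual guideline of sizing the JVM heap at roughly 80% of the container allocation, leaving headroom for non-heap JVM overhead. A tiny sketch of that arithmetic (the 0.8 factor is that guideline, an assumption rather than anything YARN enforces):

```java
public class HeapFromContainer {
  // Guideline (assumption): JVM heap ~= 80% of the container allocation,
  // leaving ~20% for off-heap memory and JVM overhead.
  static int heapMb(int containerMb) {
    return (int) (containerMb * 0.8);
  }

  public static void main(String[] args) {
    System.out.println("mapreduce.map.java.opts            = -Xmx" + heapMb(512)  + "m"); // -Xmx409m
    System.out.println("mapreduce.reduce.java.opts         = -Xmx" + heapMb(1024) + "m"); // -Xmx819m
    System.out.println("yarn.app.mapreduce.am.command-opts = -Xmx" + heapMb(512)  + "m"); // -Xmx409m
  }
}
```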
08-03-2016 06:02 AM
Hi, I am running a cluster with 15 datanodes, 15 region servers, and 16 node managers (plus, of course, the NameNode, Secondary NameNode, HBase active master, and ResourceManager). All of the machines are m3.large instances, so 2 cores and 7.5GB of RAM each. By default the cluster allocates 32GB of YARN memory and 1 vcore per node. Here is my default configuration; it uses the DefaultResourceCalculator:

yarn.scheduler.minimum-allocation-mb: 682
yarn.scheduler.maximum-allocation-mb: 2048
yarn.nodemanager.resource.cpu-vcores: 1
yarn.nodemanager.resource.memory-mb: 2048

When I run a MapReduce job it takes about 30 minutes to complete, and YARN memory utilization stays high the whole time, so I assumed YARN memory was the bottleneck and doubled the sizes as below:

yarn.scheduler.minimum-allocation-mb: 1024
yarn.scheduler.maximum-allocation-mb: 4096
yarn.nodemanager.resource.cpu-vcores: 1
yarn.nodemanager.resource.memory-mb: 4096

Total YARN memory has now increased from 32GB to 64GB, but when I run the same MapReduce job with the newer configuration it takes around 42 minutes; even though all 64GB of YARN memory is available, the cluster seems slower than before. I would like to understand container resource allocation and why the job slowed down after I increased the memory. I would also like to know how many containers I get per node and per cluster (any calculation). Please suggest the recommended configuration for this case. Thanks, Arun
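A rough sketch of the container math, assuming the DefaultResourceCalculator (so only memory is considered) and a mapreduce.map.memory.mb of 1024, which is an assumption since this post does not state it:

```java
public class YarnContainerMath {

  // With the DefaultResourceCalculator only memory matters: each request is
  // rounded up to a multiple of yarn.scheduler.minimum-allocation-mb and
  // capped at yarn.scheduler.maximum-allocation-mb.
  static int containerSizeMb(int requestMb, int minAllocMb, int maxAllocMb) {
    int rounded = ((requestMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    return Math.min(rounded, maxAllocMb);
  }

  static int containersPerNode(int nodeMemoryMb, int containerSizeMb) {
    return nodeMemoryMb / containerSizeMb;
  }

  public static void main(String[] args) {
    // Numbers from the newer configuration in this thread (map memory assumed).
    int minAlloc = 1024, maxAlloc = 4096, nodeMemory = 4096, nodeManagers = 16;
    int mapRequest = 1024; // assumed mapreduce.map.memory.mb

    int size = containerSizeMb(mapRequest, minAlloc, maxAlloc);
    int perNode = containersPerNode(nodeMemory, size);

    System.out.println("container size                 : " + size + " MB");
    System.out.println("containers per node            : " + perNode);
    System.out.println("containers per cluster (approx): " + perNode * nodeManagers);
  }
}
```

Under those assumptions that works out to roughly 4 containers per node and about 64 across 16 NodeManagers, versus only 1 or 2 per node with the older 682/2048 settings. Each node has only 2 cores and also runs a region server, so packing more memory-sized containers onto the same CPUs can simply oversubscribe the cores; that is one plausible reason the job got slower rather than faster after the memory was doubled.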
Labels:
- Apache YARN
07-22-2016 10:39 AM
Thanks for the additional info. I was mainly curious to understand the NameNode disk utilization. Since my cluster always loads files above 1GB, a 256MB block size is fine for now, and I can tune it later. Right now I am loading the data as plain text files. Does the compression have to happen on the local file system (tar or gz) before loading, or is there any default compression available through the HDFS native commands? I know a few native compression codecs are available for HBase; which compression algorithms would be better for storing text data? I am curious about minimizing disk utilization while keeping good performance. Thanks.
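As general guidance: plain HDFS has no transparent, automatic compression. Text data is compressed either before or at load time (gzip, bzip2 files, of which bzip2 remains splittable for MapReduce) or by using a container format such as SequenceFile, Avro, or ORC with a codec. On the HBase side, compression is configured per column family. Below is a small sketch (the table and family names are made up) enabling SNAPPY, a common choice for text-heavy data because it is fast; GZ compresses harder at a higher CPU cost:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.compress.Compression;

public class CreateCompressedTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Hypothetical table/family names used for illustration.
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("mytable"));
    HColumnDescriptor family = new HColumnDescriptor("cf");

    // SNAPPY: fast with a decent ratio; GZ trades more CPU for a smaller footprint.
    family.setCompressionType(Compression.Algorithm.SNAPPY);
    table.addFamily(family);

    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      admin.createTable(table);
    } finally {
      admin.close();
    }
  }
}
```

Note that SNAPPY (and LZO) require the native libraries to be installed on every region server, so checking codec availability first is worthwhile.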
07-21-2016 05:01 AM
Hi, let's assume I store a 10GB file in HDFS. The cluster block size is 256MB, the replication factor is 3, and I am using 3 datanodes. How much space does this 10GB of data require on each datanode, on the NameNode, and on the Secondary NameNode? (I am really interested in understanding the space utilization of the NameNode and Secondary NameNode.) Also, how much space is required to store the same data in HBase? Thanks
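A back-of-the-envelope calculation for those numbers, using the common rough estimate of about 150 bytes of NameNode heap per file/block object (a guideline, not an exact figure):

```java
public class HdfsSpaceEstimate {
  public static void main(String[] args) {
    long fileBytes   = 10L * 1024 * 1024 * 1024; // 10 GB file
    long blockBytes  = 256L * 1024 * 1024;       // 256 MB block size
    int  replication = 3;
    int  dataNodes   = 3;

    long blocks      = (fileBytes + blockBytes - 1) / blockBytes; // 40 blocks
    long totalOnDisk = fileBytes * replication;                   // 30 GB across the cluster
    long perDataNode = totalOnDisk / dataNodes;                   // ~10 GB each with 3 nodes

    // NameNode keeps file/block metadata in memory; ~150 bytes per object
    // is the usual rough guideline (an estimate, not an exact figure).
    long nameNodeHeapBytes = (1 + blocks) * 150;

    System.out.println("blocks                : " + blocks);
    System.out.println("raw data on disk      : " + totalOnDisk / (1024 * 1024 * 1024) + " GB");
    System.out.println("per datanode (approx) : " + perDataNode / (1024 * 1024 * 1024) + " GB");
    System.out.println("NameNode metadata     : ~" + nameNodeHeapBytes + " bytes in heap");
  }
}
```

The Secondary NameNode does not hold block data at all; it only keeps checkpointed copies of the fsimage and edit logs, so its disk usage is on the order of the NameNode's metadata, not the 30GB of replicated blocks. HBase generally needs somewhat more raw space for the same data, since every cell carries its row key, column family, qualifier, and timestamp, and the resulting HFiles are themselves replicated by HDFS, unless column-family compression offsets that overhead.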
Labels:
- Apache Hadoop
- Apache HBase