Member since: 12-15-2015
Posts: 6
Kudos Received: 1
Solutions: 0
11-03-2016 12:48 PM
1 Kudo
When running a MapReduce job over a file in HDFS, the number of mappers is based on the input split size, and with a text input file the split size usually matches the HDFS block size (I understand that is not always the case, but it is most of the time). What happens when I run a MapReduce job with an HBase table as the input? How does the input split size map onto the HBase table, and how can I control the number of mappers when running MapReduce on top of an HBase table? Can someone guide me here? I am using HBase 0.98.4 and Hadoop 2.6.
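For reference, here is a minimal sketch of a MapReduce job that reads an HBase table as input (the table name "mytable" and the RowCountMapper are placeholders for illustration, not anything from this thread). With TableInputFormat, which TableMapReduceUtil wires up for you, each region of the table becomes one input split, so the number of mappers equals the number of regions rather than anything derived from the HDFS block size:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class HBaseRowCount {

  // Hypothetical mapper: emits a count of 1 for every row it scans.
  static class RowCountMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private static final Text KEY = new Text("rows");

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
        throws java.io.IOException, InterruptedException {
      context.write(KEY, ONE);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-row-count");
    job.setJarByClass(HBaseRowCount.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner caching cuts RPC round trips
    scan.setCacheBlocks(false);  // do not pollute the block cache from MR scans

    // TableInputFormat (set up by this helper) creates one input split per
    // region of "mytable", so the number of mappers == number of regions.
    TableMapReduceUtil.initTableMapperJob(
        "mytable",              // assumed table name
        scan,
        RowCountMapper.class,
        Text.class,
        IntWritable.class,
        job);

    job.setReducerClass(IntSumReducer.class);
    job.setNumReduceTasks(1);
    FileOutputFormat.setOutputPath(job, new Path(args[0])); // output dir from the command line

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

In other words, to change the mapper count you change the region count: pre-split the table when you create it, or let regions split naturally by tuning hbase.hregion.max.filesize. A mapreduce.job.maps hint is effectively ignored by TableInputFormat.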
Labels:
- Apache Hadoop
- Apache HBase
- Apache YARN
08-03-2016 06:34 AM
Please find my cluster details in my first post; I am also using 2 disks per node. The documentation recommends the configuration below, which is what I applied in my newer setup:

yarn.scheduler.minimum-allocation-mb=1024
yarn.scheduler.maximum-allocation-mb=4096
yarn.nodemanager.resource.memory-mb=4096
mapreduce.map.memory.mb=512
mapreduce.map.java.opts=-Xmx409m
mapreduce.reduce.memory.mb=1024
mapreduce.reduce.java.opts=-Xmx819m
yarn.app.mapreduce.am.resource.mb=512
yarn.app.mapreduce.am.command-opts=-Xmx409m
mapreduce.task.io.sort.mb=204

Thanks
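As a side note, those java.opts values follow the usual guideline of sizing the JVM heap at roughly 80% of the container allocation, leaving headroom for non-heap JVM overhead. A tiny sketch of that arithmetic (the 0.8 factor is that guideline, an assumption rather than anything YARN enforces):

```java
public class HeapFromContainer {
  // Guideline (assumption): JVM heap ~= 80% of the container allocation,
  // leaving ~20% for off-heap memory and JVM overhead.
  static int heapMb(int containerMb) {
    return (int) (containerMb * 0.8);
  }

  public static void main(String[] args) {
    System.out.println("mapreduce.map.java.opts            = -Xmx" + heapMb(512)  + "m"); // -Xmx409m
    System.out.println("mapreduce.reduce.java.opts         = -Xmx" + heapMb(1024) + "m"); // -Xmx819m
    System.out.println("yarn.app.mapreduce.am.command-opts = -Xmx" + heapMb(512)  + "m"); // -Xmx409m
  }
}
```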
08-03-2016 06:02 AM
Hi, I am running a cluster with 15 datanodes, 15 region servers, and 16 node managers (plus, of course, the NameNode, Secondary NameNode, HBase active master, and ResourceManager). All of the machines are m3.large instances, so 2 cores and 7.5GB of RAM each. By default the cluster allocates 32GB of YARN memory and 1 vcore per node. Here is my default configuration; it uses the DefaultResourceCalculator:

yarn.scheduler.minimum-allocation-mb: 682
yarn.scheduler.maximum-allocation-mb: 2048
yarn.nodemanager.resource.cpu-vcores: 1
yarn.nodemanager.resource.memory-mb: 2048

When I run a MapReduce job it takes about 30 minutes to complete, and YARN memory utilization stays high the whole time, so I assumed YARN memory was the bottleneck and doubled the sizes as below:

yarn.scheduler.minimum-allocation-mb: 1024
yarn.scheduler.maximum-allocation-mb: 4096
yarn.nodemanager.resource.cpu-vcores: 1
yarn.nodemanager.resource.memory-mb: 4096

Total YARN memory has now increased from 32GB to 64GB, but when I run the same MapReduce job with the newer configuration it takes around 42 minutes; even though all 64GB of YARN memory is available, the cluster seems slower than before. I would like to understand container resource allocation and why the job slowed down after I increased the memory. I would also like to know how many containers I get per node and per cluster (any calculation). Please suggest the recommended configuration for this case. Thanks, Arun
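A rough sketch of the container math, assuming the DefaultResourceCalculator (so only memory is considered) and a mapreduce.map.memory.mb of 1024, which is an assumption since this post does not state it:

```java
public class YarnContainerMath {

  // With the DefaultResourceCalculator only memory matters: each request is
  // rounded up to a multiple of yarn.scheduler.minimum-allocation-mb and
  // capped at yarn.scheduler.maximum-allocation-mb.
  static int containerSizeMb(int requestMb, int minAllocMb, int maxAllocMb) {
    int rounded = ((requestMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    return Math.min(rounded, maxAllocMb);
  }

  static int containersPerNode(int nodeMemoryMb, int containerSizeMb) {
    return nodeMemoryMb / containerSizeMb;
  }

  public static void main(String[] args) {
    // Numbers from the newer configuration in this thread (map memory assumed).
    int minAlloc = 1024, maxAlloc = 4096, nodeMemory = 4096, nodeManagers = 16;
    int mapRequest = 1024; // assumed mapreduce.map.memory.mb

    int size = containerSizeMb(mapRequest, minAlloc, maxAlloc);
    int perNode = containersPerNode(nodeMemory, size);

    System.out.println("container size                 : " + size + " MB");
    System.out.println("containers per node            : " + perNode);
    System.out.println("containers per cluster (approx): " + perNode * nodeManagers);
  }
}
```

Under those assumptions that works out to roughly 4 containers per node and about 64 across 16 NodeManagers, versus only 1 or 2 per node with the older 682/2048 settings. Each node has only 2 cores and also runs a region server, so packing more memory-sized containers onto the same CPUs can simply oversubscribe the cores; that is one plausible reason the job got slower rather than faster after the memory was doubled.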
Labels:
- Apache YARN
07-22-2016 10:39 AM
Thanks for the additional info. I was mainly curious to understand the NameNode disk utilization. Since my cluster always loads files above 1GB, a 256MB block size is fine for now, and I can tune it later. Right now I am loading the data as plain text files. Does the compression have to happen on the local file system (tar or gz) before loading, or is there any default compression available through the HDFS native commands? I know a few native compression codecs are available for HBase; which compression algorithms would be better for storing text data? I am curious about minimizing disk utilization while keeping good performance. Thanks.
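As general guidance: plain HDFS has no transparent, automatic compression. Text data is compressed either before or at load time (gzip, bzip2 files, of which bzip2 remains splittable for MapReduce) or by using a container format such as SequenceFile, Avro, or ORC with a codec. On the HBase side, compression is configured per column family. Below is a small sketch (the table and family names are made up) enabling SNAPPY, a common choice for text-heavy data because it is fast; GZ compresses harder at a higher CPU cost:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.compress.Compression;

public class CreateCompressedTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Hypothetical table/family names used for illustration.
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("mytable"));
    HColumnDescriptor family = new HColumnDescriptor("cf");

    // SNAPPY: fast with a decent ratio; GZ trades more CPU for a smaller footprint.
    family.setCompressionType(Compression.Algorithm.SNAPPY);
    table.addFamily(family);

    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      admin.createTable(table);
    } finally {
      admin.close();
    }
  }
}
```

Note that SNAPPY (and LZO) require the native libraries to be installed on every region server, so checking codec availability first is worthwhile.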
07-21-2016 05:01 AM
Hi, let's assume I store a 10GB file in HDFS. The cluster block size is 256MB, the replication factor is 3, and I am using 3 datanodes. How much space does this 10GB of data require on each datanode, on the NameNode, and on the Secondary NameNode? (I am really interested in understanding the space utilization of the NameNode and Secondary NameNode.) Also, how much space is required to store the same data in HBase? Thanks
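A back-of-the-envelope calculation for those numbers, using the common rough estimate of about 150 bytes of NameNode heap per file/block object (a guideline, not an exact figure):

```java
public class HdfsSpaceEstimate {
  public static void main(String[] args) {
    long fileBytes   = 10L * 1024 * 1024 * 1024; // 10 GB file
    long blockBytes  = 256L * 1024 * 1024;       // 256 MB block size
    int  replication = 3;
    int  dataNodes   = 3;

    long blocks      = (fileBytes + blockBytes - 1) / blockBytes; // 40 blocks
    long totalOnDisk = fileBytes * replication;                   // 30 GB across the cluster
    long perDataNode = totalOnDisk / dataNodes;                   // ~10 GB each with 3 nodes

    // NameNode keeps file/block metadata in memory; ~150 bytes per object
    // is the usual rough guideline (an estimate, not an exact figure).
    long nameNodeHeapBytes = (1 + blocks) * 150;

    System.out.println("blocks                : " + blocks);
    System.out.println("raw data on disk      : " + totalOnDisk / (1024 * 1024 * 1024) + " GB");
    System.out.println("per datanode (approx) : " + perDataNode / (1024 * 1024 * 1024) + " GB");
    System.out.println("NameNode metadata     : ~" + nameNodeHeapBytes + " bytes in heap");
  }
}
```

The Secondary NameNode does not hold block data at all; it only keeps checkpointed copies of the fsimage and edit logs, so its disk usage is on the order of the NameNode's metadata, not the 30GB of replicated blocks. HBase generally needs somewhat more raw space for the same data, since every cell carries its row key, column family, qualifier, and timestamp, and the resulting HFiles are themselves replicated by HDFS, unless column-family compression offsets that overhead.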
Labels:
- Apache Hadoop
- Apache HBase