Member since: 05-11-2017
Posts: 3
Kudos Received: 0
Solutions: 0
05-16-2017 09:20 AM
In addition, I had uploaded the wrong result files, so here it is again with the appropriate link. Extremely sorry 😞 Christian
05-15-2017 07:20 AM
Alas, the formatting of my post was lost, so here it is, redone:
Hi,
I am carrying out TestDFSIO performance tests under MapReduce2 / YARN in order to get a deeper understanding of YARN and its behavior when it comes to the number of mappers and reducers, on a single-node sandbox running in Docker.
I understand that the behavior should depend on the number of splits of the input data and on the configuration determining the number of reducers.
A) Number of mappers and reducers:
My values are:
hdfs getconf -confKey yarn.nodemanager.resource.memory-mb 2250
hdfs getconf -confKey yarn.nodemanager.resource.cpu-vcores 8
hdfs getconf -confKey mapreduce.map.memory.mb 250
hdfs getconf -confKey mapreduce.map.cpu.vcores 1
hdfs getconf -confKey mapreduce.reduce.cpu.vcores 1
mapreduce.reduce.memory.mb 250
Following the formulas here, I am expecting up to 8 simultaneous mappers and reducers.
https://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
B) Input Splits:
hdfs getconf -confKey dfs.blocksize 134217728 # 128 MB
hdfs getconf -confKey mapred.max.split.size - value missing, so it should default to a very large number
hdfs getconf -confKey mapred.min.split.size 0
hdfs getconf -confKey dfs.replication 1 - as I am on a sandbox
In my case I would expect the split size to be 128 MB, according to the formula
result = max(min_split_size, min(max_split_size, dfs_blocksize))
Now I have set up DFSIO runs in order to test the behavior, always reading and writing 10 GB of data. For example, the command to process 10 files of size 1 GB is:
$ hadoop jar hadoop-*test*.jar TestDFSIO -read|write -nrFiles 10 -fileSize 1000
I have carried out several experiments with corresponding reads and writes, cleaning up after each run. I am having problems understanding the patterns: in particular, I would expect the number of splits to change when the file size exceeds the split size. However, the number of splits corresponds exactly to the number of files, even when a single file exceeds the split size of 128 MB. I have compiled a PDF to clarify that point. I would expect the splits to change in the rows marked green.
What am I getting wrong here?
Thank you very much!
Christian
05-11-2017 01:35 PM
Hi,
I am carrying out TestDFSIO performance tests under MapReduce2 / YARN in order to get a deeper understanding of YARN and its behavior when it comes to the number of mappers and reducers, on a single-node sandbox running in Docker.
I understand that the behavior should depend on the number of splits of the input data and on the configuration determining the number of reducers.
A) Number of mappers and reducers:
My values are:
hdfs getconf -confKey yarn.nodemanager.resource.memory-mb 2250
hdfs getconf -confKey yarn.nodemanager.resource.cpu-vcores 8
hdfs getconf -confKey mapreduce.map.memory.mb 250
hdfs getconf -confKey mapreduce.map.cpu.vcores 1
hdfs getconf -confKey mapreduce.reduce.cpu.vcores 1
mapreduce.reduce.memory.mb 250
Following the formulas here, I am expecting up to 8 simultaneous mappers and reducers.
https://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
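As a quick sanity check, here is the arithmetic with the container sizes above plugged into the formula from that link:
containers bounded by memory: floor(2250 MB / 250 MB) = 9
containers bounded by vcores: floor(8 vcores / 1 vcore) = 8
expected concurrent containers: min(9, 8) = 8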
B) Input Splits:
hdfs getconf -confKey dfs.blocksize 134217728 # 128 MB
hdfs getconf -confKey mapred.max.split.size - value missing, so it should default to a very large number (large enough not to matter)
hdfs getconf -confKey mapred.min.split.size 0
hdfs getconf -confKey dfs.replication 1 - as I am on a sandbox
In my case I would expect the split size to be 128 MB, according to the formula
result = max(min_split_size, min(max_split_size, dfs_blocksize))
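Plugging my values in (the missing max split size should default to Long.MAX_VALUE, if I read the FileInputFormat defaults correctly):
result = max(0, min(Long.MAX_VALUE, 134217728)) = 134217728 # i.e. 128 MB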
Now I have set up DFSIO runs in order to test the behavior - always reading and writing 10 GB of data.
For example, the command to process 10 files of size 1 GB is:
$ hadoop jar hadoop-*test*.jar TestDFSIO -read|write -nrFiles 10 -fileSize 1000
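For clarity, the separate write, read, and clean-up invocations, keeping the same wildcard jar name as above:
$ hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
$ hadoop jar hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
$ hadoop jar hadoop-*test*.jar TestDFSIO -clean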
I have carried out several experiments with corresponding reads and writes, cleaning up after each run.
I am having problems understanding the patterns: in particular, I would expect the number of splits to change when the file size exceeds the split size. However, the number of splits corresponds exactly to the number of files, even when a single file exceeds the split size of 128 MB. I have compiled a PDF to clarify that point.
I would expect the splits to change in rows that are marked green.
What am I getting wrong here?
Thank you very much!
Christian
analyse-dfsio-write-test.pdf
Labels:
- Apache Hadoop
- Apache YARN