Created 04-13-2018 12:42 PM
Hi everybody, I want to run the DFSIO test on my Hadoop cluster. Which parameter controls the number of parallel tasks on a data node? I mean, what is the name of the parameter and where is it set? I want to change it and analyze the results. Thank you very much.
I have read an article at this link:
https://pdfs.semanticscholar.org/e874/ed1a4b120bb4caf5c2ecb440f49df7819d54.pdf
In this article, they use these parameters for the DFSIO test:
Created 04-13-2018 10:56 PM
Like other MapReduce jobs, you can specify the number of map tasks for the job via JobConf.setNumMapTasks(). However, this is only a hint; the actual number of map tasks spawned depends on the number of input splits. If you set NumMapTasks=100 on a cluster of 50 nodes, the number of tasks running in parallel per node is about 100/50 = 2, assuming the splits are evenly distributed across the nodes.
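To make the 100/50 arithmetic concrete, here is a minimal Java sketch. The class and method names (ParallelTaskEstimate, averagePerNode, maxPerNode) are made up for illustration and are not part of Hadoop. It also assumes that the splits are spread evenly and that the cluster has enough free capacity to run every map task at the same time, which may not hold on a real cluster.

```java
// Hypothetical helper, not part of Hadoop: estimates how many map tasks
// run per node when the tasks are spread evenly across the cluster.
final class ParallelTaskEstimate {

    // Average number of map tasks per node (e.g. 100 tasks / 50 nodes = 2.0).
    static double averagePerNode(int numMapTasks, int numNodes) {
        check(numMapTasks, numNodes);
        return (double) numMapTasks / numNodes;
    }

    // Task count on the busiest node under an even spread (ceiling division).
    static int maxPerNode(int numMapTasks, int numNodes) {
        check(numMapTasks, numNodes);
        return (numMapTasks + numNodes - 1) / numNodes;
    }

    // Rejects inputs for which the estimate is meaningless.
    private static void check(int numMapTasks, int numNodes) {
        if (numMapTasks < 0 || numNodes <= 0) {
            throw new IllegalArgumentException("need numMapTasks >= 0 and numNodes > 0");
        }
    }

    public static void main(String[] args) {
        // The value passed to JobConf.setNumMapTasks() is only a hint,
        // so treat the input here as the number of tasks actually spawned.
        System.out.println(averagePerNode(100, 50)); // prints 2.0
        System.out.println(maxPerNode(120, 50));     // prints 3
    }
}
```

When the task count is not a multiple of the node count, the average is fractional, so maxPerNode is probably the more useful number when sizing a node.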