How to set and change the number of parallel tasks on a Hortonworks Hadoop data node?

New Contributor

Hi everybody, I want to run the DFSIO test on my Hadoop cluster. Which parameter controls the number of parallel tasks on a data node? That is, what is the parameter called and where is it set? I want to change it and analyze the results. Thank you very much.

I have read an article at this link:

https://pdfs.semanticscholar.org/e874/ed1a4b120bb4caf5c2ecb440f49df7819d54.pdf

In this article, they use these parameters for the DFSIO test:

  • BufferSize: 1000000 bytes
  • Replication factor: 3
  • Number of tasks in parallel by node: 2
  • Block size: 64 MB

What is the "Number of tasks in parallel by node", and where can I set it in Hortonworks Hadoop?

1 ACCEPTED SOLUTION

    Expert Contributor

    As with other MR jobs, you can specify the number of map tasks for the job via JobConf.setNumMapTasks(). However, this is only a hint: the actual number of map tasks spawned depends on the number of input splits. If you set NumMapTasks=100 on a 50-node cluster, then about 100/50 = 2 tasks run in parallel per node, assuming the splits are evenly distributed across the nodes.
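
To make the arithmetic concrete, here is a minimal sketch in plain Java (no Hadoop dependency). The class and method names (`ParallelTaskEstimate`, `mapTasksFor`, `tasksPerNode`) are hypothetical and only illustrate the calculation. It assumes the map tasks are spread evenly across the nodes and that the cluster has enough capacity to run them all at the same time, neither of which is guaranteed on a real cluster.

```java
public class ParallelTaskEstimate {

    /** Total map tasks to request so that each node runs about targetPerNode tasks. */
    public static int mapTasksFor(int nodeCount, int targetPerNode) {
        if (nodeCount <= 0 || targetPerNode <= 0) {
            throw new IllegalArgumentException("nodeCount and targetPerNode must be positive");
        }
        // multiplyExact throws on overflow instead of silently wrapping around
        return Math.multiplyExact(nodeCount, targetPerNode);
    }

    /** Average map tasks per node, assuming an even distribution (an assumption). */
    public static double tasksPerNode(int totalMapTasks, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        return (double) totalMapTasks / nodeCount;
    }

    public static void main(String[] args) {
        int nodes = 50;         // cluster size from the example in the answer
        int targetPerNode = 2;  // "Number of tasks in parallel by node" from the article

        int maps = mapTasksFor(nodes, targetPerNode);
        System.out.println("Map tasks to request: " + maps);
        System.out.println("Estimated tasks per node: " + tasksPerNode(maps, nodes));
    }
}
```

The value returned by `mapTasksFor` is the number you would pass as the hint to JobConf.setNumMapTasks(). For TestDFSIO in particular, the map count normally follows the `-nrFiles` option (one map task per file), so that is likely the knob to vary when repeating the test.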
