
How do I set and change the number of parallel tasks on a Hortonworks Hadoop data node?

New Contributor

Hi everybody, I want to run the DFSIO benchmark on my Hadoop cluster. Which parameter controls the number of parallel tasks on a data node? That is, what is the name of the parameter and where is it set? I want to change it and analyze the results. Thank you very much.

I have read an article at this link:

https://pdfs.semanticscholar.org/e874/ed1a4b120bb4caf5c2ecb440f49df7819d54.pdf

In this article, they use these parameters for DFSIO test:

  • BufferSize: 1000000 bytes
  • Replication factor: 3
  • Number of tasks in parallel by node: 2
  • Block size: 64 MB
What is the "Number of tasks in parallel by node", and where can I set it in Hortonworks Hadoop?

1 ACCEPTED SOLUTION

    Expert Contributor

    Like other MR jobs, you can specify the number of map tasks for the job via JobConf.setNumMapTasks(). However, this is only a hint: the actual number of spawned map tasks depends on the number of input splits. If you set NumMapTasks=100 on a 50-node cluster, then the number of tasks in parallel per node is about 100/50 = 2, assuming the splits are evenly distributed across the nodes.
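    For DFSIO specifically, as far as I know TestDFSIO launches one map task per file, so the -nrFiles option controls the total number of maps; dividing that by the node count gives the estimated per-node parallelism. A minimal sketch of the arithmetic above (hypothetical numbers, not tied to any real cluster):

```java
public class TaskEstimate {
    // Estimated parallel map tasks per node, assuming the input splits are
    // spread evenly across the cluster. This mirrors the hint semantics of
    // JobConf.setNumMapTasks(): the real count depends on how many input
    // splits the job actually produces.
    static long tasksPerNode(long totalMapTasks, long numNodes) {
        return totalMapTasks / numNodes;
    }

    public static void main(String[] args) {
        // Example from the answer: 100 map tasks across 50 nodes.
        System.out.println(tasksPerNode(100, 50)); // prints 2
    }
}
```

    So with the paper's setting of 2 tasks in parallel per node, you would pick the total number of map tasks (or, for TestDFSIO, the -nrFiles value) to be about twice your node count.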

