Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

How do I set and change the number of parallel tasks on a Hortonworks Hadoop data node?

New Member

Hi everybody, I want to run the DFSIO benchmark on my Hadoop cluster. Which parameter controls the number of parallel tasks on a data node? That is, what is the parameter's name, and where is it set? I want to change it and analyze the results. Thank you very much.

I have read an article, available at this link:

https://pdfs.semanticscholar.org/e874/ed1a4b120bb4caf5c2ecb440f49df7819d54.pdf

In this article, they use these parameters for DFSIO test:

  • BufferSize: 1000000 bytes
  • Replication factor: 3
  • Number of tasks in parallel by node: 2
  • Block size: 64 MB
What is the "Number of tasks in parallel by node", and where can I set it in Hortonworks Hadoop?
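For context, the article's parameters roughly correspond to TestDFSIO's command-line options. The sketch below is an assumed invocation, not taken from the article: the test jar's path varies across Hadoop/HDP releases, and the file count and size here are illustrative values.

```shell
# Assumed HDP jar location; adjust to the version installed on your cluster.
# TestDFSIO writes one file per map task, so -nrFiles sets the total number
# of map tasks the benchmark spawns.
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO -write \
    -nrFiles 4 \
    -fileSize 1GB \
    -bufferSize 1000000   # matches the article's 1000000-byte buffer
```

Replication factor and block size are normally taken from the cluster's `dfs.replication` and `dfs.blocksize` settings rather than from TestDFSIO flags.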
1 ACCEPTED SOLUTION

    Expert Contributor

Like other MR jobs, you can specify the number of map tasks for the job via JobConf.setNumMapTasks(). However, this is only a hint, and the actual number of spawned map tasks depends on the number of input splits. If you have set NumMapTasks=100 with 50 nodes, then the number of tasks in parallel per node is about 100/50 = 2, assuming the splits are evenly distributed across the nodes.
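In TestDFSIO's case, the benchmark writes one file per map task, so -nrFiles effectively plays the role of the map-count hint; the per-node figure is then just the total map count divided by the node count. A trivial sketch of the arithmetic in the answer above:

```shell
# Per-node parallelism estimate: 100 map tasks spread over 50 nodes.
NUM_MAP_TASKS=100
NUM_NODES=50
echo $((NUM_MAP_TASKS / NUM_NODES))   # prints 2
```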

