Support Questions
Find answers, ask questions, and share your expertise

how to set and change the number of parallel tasks in a hadoop Hortonworks data node?

Solved


New Contributor

Hi everybody, I want to run a DFSIO test on my Hadoop cluster. Which parameter controls the number of parallel tasks on a data node? What is the name of the parameter, and where is it set? I want to change it and analyze the results. Thank you very much.

I have read an article at this link:

https://pdfs.semanticscholar.org/e874/ed1a4b120bb4caf5c2ecb440f49df7819d54.pdf

In this article, they use these parameters for DFSIO test:

  • BufferSize: 1000000 bytes
  • Replication factor: 3
  • Number of tasks in parallel by node: 2
  • Block size: 64 MB
What is the "Number of tasks in parallel by node", and where can I set it in Hortonworks Hadoop?

1 ACCEPTED SOLUTION
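For reference, the paper's settings map onto a TestDFSIO run roughly as sketched below. This is an assumption-laden sketch, not a verified command: the jar path shown is the usual HDP location but varies by version, and older Hadoop releases spell the `-size` option `-fileSize`. `dfs.replication` and `dfs.blocksize` are standard HDFS properties passed as generic `-D` options.

```shell
# Sketch: a TestDFSIO write run using the paper's parameters.
# Jar path and exact flag spellings are assumptions; check your HDP version.
JAR=/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar
BLOCK_BYTES=$((64 * 1024 * 1024))   # 64 MB block size -> 67108864 bytes
CMD="hadoop jar $JAR TestDFSIO \
  -D dfs.replication=3 \
  -D dfs.blocksize=$BLOCK_BYTES \
  -write -nrFiles 100 -size 1GB -bufferSize 1000000"
echo "$CMD"   # printed for inspection; run it on a cluster node
```

Note that TestDFSIO launches one map task per file, so `-nrFiles` is what determines the total number of map tasks the benchmark spawns.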


    Re: how to set and change the number of parallel tasks in a hadoop Hortonworks data node?

    Rising Star

    Like other MR jobs, you can specify the number of map tasks for the job via JobConf.setNumMapTasks(). However, this is only a hint and the actual number of spawned map tasks depends on the number of input splits. If you have set NumMapTasks=100 with 50 nodes, then the number of tasks in parallel per node is about 100/50 = 2 assuming the splits are evenly distributed across the nodes.
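The arithmetic in the answer can be sketched directly: with the map-task count and node count as inputs, the per-node parallelism is just their ratio (assuming splits land evenly across nodes).

```shell
# Sketch of the answer's arithmetic: NumMapTasks=100 on a 50-node cluster
# gives about 100/50 = 2 concurrent map tasks per node, assuming the
# input splits are evenly distributed.
TOTAL_MAPS=100
NODES=50
TASKS_PER_NODE=$((TOTAL_MAPS / NODES))
echo "$TASKS_PER_NODE"   # 2
```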

