Created on 12-24-2016 09:11 AM
While investigating a performance issue with topology assignments, I traced the high-level steps Storm uses to assign a topology.
1. For backward compatibility with old topologies, ClientJarTransformerRunner starts and invokes StormShadeTransformer, which transforms the jar and writes the result to /tmp/<some_random_string>.jar.
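In Storm 1.x this step is controlled by the client.jartransformer.class setting (by default org.apache.storm.hack.StormShadeTransformer, which relocates old backtype.storm package names). Conceptually it is just "read the original jar, rewrite entries, write a new jar under /tmp"; below is a minimal illustrative sketch of that shape, not the actual transformer code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Illustrative only: copy the topology jar entry-by-entry into
// /tmp/<random>.jar, the spot where a transformer such as
// StormShadeTransformer would also rewrite entries (e.g. relocating
// old backtype.storm package names). Manifest handling is omitted.
public class JarTransformSketch {
    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);
        Path out = Paths.get("/tmp", UUID.randomUUID().toString().replace("-", "") + ".jar");
        try (JarInputStream jin = new JarInputStream(Files.newInputStream(in));
             JarOutputStream jout = new JarOutputStream(Files.newOutputStream(out))) {
            JarEntry entry;
            byte[] buf = new byte[8192];
            while ((entry = jin.getNextJarEntry()) != null) {
                // A real transformer would rename/rewrite the entry here.
                jout.putNextEntry(new JarEntry(entry.getName()));
                for (int n; (n = jin.read(buf)) != -1; ) {
                    jout.write(buf, 0, n);
                }
                jout.closeEntry();
            }
        }
        System.out.println("Wrote " + out);
    }
}
```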
2. StormSubmitter starts uploading the topology jar to the Nimbus inbox using NimbusClient.
o.a.s.StormSubmitter - Uploading topology jar /tmp/27ed633ac9aa11e6a850fa163e19dd06.jar to assigned location: /hadoop/storm/nimbus/inbox/stormjar-b1eca4ae-d021-4e93-aaf1-986c9a5772ad.jar
Start uploading file '/tmp/27ed633ac9aa11e6a850fa163e19dd06.jar' to '/hadoop/storm/nimbus/inbox/stormjar-b1eca4ae-d021-4e93-aaf1-986c9a5772ad.jar'
o.a.s.StormSubmitter - Successfully uploaded topology jar to assigned location: /hadoop/storm/nimbus/inbox/stormjar-b1eca4ae-d021-4e93-aaf1-986c9a5772ad.jar
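Under the hood this upload is a chunked loop over the Nimbus Thrift API (beginFileUpload / uploadChunk / finishFileUpload). A rough sketch of what StormSubmitter does, assuming storm-core on the classpath and a reachable Nimbus:

```java
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.storm.generated.Nimbus;
import org.apache.storm.utils.NimbusClient;
import org.apache.storm.utils.Utils;

public class UploadJarSketch {
    public static void main(String[] args) throws Exception {
        Map conf = Utils.readStormConfig();            // merges defaults.yaml and storm.yaml
        Nimbus.Client client = NimbusClient.getConfiguredClient(conf).getClient();

        // Nimbus picks the inbox path, e.g. .../inbox/stormjar-<uuid>.jar
        String location = client.beginFileUpload();
        try (FileInputStream in = new FileInputStream(args[0])) {
            byte[] buf = new byte[1024 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                client.uploadChunk(location, ByteBuffer.wrap(buf, 0, n));
            }
        }
        client.finishFileUpload(location);
        System.out.println("Uploaded to " + location);
    }
}
```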
3. NimbusClient submits the topology to Nimbus via a Thrift call.
o.a.s.StormSubmitter - Submitting topology wordcount in distributed mode with conf {"storm.zookeeper.topology.auth.scheme":"digest","storm.zookeeper.topology.auth.payload":"-5184467572710101881:-6542959882697852797","topology.workers":3,"topology.debug":true}
o.a.s.StormSubmitter - Finished submitting topology: wordcount
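The client-side code that produces these log lines is a normal StormSubmitter.submitTopology call. A sketch matching the conf seen above (topology.workers=3, topology.debug=true); the spout and bolt classes are hypothetical placeholders in the style of the storm-starter word count:

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountSubmit {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // RandomSentenceSpout/SplitSentenceBolt/WordCountBolt are hypothetical
        // placeholders; any spout/bolt pair illustrates the call just as well.
        builder.setSpout("sentences", new RandomSentenceSpout(), 5);
        builder.setBolt("split", new SplitSentenceBolt(), 8).shuffleGrouping("sentences");
        builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setNumWorkers(3);   // topology.workers=3, as in the submission log
        conf.setDebug(true);     // topology.debug=true

        StormSubmitter.submitTopology("wordcount", conf, builder.createTopology());
    }
}
```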
4. Nimbus receives the topology submission over Thrift.
o.a.s.d.nimbus [INFO] Received topology submission for wordcount with conf {"topology.max.task.parallelism" nil, "topology.submitter.principal" "", "topology.acker.executors" nil, "topology.eventlogger.executors" 0, "topology.workers" 3, "topology.debug" true, "storm.zookeeper.superACL" nil, "topology.users" (), "topology.submitter.user" "storm", "topology.kryo.register" nil, "topology.kryo.decorators" (), "storm.id" "wordcount-1-1482564367", "topology.name" "wordcount"}
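If you want to confirm the submission landed, the same Thrift interface can be queried from any client; a small sketch using storm-core's generated client (getClusterInfo is part of the Nimbus Thrift API):

```java
import java.util.Map;

import org.apache.storm.generated.ClusterSummary;
import org.apache.storm.generated.Nimbus;
import org.apache.storm.generated.TopologySummary;
import org.apache.storm.utils.NimbusClient;
import org.apache.storm.utils.Utils;

public class ListTopologies {
    public static void main(String[] args) throws Exception {
        Map conf = Utils.readStormConfig();
        Nimbus.Client client = NimbusClient.getConfiguredClient(conf).getClient();
        ClusterSummary summary = client.getClusterInfo();
        for (TopologySummary t : summary.get_topologies()) {
            // Expect an entry like: wordcount (id wordcount-1-1482564367) ACTIVE
            System.out.println(t.get_name() + " (id " + t.get_id() + ") " + t.get_status());
        }
    }
}
```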
5. Nimbus creates the assignment in ZooKeeper, on which the supervisors have set a watch.
2016-12-24 07:26:08.696 o.a.s.d.nimbus [INFO] Setting new assignment for topology id wordcount-1-1482564367: #org.apache.storm.daemon.common.Assignment{:master-code-dir "/hadoop/storm", :node->host {"3cb18e51-aa66-424c-8165-e9101ab134bb" "rkk3.hdp.local"}, :executor->node+port {[8 8] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [12 12] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [2 2] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [7 7] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [22 22] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [3 3] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [24 24] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [1 1] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [18 18] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [6 6] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [28 28] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [20 20] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [9 9] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [23 23] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [11 11] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [16 16] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [13 13] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [19 19] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [21 21] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [5 5] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [27 27] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [29 29] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [26 26] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [10 10] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [14 14] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [4 4] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701], [15 15] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [25 25] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700], [17 17] ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700]}, :executor->start-time-secs {[8 8] 1482564368, [12 12] 1482564368, [2 2] 1482564368, [7 7] 1482564368, [22 22] 1482564368, [3 3] 1482564368, [24 24] 1482564368, [1 1] 1482564368, [18 18] 1482564368, [6 6] 1482564368, [28 28] 1482564368, [20 20] 1482564368, [9 9] 1482564368, [23 23] 1482564368, [11 11] 1482564368, [16 16] 1482564368, [13 13] 1482564368, [19 19] 1482564368, [21 21] 1482564368, [5 5] 1482564368, [27 27] 1482564368, [29 29] 1482564368, [26 26] 1482564368, [10 10] 1482564368, [14 14] 1482564368, [4 4] 1482564368, [15 15] 1482564368, [25 25] 1482564368, [17 17] 1482564368}, :worker->resources {["3cb18e51-aa66-424c-8165-e9101ab134bb" 6700] [0.0 0.0 0.0], ["3cb18e51-aa66-424c-8165-e9101ab134bb" 6701] [0.0 0.0 0.0]}}
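The assignment lives in ZooKeeper under the storm root (default /storm), one znode per topology id under /storm/assignments, serialized as the Assignment structure shown in the log. A quick sketch to list those znodes with a plain ZooKeeper client; the connect string and paths are assumptions based on defaults:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ListAssignments {
    public static void main(String[] args) throws Exception {
        // "/storm" is the default storm.zookeeper.root; adjust to your cluster.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 15000, new Watcher() {
            @Override public void process(WatchedEvent event) { /* connection events only */ }
        });
        for (String id : zk.getChildren("/storm/assignments", false)) {
            System.out.println(id);   // e.g. wordcount-1-1482564367
        }
        zk.close();
    }
}
```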
6. Supervisors get the watch event and read the new assignments.
2016-12-24 07:26:09.577 o.a.s.d.supervisor [DEBUG] All assignment: {6701 {:storm-id "wordcount-1-1482564367", :executors ([8 8] [12 12] [2 2] [22 22] [24 24] [18 18] [6 6] [28 28] [20 20] [16 16] [26 26] [10 10] [14 14] [4 4]), :resources [0.0 0.0 0.0]}, 6700 {:storm-id "wordcount-1-1482564367", :executors ([7 7] [3 3] [1 1] [9 9] [23 23] [11 11] [13 13] [19 19] [21 21] [5 5] [27 27] [29 29] [15 15] [25 25] [17 17]), :resources [0.0 0.0 0.0]}}
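The supervisor side is the standard ZooKeeper watch-and-reread pattern: watches are one-shot, so after each event the supervisor re-reads the assignments and re-registers the watch. A minimal sketch of that pattern (the real supervisor goes through Storm's cluster-state layer rather than raw ZooKeeper calls):

```java
import org.apache.zookeeper.ZooKeeper;

public class AssignmentWatchSketch {
    private final ZooKeeper zk;

    public AssignmentWatchSketch(ZooKeeper zk) { this.zk = zk; }

    public void watchAssignments() throws Exception {
        // ZooKeeper watches fire at most once, so the handler re-reads the
        // children and re-arms itself after every event.
        zk.getChildren("/storm/assignments", event -> {
            System.out.println("Assignment change: " + event.getType());
            try {
                watchAssignments();
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
    }
}
```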
7. Supervisors start downloading the topology jar.
After the download completes, they start launching workers.
2016-12-24 07:26:12.728 o.a.s.d.supervisor [INFO] Launching worker with assignment {:storm-id "wordcount-1-1482564367", :executors [[7 7] [3 3] [1 1] [9 9] [23 23] [11 11] [13 13] [19 19] [21 21] [5 5] [27 27] [29 29] [15 15] [25 25] [17 17]], :resources #object[org.apache.storm.generated.WorkerResources 0x28e35c1e "WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0)"]} for this supervisor 3cb18e51-aa66-424c-8165-e9101ab134bb on port 6700 with id ac690504-6b52-4c88-a5bd-50fa78992368
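As a sanity check on the assignment above: port 6700 got the 15 odd-numbered executors and 6701 the 14 even-numbered ones, which is exactly the split that round-robining 29 executors over the two slots produces. A tiny self-contained sketch reproducing it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RoundRobinSketch {
    public static void main(String[] args) {
        List<Integer> executors = new ArrayList<>();
        for (int i = 1; i <= 29; i++) executors.add(i);   // 29 executors, as in the assignment log
        int[] ports = {6700, 6701};                       // the two worker slots actually used

        Map<Integer, List<Integer>> slots = new TreeMap<>();
        for (int i = 0; i < executors.size(); i++) {
            slots.computeIfAbsent(ports[i % ports.length], p -> new ArrayList<>())
                 .add(executors.get(i));
        }
        // Prints 6700 -> the 15 odd executor ids and 6701 -> the 14 even ids,
        // the same split seen in :executor->node+port above.
        slots.forEach((port, execs) -> System.out.println(port + " -> " + execs));
    }
}
```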