Created 03-03-2018 12:18 AM
I know it sounds a bit crazy, but I am a data scientist 🙂 and my preferred language is still Python. I use it with Spark, but I would like to be able to implement some smart models within a Storm topology. We did not adopt Spark Streaming, and Storm still works best with Kafka for us. Any pointers on how to start?
Created 03-03-2018 12:26 AM
I guess we are both data scientists.
Storm ships with multilang adapters for Python and Ruby.
The right place to start is src/storm.thrift. Since Storm topologies are just Thrift structures and Nimbus is a Thrift daemon, you can create and submit topologies in any language. The protocol itself is specified on the Multilang protocol page. The Thrift structure lets you define multilang components explicitly as a program plus a script, e.g., python and the file that implements your bolt. Multilang uses JSON messages over stdin/stdout to communicate with the subprocess.
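To make the stdin/stdout exchange concrete, here is a minimal Python sketch of the message framing and the initial handshake. It assumes the Multilang protocol as documented: each message is JSON followed by a line containing only "end", and on startup the subprocess receives a setup message with conf, context and pidDir, writes an empty file named after its PID into pidDir, and replies with {"pid": ...}. The helper names (read_message, send_message, handshake) are hypothetical; in practice the bundled storm.py does this for you.

```python
import json
import os
from typing import IO, Any


def read_message(stream: IO[str]) -> Any:
    """Read one multilang message: JSON text lines terminated by a line that is exactly 'end'."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("parent closed the input stream")
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))


def send_message(obj: Any, stream: IO[str]) -> None:
    """Write one JSON message plus the 'end' terminator, flushing so Storm sees it immediately."""
    stream.write(json.dumps(obj) + "\n")
    stream.write("end\n")
    stream.flush()


def handshake(stdin: IO[str], stdout: IO[str]) -> dict:
    """Handle the setup message: create an empty <pidDir>/<pid> file, then report our PID."""
    setup = read_message(stdin)
    pid = os.getpid()
    open(os.path.join(setup["pidDir"], str(pid)), "w").close()
    send_message({"pid": pid}, stdout)
    return setup  # carries the topology conf and context for later use
```

Flushing after every message matters: Storm blocks waiting for the "end" line, so buffered output would stall the topology.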
The Python adapter supports emitting, anchoring, acking, and logging. The "storm shell" command makes it easy to build a jar and upload it to Nimbus.
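Emitting, anchoring, acking, and logging each map onto a small JSON command. The sketch below is a hypothetical bolt step (handle_tuple, which splits a sentence into words as a stand-in for a real model) written against the raw protocol instead of storm.py, so the messages are visible. The command shapes follow the Multilang protocol page; the heartbeat handling and the need_task_ids flag are assumptions worth checking against the docs for your Storm version.

```python
import json
from typing import IO, Any


def send_message(obj: Any, stream: IO[str]) -> None:
    """Write one JSON message plus the 'end' terminator and flush."""
    stream.write(json.dumps(obj) + "\n")
    stream.write("end\n")
    stream.flush()


def handle_tuple(tup: dict, stdout: IO[str]) -> None:
    """Process one incoming tuple message from Storm."""
    # Heartbeat tuples (task -1 on the __heartbeat stream) only need a 'sync' reply.
    if tup.get("task") == -1 and tup.get("stream") == "__heartbeat":
        send_message({"command": "sync"}, stdout)
        return

    sentence = tup["tuple"][0]
    for word in sentence.split():  # placeholder for real model scoring
        send_message({
            "command": "emit",
            # Anchoring to the input id links the new tuple into the tuple tree,
            # so a downstream failure makes the spout replay the original.
            "anchors": [tup["id"]],
            "tuple": [word],
            # Assumed flag: asks Storm not to send back the target task ids.
            "need_task_ids": False,
        }, stdout)

    send_message({"command": "log", "msg": f"processed tuple {tup['id']}"}, stdout)
    # Ack only after all anchored emits, so the tuple tree is complete.
    send_message({"command": "ack", "id": tup["id"]}, stdout)
```

With storm.py you would normally subclass its BasicBolt, which anchors emits to the current tuple and acks it after process returns; the raw version above just spells out what goes over the pipe.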
Here is a good reference:
https://docs.microsoft.com/en-us/azure/hdinsight/storm/apache-storm-develop-python-topology