I have been asked whether I can give a talk on Hadoop architectures at a conference. My problem now is deciding what to include and what to leave out. I can cover basic Hadoop without too much trouble, but what about the extra non-Hadoop components that are typically counted as part of the ecosystem? Which of them do you think are vital to mention?
I am thinking:
Some kind of text database for a searchable front end (e.g. Solr or Elasticsearch)
Some kind of notebook (e.g. IPython, Zeppelin)
Spark running on YARN
Some kind of ingestion (Sqoop, Spark Streaming, Flume)
Streaming options (Spark Streaming, Storm, Flink)
If anyone is willing to chat with me about their architecture, and perhaps send me a "back of an envelope" architecture diagram, that would be absolutely great. Comment here or message me.