
What should go into a Hadoop Architectures talk?


I have been asked whether I can give a Hadoop Architectures talk at a conference. My problem now is deciding what to include and what to leave out. I can cover core Hadoop without too much trouble, but what about the non-Hadoop projects that are typically considered part of the ecosystem? What do you think is vital to mention?

I am thinking:

Some kind of full-text search engine for a searchable front end (e.g. Solr or Elasticsearch)

Some kind of notebook (e.g. IPython, Zeppelin)

Spark running on YARN

Some kind of ingestion (Sqoop, Spark Streaming, Flume)

Streaming options (Spark Streaming, Storm, Flink)

Workflow (Oozie)

If anyone is willing to chat to me about their architecture, and perhaps send me a "back of an envelope" architecture diagram, that would be absolutely great. Comment here or message me.
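For a quick "back of an envelope" diagram, one option is to write the components from the list above as data and generate a Graphviz DOT file from it. The sketch below is only an illustration: the grouping into layers, the left-to-right data flow, and the choice of Oozie as the orchestrator drawn with dashed edges are all assumptions, and `LAYERS`, `FLOW` and `to_dot` are hypothetical names rather than part of any Hadoop tool.

```python
# Hypothetical grouping of the components listed above into layers.
# The layers and their order are illustrative assumptions, not a reference design.
LAYERS = {
    "Ingestion": ["Sqoop", "Flume"],
    "Processing": ["Spark on YARN", "Spark Streaming", "Storm", "Flink"],
    "Search": ["Solr", "Elasticsearch"],
    "Notebooks": ["IPython", "Zeppelin"],
}

# Assumed left-to-right flow of data between the layers.
FLOW = ["Ingestion", "Processing", "Search", "Notebooks"]


def to_dot(layers: dict[str, list[str]], flow: list[str],
           orchestrator: str = "Oozie") -> str:
    """Return a Graphviz DOT description with one box per layer."""
    lines = ["digraph hadoop_architecture {", "  rankdir=LR;", "  node [shape=box];"]
    # One node per layer; "\n" inside a DOT label is a line break.
    for layer in flow:
        tools = ", ".join(layers[layer])
        lines.append(f'  "{layer}" [label="{layer}\\n{tools}"];')
    # Solid edges follow the assumed data flow between consecutive layers.
    for src, dst in zip(flow, flow[1:]):
        lines.append(f'  "{src}" -> "{dst}";')
    # Dashed edges show the workflow scheduler coordinating every layer.
    lines.append(f'  "{orchestrator}" [shape=ellipse];')
    for layer in flow:
        lines.append(f'  "{orchestrator}" -> "{layer}" [style=dashed];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(LAYERS, FLOW))
```

Saving the output as `arch.dot` and running `dot -Tpng arch.dot -o arch.png` with Graphviz installed produces an image. Keeping the component list as data means the diagram can be regenerated whenever the talk's scope changes.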


Re: What should go into a Hadoop Architectures talk?


Hi Alex,

You could split the talk into three parts:

Data engineering: HDP, data at rest (YARN, Sqoop, Solr or Elasticsearch, Spark, Tez, ...)

Real time: HDF, streaming (Storm, Flink, Spark Streaming, Flume, NiFi, ...)

Data science: Spark MLlib, Mahout, notebooks (e.g. IPython, Zeppelin), ...
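One way to turn this split into a talk agenda is to write the parts down as data and print an outline. This is only a sketch: `PARTS` and `outline` are hypothetical names, and the 45-minute slot split evenly across the parts is an assumed default that should be adjusted to the actual conference slot.

```python
# Hypothetical encoding of the three-part split suggested above.
PARTS = [
    ("Data engineering (HDP, data at rest)",
     ["YARN", "Sqoop", "Solr or Elasticsearch", "Spark", "Tez"]),
    ("Real time (HDF, streaming)",
     ["Storm", "Flink", "Spark Streaming", "Flume", "NiFi"]),
    ("Data science",
     ["Spark MLlib", "Mahout", "Notebooks (IPython, Zeppelin)"]),
]


def outline(parts: list[tuple[str, list[str]]], total_minutes: int = 45) -> str:
    """Return a plain-text agenda, splitting the slot evenly across parts.

    total_minutes is an assumed slot length, not a conference requirement.
    """
    per_part = total_minutes // len(parts)
    lines = []
    for number, (title, tools) in enumerate(parts, start=1):
        lines.append(f"{number}. {title} (~{per_part} min)")
        lines.extend(f"   - {tool}" for tool in tools)
    return "\n".join(lines)


if __name__ == "__main__":
    print(outline(PARTS))
```

An even split is the simplest choice; weighting each part by how much of the audience works in that area would be an easy change.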