Hadoop, Hive, Spark, what's going onnnn?

I started a new job and will soon be working with Hive and PySpark to pull data from the company's big data lake. I have lots of experience with Python and SQL but not much with big data systems. Can anyone recommend good books to help a data scientist understand how to work with Hadoop systems? It would be extra helpful if they go into detail on Hive and Spark.



If you have access to O'Reilly, the Hive book below is useful. It was published in 2018, so it is fairly new compared with the others:

For Spark, the book below, published this year, is good:

Hope that helps.