Importance of Pydoop in Big Data Analytics and Data Science


I am new to data science and big data frameworks. Let's say I have an input dataset in CSV format. From what I found on Google and other resources about the daily work of a data analyst or data scientist, once the user gets a dataset, they first manipulate it with the Python pandas library, which includes data cleaning and similar preparation steps. The user then visualizes the data using matplotlib and other tools, and can write machine learning algorithms to make predictions based on some criteria. All of this can be summarized as data analysis and prediction.

Separately, I came across Pydoop (a Python interface to Hadoop), which is used for operations such as storage and processing. I am a bit confused: where exactly does Pydoop fit into the data analysis workflow described above? Please guide me.
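To make the workflow concrete, here is a minimal sketch of what I mean. The column names, the sample values, and the helper functions `open_dataset`, `clean`, and `fit_line` are all made up for illustration, and a simple line fit with NumPy stands in for the machine learning step. My guess is that Pydoop would only change the first step, where the file is opened (for example via `pydoop.hdfs.open` if the CSV were stored in HDFS), but that is an assumption on my part.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV dataset.
SAMPLE_CSV = """hours,score
1,52
2,55
,60
4,61
5,
6,67
"""


def open_dataset():
    """Return a file-like object for the CSV.

    Here it is just an in-memory string. If the file lived in HDFS, this is
    the step where Pydoop might come in, e.g. pydoop.hdfs.open(path, "rt")
    (assumed usage, not tested here).
    """
    return io.StringIO(SAMPLE_CSV)


def clean(df):
    """Data cleaning step: drop rows that have any missing value."""
    return df.dropna().reset_index(drop=True)


def fit_line(df):
    """Prediction step: least-squares line score = slope * hours + intercept."""
    slope, intercept = np.polyfit(df["hours"], df["score"], deg=1)
    return slope, intercept


# 1. Load the dataset with pandas.
with open_dataset() as f:
    raw = pd.read_csv(f)

# 2. Clean it.
tidy = clean(raw)

# 3. Fit a simple model and predict the score for 7 hours.
slope, intercept = fit_line(tidy)
predicted = slope * 7 + intercept
print(f"rows before: {len(raw)}, rows after cleaning: {len(tidy)}")
print(f"predicted score for 7 hours: {predicted:.1f}")
```

The visualization step (for example a matplotlib scatter plot of `tidy`) is left out to keep the sketch short.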


Hello @nicole wells

