Importance of Pydoop in BigData Analytics and Data Science

I am new to data science and big data frameworks. Let's say I have a dataset as a CSV file. From what I found on Google and other resources about the daily work of a data analyst or data scientist, the workflow looks like this: once the dataset is available, it is first manipulated with the Python pandas library, which covers data cleaning and similar tasks. The data is then visualized with matplotlib and other tools. Finally, machine learning algorithms can be written to make predictions for certain criteria. The whole workflow can be summarized as data analysis and prediction.

On the other hand, I found Pydoop (a Python interface to Hadoop) for operations such as storage and processing. I am a bit confused: where exactly does Pydoop fit into the data analysis workflow described above? Please guide me.
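A minimal sketch of where the two tools meet, assuming pandas is installed and, for the `hdfs://` branch only, that Pydoop and a configured Hadoop client are available. Pydoop sits on the storage/processing side (its `pydoop.hdfs` module reads and writes files in HDFS), while pandas does the in-memory cleaning and analysis. The `load_csv` and `clean` helpers and the sample rows are hypothetical, for illustration only.

```python
import io

import pandas as pd


def load_csv(path):
    """Read a CSV into a DataFrame, from HDFS (via Pydoop) or from a local path/buffer.

    Hypothetical helper for illustration.
    """
    if isinstance(path, str) and path.startswith("hdfs://"):
        # Storage layer: Pydoop's HDFS API streams the file out of the cluster.
        # Assumption: pydoop is installed and a Hadoop client is configured.
        import pydoop.hdfs as hdfs

        with hdfs.open(path) as f:
            return pd.read_csv(f)
    # Local file or in-memory buffer: plain pandas, no Hadoop involved.
    return pd.read_csv(path)


def clean(df):
    """Analysis layer: typical pandas cleaning steps (hypothetical helper)."""
    df = df.drop_duplicates()  # remove exact duplicate rows
    df = df.dropna()           # drop rows with missing values
    return df.reset_index(drop=True)


# Hypothetical sample data standing in for the user's CSV.
SAMPLE = """id,age,income
1,34,52000
2,,61000
3,29,48000
3,29,48000
"""

if __name__ == "__main__":
    df = clean(load_csv(io.StringIO(SAMPLE)))
    print(df)
```

In this sketch Pydoop only changes where the data comes from; once the CSV is in a DataFrame, the cleaning, visualization, and modelling steps are the same. When the data is too large for one machine's memory, the processing itself would typically move into Hadoop (for example MapReduce jobs, which Pydoop also lets you write in Python) instead of pandas.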

Re: Importance of Pydoop in BigData Analytics and Data Science

Hello @nicole wells

You can find the question you copied, along with my answer, here:

https://stackoverflow.com/a/51687883/2308683