Setting Up a Data Science Platform on HDP using Anaconda

A Data Science Platform built on Anaconda needs to be able to:

Launch PySpark jobs on the cluster
Synchronize Python libraries from vetted public repositories
Isolate environments with specific dependencies, so that production jobs can keep running on an older version of a package while other jobs run a newer version of the same package
Launch notebooks and PySpark jobs using different kernels such as Python 2.7, Python 3.x, R, and Scala
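
As an illustration of the isolation requirement above, here is a minimal sketch of how two PySpark jobs might each be pinned to a different conda environment on YARN. It is not taken from the article: the helper name build_spark_submit, the environment paths, and the script names are all hypothetical, and the sketch assumes each environment already exists at the same path on every node. The two configuration keys, spark.yarn.appMasterEnv.PYSPARK_PYTHON and spark.executorEnv.PYSPARK_PYTHON, are standard Spark settings for choosing the Python interpreter.

```python
import shlex


def build_spark_submit(script, env_python, master="yarn", deploy_mode="cluster"):
    """Build a spark-submit argument list that pins the job to one interpreter.

    env_python is the path to the python binary inside a conda environment.
    Assumption: that path is valid on every node that runs the job.
    """
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        # Interpreter for the driver, which runs in the YARN application master
        # when the deploy mode is "cluster".
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={env_python}",
        # Interpreter for the executors.
        "--conf", f"spark.executorEnv.PYSPARK_PYTHON={env_python}",
        script,
    ]


# Hypothetical environments: a production job on an older stack and a newer
# job on an updated stack, side by side on the same cluster.
legacy_cmd = build_spark_submit(
    "prod_job.py", "/opt/anaconda/envs/legacy_py27/bin/python"
)
current_cmd = build_spark_submit(
    "new_job.py", "/opt/anaconda/envs/current_py3/bin/python"
)

for cmd in (legacy_cmd, current_cmd):
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(cmd))
```

The argument list can be passed to subprocess.run once the paths match a real cluster. Keeping the interpreter path as the only per-job difference is what lets the older and newer package versions coexist.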

Framework of the Data Science Platform

Private Repo Server
Edge Nodes
- Dev
- Test
- Prod
- Ansible
- Git
- Jenkins
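
One way the edge nodes might be pointed at the private repo server is through a conda configuration file. The sketch below only renders the text of a .condarc with a channels list. The helper name render_condarc, the repository URL, and the channel names are hypothetical placeholders, not details from the article.

```python
def render_condarc(repo_url, channels):
    """Render .condarc text that lists channels served by a private repo server.

    repo_url and channels are placeholders for whatever the private server exposes.
    """
    base = repo_url.rstrip("/")
    lines = ["channels:"]
    # One list entry per channel, each a full URL on the private server.
    lines += [f"  - {base}/{name}" for name in channels]
    lines.append("")  # trailing newline at end of file
    return "\n".join(lines)


# Hypothetical server address and channel names.
print(render_condarc("https://repo.example.internal/conda", ["main", "vetted"]))
```

A file like this could be distributed to the Dev, Test, and Prod edge nodes by whichever configuration tooling the platform uses.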

Building blocks of the Data Science Platform

Anaconda
Ansible
Git
Jenkins

Comments

How do I implement this? What are the installation steps, and how do I install it on an existing HDP cluster?
