
Setting Up a Data Science Platform on HDP using Anaconda


A Data Science Platform built on Anaconda needs to be able to:

  • Launch PySpark jobs on the cluster
  • Synchronize Python libraries from vetted public repositories
  • Isolate environments with specific dependencies, so that production jobs can keep using an older version of a package while other jobs run a newer version of the same package
  • Launch notebooks and PySpark jobs using different kernels such as Python_2.7, Python_3.x, R, and Scala
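To illustrate the first and third requirements together, here is a minimal sketch of how a PySpark job might be pointed at one specific conda environment on YARN. It only builds the `spark-submit` argument list and does not run it. The function name `build_spark_submit`, the install root `/opt/anaconda`, the environment names `prod_legacy` and `prod_current`, and the script name `etl_job.py` are all hypothetical; the sketch also assumes that the environment exists at the same path on every node of the cluster.

```python
import posixpath
import shlex


def build_spark_submit(script, env_name, conda_root="/opt/anaconda"):
    """Return a spark-submit argument list that runs `script` with the
    Python interpreter of the given conda environment (YARN cluster mode)."""
    # Assumption: the environment is installed at this path on every node.
    python_bin = posixpath.join(conda_root, "envs", env_name, "bin", "python")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # Interpreter for the driver, which runs in the YARN application master.
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_PYTHON=" + python_bin,
        # Interpreter for the executors.
        "--conf", "spark.executorEnv.PYSPARK_PYTHON=" + python_bin,
        script,
    ]


# The same job, pinned to two different (hypothetical) environments.
legacy_cmd = build_spark_submit("etl_job.py", "prod_legacy")
current_cmd = build_spark_submit("etl_job.py", "prod_current")

for cmd in (legacy_cmd, current_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Because the interpreter path is the only thing that differs between the two commands, the older and newer package versions can run side by side without touching each other's dependencies. In client deploy mode the driver runs on the edge node, so the interpreter would likely need to be set through the `PYSPARK_PYTHON` environment variable there instead.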

Framework of the Data Science Platform

  • Private Repo Server
  • Edge Nodes
    • Dev
    • Test
    • Prod
    • Ansible
    • Git
    • Jenkins
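As a rough sketch of how the private repo server and the edge nodes could fit together, the snippet below builds `conda create` commands that resolve packages only from a private channel. The channel URL, the environment names, and the pinned package versions are illustrative assumptions, not values from this article; the helper `build_conda_create` is likewise hypothetical.

```python
# Hypothetical URL of a channel hosted on the private repo server.
PRIVATE_CHANNEL = "http://repo.example.internal/conda/main"

# Illustrative environments: same packages, different pinned versions.
ENVIRONMENTS = {
    "prod_legacy": ["python=2.7", "pandas=0.23"],
    "prod_current": ["python=3.6", "pandas=0.25"],
}


def build_conda_create(env_name, packages, channel_url):
    """Return a `conda create` argument list for one isolated environment."""
    return [
        "conda", "create", "--yes",
        "--name", env_name,
        # Ignore default and .condarc channels so that only the
        # vetted private channel is searched.
        "--override-channels",
        "--channel", channel_url,
    ] + list(packages)


commands = {
    name: build_conda_create(name, packages, PRIVATE_CHANNEL)
    for name, packages in ENVIRONMENTS.items()
}

for name in sorted(commands):
    print(" ".join(commands[name]))
```

The commands are only printed here. In a setup like the one outlined above, a tool such as Ansible would presumably run them on each Dev, Test, and Prod edge node so that all tiers end up with identical environments.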

Building blocks of the Data Science Platform

  • Anaconda
  • Ansible
  • Git
  • Jenkins

How is this implemented? What are the installation steps, and how can it be installed on an existing HDP cluster?

Version history: revision 2 of 2, last updated 08-17-2019 12:14 PM