Setting Up a Data Science Platform on HDP using Anaconda

[Figure: Data science platform on HDP]

A data science platform built on Anaconda needs to be able to:

  • Launch PySpark jobs on the cluster
  • Synchronize Python libraries from vetted public repositories
  • Isolate environments with specific dependencies, so that production jobs can keep running against an older version of a package while a newer version of the same package is used elsewhere
  • Launch notebooks and PySpark jobs using different kernels, such as Python 2.7, Python 3.x, R, and Scala
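The isolation and PySpark requirements above can be sketched as two command builders: one creates a version-pinned conda environment, the other submits a PySpark job on YARN using that environment's interpreter. This is a minimal sketch under stated assumptions, not the article's method: the environment names, package pins, script name and the `/opt/anaconda3/envs/...` path are hypothetical, and it assumes each environment exists at the same path on every node. The `conda create` flags and the Spark properties `spark.yarn.appMasterEnv.PYSPARK_PYTHON` and `spark.executorEnv.PYSPARK_PYTHON` are standard.

```python
import shlex
from typing import Dict, List


def conda_create_cmd(env_name: str, python_version: str,
                     packages: Dict[str, str]) -> List[str]:
    """Return a `conda create` argument list for an isolated, pinned environment."""
    cmd = ["conda", "create", "--yes", "--name", env_name, f"python={python_version}"]
    # Pin every package to an exact version so the environment is reproducible.
    cmd += [f"{pkg}={ver}" for pkg, ver in sorted(packages.items())]
    return cmd


def spark_submit_cmd(python_path: str, script: str) -> List[str]:
    """Return a `spark-submit` argument list that runs `script` on YARN
    with the interpreter at `python_path` (assumed to exist on every node)."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # Interpreter for the YARN application master (the driver in cluster mode).
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python_path}",
        # Interpreter for each executor.
        "--conf", f"spark.executorEnv.PYSPARK_PYTHON={python_path}",
        script,
    ]


if __name__ == "__main__":
    # Hypothetical example: production stays on an older pandas while dev tries a newer one.
    prod = conda_create_cmd("prod-py36", "3.6", {"pandas": "0.19.2"})
    dev = conda_create_cmd("dev-py36", "3.6", {"pandas": "0.20.3"})
    job = spark_submit_cmd("/opt/anaconda3/envs/prod-py36/bin/python", "etl_job.py")
    for cmd in (prod, dev, job):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Because each job names its interpreter explicitly, the two pandas versions never share a `site-packages` directory.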

Framework of the Data Science Platform

  • Private Repo Server
  • Edge Nodes
    • Dev
    • Test
    • Prod
    • Ansible
    • Git
    • Jenkins
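One way to tie the edge nodes to the private repo server is a `.condarc` on each node that only points at that server. The sketch below renders such a file per tier. It is a minimal sketch assuming a hypothetical per-tier channel layout (`<repo>/dev`, `<repo>/test`, `<repo>/prod`) and a hypothetical host name; `channel_alias`, `channels` and `default_channels` are real `.condarc` keys.

```python
from typing import List

# Hypothetical URL of the private repo server; replace with your own host.
PRIVATE_REPO = "https://conda-repo.internal.example/conda"
TIERS = ("dev", "test", "prod")


def render_condarc(tier: str, repo_url: str = PRIVATE_REPO) -> str:
    """Render a .condarc that restricts an edge node to the private repo server."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier {tier!r}; expected one of {TIERS}")
    channel = f"{repo_url}/{tier}"  # hypothetical per-tier channel layout
    lines: List[str] = [
        # Resolve short channel names against the private server instead of anaconda.org.
        f"channel_alias: {repo_url}",
        # Only this tier's channel is searched.
        "channels:",
        f"  - {channel}",
        # Redefine what 'defaults' expands to, in case it is added back later.
        "default_channels:",
        f"  - {channel}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for tier in TIERS:
        print(f"# ~/.condarc on the {tier} edge node")
        print(render_condarc(tier))
```

With a separate channel per tier, a package only reaches Prod once it has been copied into the Prod channel on the repo server.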

Building blocks of the Data Science Platform

  • Anaconda
  • Ansible
  • Git
  • Jenkins
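The article lists Git, Jenkins and Ansible as building blocks but does not say how they interact. One common pattern, assumed here rather than taken from the source, is to keep each environment's pinned spec as an `environment.yml` in Git, have a Jenkins job report what changed between tiers, and let Ansible run `conda env create --file environment.yml` on the edge nodes. The helpers below are a hypothetical sketch of the first two steps.

```python
from typing import Dict, List, Tuple


def render_environment_yml(name: str, channel: str, pins: Dict[str, str]) -> str:
    """Render a conda environment.yml with every dependency pinned to an exact version."""
    lines = [f"name: {name}", "channels:", f"  - {channel}", "dependencies:"]
    lines += [f"  - {pkg}={ver}" for pkg, ver in sorted(pins.items())]
    return "\n".join(lines) + "\n"


def spec_changes(old: Dict[str, str],
                 new: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List (package, old_version, new_version) for added, removed or changed
    packages; '-' marks a package missing on one side."""
    changes = []
    for pkg in sorted(set(old) | set(new)):
        before, after = old.get(pkg, "-"), new.get(pkg, "-")
        if before != after:
            changes.append((pkg, before, after))
    return changes


if __name__ == "__main__":
    # Hypothetical pins for the Test and Prod environments.
    test_pins = {"python": "3.6", "pandas": "0.20.3", "numpy": "1.13.1"}
    prod_pins = {"python": "3.6", "pandas": "0.19.2", "numpy": "1.13.1"}
    print(render_environment_yml(
        "prod-py36", "https://conda-repo.internal.example/conda/prod", prod_pins))
    # What promoting Test to Prod would change.
    for pkg, before, after in spec_changes(prod_pins, test_pins):
        print(f"{pkg}: {before} -> {after}")
```

Keeping the spec in Git gives each promotion a reviewable diff, and exact pins mean Ansible recreates the same environment on every edge node.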
Comments

How do you implement this, step by step? How do you install it on an existing HDP cluster?