Created on 06-28-2017 08:58 PM - edited 09-16-2022 01:40 AM
Setting Up a Data Science Platform on HDP using Anaconda
A Data Science Platform built on Anaconda needs to be able to:
- Launch PySpark jobs on the cluster
- Synchronize Python libraries from vetted public repositories
- Isolate environments with specific dependencies, so that production jobs can run on an older version of a package while a newer version of the same package is used at the same time
- Launch notebooks and PySpark jobs using different kernels, such as Python 2.7, Python 3.x, R and Scala
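As a minimal sketch of the first and third requirements, the Python helper below builds a `spark-submit` command that pins a PySpark job on YARN to the interpreter of one conda environment. This lets a production job and a development job use different package versions side by side. It assumes each environment is installed at the same hypothetical path `/opt/anaconda3/envs/<name>` on every cluster node, and that the cluster runs Spark 2.1 or later, where `spark.pyspark.python` is available. The environment names are illustrative.

```python
"""Sketch: build a spark-submit command that pins a PySpark job to one conda env."""
import shlex
from typing import List

# Hypothetical layout: every conda environment is installed under this prefix
# on all edge nodes and YARN NodeManagers.
CONDA_ENVS_ROOT = "/opt/anaconda3/envs"


def spark_submit_cmd(script: str, env_name: str, deploy_mode: str = "client") -> List[str]:
    """Return a spark-submit argument list that runs `script` on YARN with the
    Python interpreter of the conda environment `env_name`."""
    python = f"{CONDA_ENVS_ROOT}/{env_name}/bin/python"
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        # Python binary used by PySpark in the driver and the executors.
        "--conf", f"spark.pyspark.python={python}",
        # In cluster mode the driver runs inside the YARN application master,
        # so its environment is set explicitly as well.
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python}",
        script,
    ]


if __name__ == "__main__":
    # A production job pinned to an older environment while a dev job uses a newer one.
    print(shlex.join(spark_submit_cmd("etl_job.py", "prod_py27")))
    print(shlex.join(spark_submit_cmd("etl_job.py", "dev_py36", "cluster")))
```

Shipping a packed environment with `spark-submit --archives` instead of preinstalling it on every node is an alternative; the sketch keeps to the preinstalled layout for simplicity.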
Framework of the Data Science Platform
- Private Repo Server
- Edge Nodes
  - Dev
  - Test
  - Prod
- Ansible
- Git
- Jenkins
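One way to picture this layout, assuming Dev, Test and Prod are tiers of edge nodes, is as an Ansible inventory. The Python sketch below renders one in Ansible's INI format, with a group for the private repo server, a group per tier and an `edge_nodes` parent group so a playbook can target every edge node at once. All host names are hypothetical placeholders.

```python
"""Sketch: render an Ansible INI inventory for the platform's hosts."""
from typing import Dict, List

# Hypothetical host names; replace with the real private repo server and edge nodes.
EDGE_NODES: Dict[str, List[str]] = {
    "dev": ["edge-dev-01.example.internal"],
    "test": ["edge-test-01.example.internal"],
    "prod": ["edge-prod-01.example.internal", "edge-prod-02.example.internal"],
}
REPO_SERVER = "conda-repo.example.internal"


def render_inventory(edge_nodes: Dict[str, List[str]], repo_server: str) -> str:
    """Return INI inventory text: one group per tier plus an edge_nodes parent group."""
    lines = ["[repo_server]", repo_server, ""]
    for tier, hosts in edge_nodes.items():
        lines.append(f"[{tier}]")
        lines.extend(hosts)
        lines.append("")
    # ":children" makes edge_nodes a group whose members are the tier groups.
    lines.append("[edge_nodes:children]")
    lines.extend(edge_nodes)  # iterating a dict yields its keys (the tier names)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_inventory(EDGE_NODES, REPO_SERVER), end="")
```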
Building blocks of the Data Science Platform
- Anaconda
- Ansible
- Git
- Jenkins
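To tie these building blocks together, one plausible approach (an assumption, not something this article prescribes) is to keep a conda `environment.yml` per tier in Git, have Jenkins or Ansible apply it with `conda env update --file environment.yml --prune`, and point its channel at the private repo server. The Python sketch below renders such files; the channel URL, environment names and version pins are all hypothetical.

```python
"""Sketch: render per-tier conda environment.yml files to keep in Git."""
from typing import Dict

# Hypothetical channel served by the private repo server (a mirror of vetted packages).
PRIVATE_CHANNEL = "http://conda-repo.example.internal/conda/vetted"

# Hypothetical pins per tier: prod keeps an older pandas while dev tries a newer one.
TIER_PINS: Dict[str, Dict[str, str]] = {
    "prod": {"python": "2.7", "pandas": "0.19.2"},
    "dev": {"python": "3.6", "pandas": "0.20.3"},
}


def render_environment_yml(env_name: str, pins: Dict[str, str], channel: str) -> str:
    """Return environment.yml text that installs `pins` from `channel` only."""
    lines = [f"name: {env_name}", "channels:", f"  - {channel}", "dependencies:"]
    # A conda match spec "pkg=version" is a prefix match, so python=3.6 allows any 3.6.x.
    lines += [f"  - {pkg}={version}" for pkg, version in sorted(pins.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for tier, pins in TIER_PINS.items():
        # e.g. "prod_py27", "dev_py36"
        env_name = f"{tier}_py{pins['python'].replace('.', '')}"
        print(f"# {tier}/environment.yml")
        print(render_environment_yml(env_name, pins, PRIVATE_CHANNEL))
```

Keeping the pins in Git means a change to a production environment goes through the same review and Jenkins pipeline as code changes.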
Comments
Created on 12-15-2017 08:33 AM
And how do you implement this? What are the installation steps, and how do you install it on an existing HDP cluster?