Our users (data scientists) want the flexibility to create their own client environments on edge nodes. For example, Data Scientist 1 may want RStudio, SparkR, and H2O, while Data Scientist 2 may want Anaconda, PySpark, and H2O with different versions. We're considering Docker for containerization: multiple Docker containers would run on the edge node, each acting as an individual virtual edge node, and each container would connect to the HDP cluster to submit jobs. Does anyone have experience in this space? If so, please share best practices. Thanks.
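To sketch what we have in mind: each user would get an image baking in their personal toolchain, with the cluster's client configuration mounted at run time rather than built into the image. Everything below (base image, package names, installer URL, mount paths) is an assumption for illustration, not a tested HDP setup:

```dockerfile
# Hypothetical per-user edge-node image (Data Scientist 2's stack:
# Anaconda-style Python, PySpark, H2O). All versions/paths are assumptions.
FROM centos:7

# Java is required by the Hadoop/Spark client tools (version assumed).
RUN yum install -y java-1.8.0-openjdk && yum clean all

# Miniconda as a lightweight Anaconda-style environment manager.
RUN curl -fsSL https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
        -o /tmp/conda.sh \
 && bash /tmp/conda.sh -b -p /opt/conda \
 && rm /tmp/conda.sh
ENV PATH=/opt/conda/bin:$PATH

# This user's Python stack; pinning versions keeps containers reproducible.
RUN pip install pyspark h2o

# Cluster client configs (core-site.xml, yarn-site.xml, ...) would be
# mounted read-only from the host at run time, e.g.:
#   docker run -v /etc/hadoop/conf:/etc/hadoop/conf:ro edge-node:ds2
ENV HADOOP_CONF_DIR=/etc/hadoop/conf
```

The idea is that images differ only in the user-facing tool layer, while every container sees the same cluster configuration from the host.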