Member since: 04-28-2017
Posts: 41
Kudos Received: 14
Solutions: 11
My Accepted Solutions
Views | Posted |
---|---|
3348 | 09-05-2018 07:40 PM |
6292 | 02-15-2018 08:08 PM |
4855 | 11-21-2017 09:54 AM |
3956 | 11-13-2017 11:52 AM |
15351 | 07-25-2017 10:42 AM |
09-05-2018
07:40 PM
2 Kudos
You can configure engine CPU and memory settings in the Admin > Engines tab. However, CDSW does not currently have a quota system akin to YARN resource pools. This is a roadmap item we are actively working on. If you have specific requirements, please let us know. Tristan
04-04-2018
11:29 AM
It sounds like requests is not installed on your executors. You could manually install the library on all executors or ship it with your Spark job using the techniques outlined in this blog post: https://blog.cloudera.com/blog/2017/04/use-your-favorite-python-library-on-pyspark-cluster-with-cloudera-data-science-workbench/ . Tristan
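One way to confirm this diagnosis is to check whether the module can be found on the executors themselves. The following is only a minimal sketch: `module_available` and `missing_on_executors` are hypothetical helper names, and `sc` is assumed to be an existing SparkContext created in your CDSW session.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


def missing_on_executors(sc, name, partitions=8):
    """Return True if any sampled executor task cannot find module `name`.

    `sc` is assumed to be an existing SparkContext. `partitions` is an
    arbitrary number of tasks used to sample the executors, so a node that
    receives no task is not checked.
    """
    results = (
        sc.parallelize(range(partitions), partitions)
        .map(lambda _: module_available(name))  # runs on the executors
        .collect()
    )
    return not all(results)
```

For example, `missing_on_executors(sc, "requests")` returning True would suggest that at least one executor lacks the library, even if the import works in the driver session.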
02-15-2018
08:29 PM
2 Kudos
The base image does include the following system packages: unixodbc-bin, libsqliteodbc, odbc-postgresql, and tdsodbc. To customize the base image further, you'd need to contact your CDSW administrator and request that a package or configuration be added to the engine. JDBC is often easier to configure without root access, if that is an option for your use case. Tristan
02-15-2018
08:08 PM
1 Kudo
The current best approach for installing system packages is to use the custom engine functionality. This does require administrator access, however. I have also filed a feature request internally to package the Impala ODBC drivers and ensure that the base ODBC drivers are part of the standard base image. Thanks, Tristan
11-21-2017
09:54 AM
1 Kudo
Cloudera Data Science Workbench requires Cloudera Manager 5.13 or later. The error you are seeing is due to incompatibilities with earlier versions of Cloudera Manager. Please try upgrading Cloudera Manager and let us know if that doesn't resolve the issue. Best, Tristan
11-13-2017
11:52 AM
1 Kudo
Cloudera Data Science Workbench only supports Spark 2.x; it doesn't support the Spark 1.x line. Spark 2 has a number of important improvements for data science workloads, which is why we have focused on Spark 2 support only. You can still use Spark 1.x on the same cluster, but only Spark 2 within CDSW. Best, Tristan
11-06-2017
12:20 PM
Hi Rob, Could you give a minimal example of a script that fails with this error? I'm specifically interested in where the import of numpy is being done. Thanks, Tristan
07-25-2017
12:38 PM
2 Kudos
This depends on your corporate network configuration. Private zones are only resolvable within your AWS VPC, so unless you have your corporate network connected to your VPC and DNS peering properly configured, it is often easier to configure DNS in your public zone. You will want to point the A records for both cdsw.datacloudera.com and *.cdsw.datacloudera.com to your master node. Within AWS, it's often more flexible to point these records to an Elastic IP (EIP) so you can attach it to different nodes over time. Tristan
07-25-2017
10:42 AM
1 Kudo
Sorry for the delay providing additional details. The wildcard DNS needs to work on both the CDSW nodes and on your computer. For a production installation, you should configure a proper wildcard DNS entry using a domain you control. You need to control a domain, e.g. company.com, and have proper nameservers (e.g. your internal corporate DNS or something like Route 53, GoDaddy, etc.). For instance, if you control the domain company.com and the nameservers point to AWS Route 53, then you should configure the wildcard DNS entries as described in the documentation within Route 53. This will resolve the wildcard entries to your master CDSW node both within AWS and on your computer. If you're in a corporate environment, these types of tasks would typically be done by a network administrator. If you are only testing CDSW and do not control a domain, the easiest way is to use a service like xip.io, which provides a wildcard DNS automatically pointing to an IP. However, this setup should not be used in production since it is unreliable and delegates your DNS configuration to a third party. I hope this is helpful. Best, Tristan
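To verify that the wildcard entry works from a given machine, you could resolve both the base name and an arbitrary subdomain and compare the results with the master node's IP. This is a rough sketch, not an official CDSW tool: `check_wildcard_dns` is a hypothetical helper, and the domain and IP you pass in are your own values.

```python
import socket
import uuid


def check_wildcard_dns(domain, expected_ip, resolve=socket.gethostbyname):
    """Check that `domain` and a random subdomain both resolve to `expected_ip`.

    Returns a dict mapping each tested hostname to True or False.
    `resolve` is injectable so the check can be exercised without a network.
    """
    # A random label only resolves if a wildcard record (*.domain) exists.
    random_host = "{}.{}".format(uuid.uuid4().hex[:8], domain)
    results = {}
    for host in (domain, random_host):
        try:
            results[host] = resolve(host) == expected_ip
        except OSError:
            # socket.gaierror is a subclass of OSError: the name did not resolve.
            results[host] = False
    return results
```

Running the same check on a CDSW node and on your own computer covers both sides of the requirement; any hostname mapped to False indicates a record that is missing or pointing elsewhere from that machine's point of view.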
07-10-2017
02:56 PM
In another window, you can check the status with "kubectl get pods". The installation process downloads approximately 5GB of image data at this point, so it may take some time if your internet connection is slow. If your proxy is misconfigured, you may run into issues downloading specific images. You can test your proxy configuration by attempting to pull an image manually, as previously mentioned. Let us know if you continue to have issues. Tristan
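If you want to spot stuck pods quickly, the table printed by "kubectl get pods" can be filtered for anything that is not yet Running. The sketch below assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE); `pods_not_running` is a hypothetical helper, and the sample output and pod names are made up for illustration.

```python
def pods_not_running(kubectl_output):
    """Return names of pods whose STATUS column is not 'Running'.

    Expects the default table printed by `kubectl get pods`.
    """
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    if "NAME" not in header or "STATUS" not in header:
        # e.g. a "No resources found" message instead of a table
        return []
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    not_running = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(name_idx, status_idx):
            continue  # skip malformed rows
        if fields[status_idx] != "Running":
            not_running.append(fields[name_idx])
    return not_running


# Hypothetical sample output, for illustration only.
SAMPLE_OUTPUT = """\
NAME      READY   STATUS              RESTARTS   AGE
web-1     1/1     Running             0          5m
db-1      0/1     ContainerCreating   0          5m
cron-1    0/1     ImagePullBackOff    0          5m
"""

print(pods_not_running(SAMPLE_OUTPUT))  # ['db-1', 'cron-1']
```

Pods that stay in a state such as ImagePullBackOff for a long time are the ones worth checking against your proxy configuration.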