About tristanzajonc

tristanzajonc · ‎09-05-2018

You can configure engine CPU and memory configurations in the Admin > Engines tab. However CDSW does not currently have a quota system, akin to YARN resource pools. This is a roadmap item we are actively working on. If you have specific requirements, please let us know. Tristan

tristanzajonc · ‎04-04-2018

It sounds like requests is not installed on your executors. You could manually install these libraries on all executors or ship it using Spark following the techniques outlined in this blog post: https://blog.cloudera.com/blog/2017/04/use-your-favorite-python-library-on-pyspark-cluster-with-cloudera-data-science-workbench/ . Tristan

tristanzajonc · ‎02-15-2018

The base image does include the following system packages: unixodbc-bin libsqliteodbc odbc-postgresql tdsodbc To customize the base image further, you'd need to contact your CDSW administrator and request a package or configuration be added to the engine. JDBC is often easier to configure without root access, if that is an option for your use case. Tristan

tristanzajonc · ‎02-15-2018

The current best approach to install system packages is to use the custom engine functionality. This does require administration access however. I have also filed a feature request internally to package the Impala ODBC drivers and ensure that the base ODBC drivers are part of the standard base image. Thanks, Tristan

tristanzajonc · ‎11-21-2017

The Cloudera Data Science Workbench requires Cloudera Manager 5.13+. The error you are seeing is due to incompatibilities with earlier version of Cloudera Manager. Please try upgrading Cloudera Manager and let us know if that doesn't resolve the issue. Best, Tristan

tristanzajonc · ‎11-13-2017

Cloudera Data Science Workbench only supports Spark 2.x, it doesn't support the Spark 1.x line. Spark 2 has a number of important improvements for data science workloads which is why we have focused on Spark 2 support only. You can still use Spark 1.x on the same cluster, but only Spark 2 within CDSW. Best, Tristan

tristanzajonc · ‎11-06-2017

Hi Rob, Could you give a minimal example of a script that fails with this error? I'm specifically interested in where the import of numpy is being done. Thanks, Tristan

tristanzajonc · ‎07-25-2017

This depends on your corporate network configuration. Private zones are only resolvable within your AWS VPC, so unless you have your corporate network connected to your VPC and DNS peering properly configured, it is often easier to configure DNS in your public zone. You will want to set both cdsw.datacloudera.com and *.cdsw.datacloudera.com A name records to your master node. Within AWS, it's often more flexible to point this records to an EIP so you can attach it to different nodes over time. Tristan

tristanzajonc · ‎07-25-2017

Sorry for the delay providing additional details. The wildcard DNS needs to work on both the CDSW nodes and on your computer. For a production installation, you should configure a proper wildcard DNS entry using a domain you control. You need to control a domain, e.g. company.com, and have proper nameservers (e.g. your internal corporate DNS or something like Route53, GoDaddy, etc). For instance, if you control the domain company.com and the nameservers point to AWS Route 53, then you should configure the wildcard DNS entries as described in the documention within Route 53. This will resolve the wildcard entries to your master CDSW node both within AWS and on your computer. If you're in a corporate environment, typically these types of tasks would be done by a network administrator. If you are only testing CDSW and do not control a domain, the easiest way is to use a service like xip.io which provides a wildcard DNS automatically pointing to an IP. However this setup should not be used in production since it is unreliable and delegates your DNS configuration to a third party. I hope this is helpful. Best, Tristan

tristanzajonc · ‎07-10-2017

In another window you can check the status with "kubectl get pods". The installation process downloads approximately 5GB of image data at this point, so it may take some amount of time if your internet connection is slow. If your proxy is misconfigured you may run into issues downloading specific images. You can test your proxy configuration by attempting to pull an image manually, as previously mentioned. Let us know if you continue to have issues. Tristan

Online	Offline
Last Visited	‎03-23-2019 03:03 PM

Member Since	‎04-28-2017 01:40 PM
Last Visited	‎03-23-2019 03:03 PM
Posts	41
Kudos received	14

Cloudera Community

Re: limit the resource for CDSW users

Re: Impala ODBC drivers in base image?

Re: Problem for adding cdsw service

Re: CDS Workbench with Spark Versions Below 2.0

Re: How to setup wildcard DNS subdomain

Re: limit the resource for CDSW users

Re: CDSW Error: No module named numpy???

Re: Impala ODBC drivers in base image?

Re: Impala ODBC drivers in base image?

Re: Problem for adding cdsw service

Re: CDS Workbench with Spark Versions Below 2.0

Re: CDSW Error: No module named numpy???

Re: How to setup wildcard DNS subdomain

Re: How to setup wildcard DNS subdomain

Re: cdsw init failed