Member since: 04-28-2017
Posts: 41
Kudos Received: 14
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2395 | 09-05-2018 07:40 PM |
| | 4878 | 02-15-2018 08:08 PM |
| | 3761 | 11-21-2017 09:54 AM |
| | 2984 | 11-13-2017 11:52 AM |
| | 10105 | 07-25-2017 10:42 AM |
03-23-2019
11:42 AM
Are you putting arguments in the arguments field of the UI or after your script name? I don't see an immediate issue with what you're trying to do.
11-01-2018
09:45 PM
1 Kudo
In general, that should not be a problem. You can have many DNS names pointing to the same IP. CDSW does need both, so that it can serve the root domain as well. Tristan
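A quick way to sanity-check this is to confirm that both names resolve to the same address. This is a minimal sketch; the names default to localhost so it runs anywhere, and on a real deployment you would set NAME_A and NAME_B to your CDSW root and wildcard hostnames (both are placeholders here):

```shell
# Sketch: verify that two DNS names resolve to the same IP.
# NAME_A/NAME_B default to localhost so this runs anywhere; substitute
# your CDSW root domain and a wildcard subdomain in a real check.
NAME_A="${NAME_A:-localhost}"
NAME_B="${NAMEB:-localhost}"
ip_a=$(getent hosts "$NAME_A" | awk '{print $1; exit}')
ip_b=$(getent hosts "$NAME_B" | awk '{print $1; exit}')
if [ -n "$ip_a" ] && [ "$ip_a" = "$ip_b" ]; then
  echo "OK: both names resolve to $ip_a"
else
  echo "MISMATCH: $NAME_A -> $ip_a, $NAME_B -> $ip_b"
fi
```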
09-05-2018
07:40 PM
2 Kudos
You can configure engine CPU and memory settings in the Admin > Engines tab. However, CDSW does not currently have a quota system akin to YARN resource pools. This is a roadmap item we are actively working on. If you have specific requirements, please let us know. Tristan
05-19-2018
10:39 AM
It appears that neither node is labeled as the master node (Stateful=True), which could indicate an incomplete initialization process. I would also start with one node for simplicity and then add the second node once you get that working. Try cdsw reset on the worker node and then cdsw reset on the master node. After that, run cdsw init and make sure to allow the process to complete. Tristan
04-04-2018
11:29 AM
It sounds like requests is not installed on your executors. You could manually install the library on every executor, or ship it using Spark following the techniques outlined in this blog post: https://blog.cloudera.com/blog/2017/04/use-your-favorite-python-library-on-pyspark-cluster-with-cloudera-data-science-workbench/ . Tristan
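One common shipping pattern is bundling pure-Python dependencies into a zip passed via spark-submit's --py-files flag. A minimal sketch, with a placeholder module standing in for real packages and a hypothetical job script name:

```shell
# Sketch: bundle pure-Python dependencies into a zip that spark-submit
# ships to every executor via --py-files.
# On a real project you would populate deps/ with:
#   pip install --target deps requests
# Here we stage a placeholder module so the sketch is self-contained.
mkdir -p deps
printf 'VERSION = "0.0"\n' > deps/placeholder.py
# Archive the *contents* of deps/ so modules sit at the zip root,
# which is what PYTHONPATH-style zip imports expect.
python3 -c "import shutil; shutil.make_archive('deps', 'zip', 'deps')"
# Then ship it with your job (job.py is a hypothetical script):
#   spark-submit --py-files deps.zip job.py
```

Note this only works for pure-Python packages; libraries with compiled extensions still need to be installed on the executor hosts or shipped as a full environment.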
02-21-2018
04:09 PM
Environment variables are currently set at runtime. You can override the global defaults in the Admin > Engines panel, or within a project in Settings > Environment. Best, Tristan
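As a toy sketch of the precedence (this analogy is mine, not CDSW internals): a project-level value from Settings > Environment wins over the global default from Admin > Engines, much like ordinary shell variable fallback:

```shell
# Toy sketch of engine environment variable precedence: the
# project-level setting overrides the global default, modeled here
# with plain shell parameter expansion.
GLOBAL_DEFAULT="INFO"          # stand-in for a value set in Admin > Engines
PROJECT_OVERRIDE="DEBUG"       # stand-in for a project Settings > Environment value
LOG_LEVEL="${PROJECT_OVERRIDE:-$GLOBAL_DEFAULT}"
echo "$LOG_LEVEL"              # prints DEBUG
```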
02-15-2018
08:29 PM
2 Kudos
The base image does include the following system packages: unixodbc-bin, libsqliteodbc, odbc-postgresql, and tdsodbc. To customize the base image further, you'd need to contact your CDSW administrator and request that a package or configuration be added to the engine. JDBC is often easier to configure without root access, if that is an option for your use case. Tristan
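For reference, a driver registration in odbcinst.ini looks roughly like the sketch below. The driver path is an assumption typical of Debian-based images, so verify yours with `odbcinst -q -d`; the file is written to /tmp here rather than /etc so the sketch runs without root:

```shell
# Sketch: an odbcinst.ini entry for the Postgres driver shipped by the
# odbc-postgresql package. The Driver path is an assumption; confirm it
# on your engine with: odbcinst -q -d
cat > /tmp/odbcinst.ini <<'EOF'
[PostgreSQL]
Description = PostgreSQL ODBC driver (odbc-postgresql package)
Driver      = /usr/lib/x86_64-linux-gnu/odbc/psqlodbca.so
EOF
```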
02-15-2018
08:08 PM
1 Kudo
The current best approach to installing system packages is to use the custom engine functionality. However, this does require administrator access. I have also filed a feature request internally to package the Impala ODBC drivers and ensure that the base ODBC drivers are part of the standard base image. Thanks, Tristan
11-22-2017
01:07 PM
A few questions:
1. Are you using an HTTP proxy?
2. Does localhost resolve to 127.0.0.1?
3. Do you have a multi-homed network configuration?

There are a few potential causes for an initialization hang. Tristan
11-21-2017
09:54 AM
1 Kudo
The Cloudera Data Science Workbench requires Cloudera Manager 5.13+. The error you are seeing is due to incompatibilities with earlier versions of Cloudera Manager. Please try upgrading Cloudera Manager and let us know if that doesn't resolve the issue. Best, Tristan
11-13-2017
11:52 AM
1 Kudo
Cloudera Data Science Workbench only supports Spark 2.x; it doesn't support the Spark 1.x line. Spark 2 has a number of important improvements for data science workloads, which is why we have focused on Spark 2 support only. You can still use Spark 1.x on the same cluster, but only Spark 2 within CDSW. Best, Tristan
11-06-2017
12:20 PM
Hi Rob, Could you give a minimal example of a script that fails with this error? I'm specifically interested in where the import of numpy is being done. Thanks, Tristan
07-25-2017
12:38 PM
2 Kudos
This depends on your corporate network configuration. Private zones are only resolvable within your AWS VPC, so unless you have your corporate network connected to your VPC and DNS peering properly configured, it is often easier to configure DNS in your public zone. You will want to create A records for both cdsw.datacloudera.com and *.cdsw.datacloudera.com pointing to your master node. Within AWS, it's often more flexible to point these records to an EIP so you can attach it to different nodes over time. Tristan
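In zone-file terms, the two records look like the sketch below. The IP 203.0.113.10 is a documentation placeholder; in Route 53 these become two A records (optionally pointing at an EIP):

```shell
# Sketch: BIND-style zone records pointing both the root name and the
# wildcard at the CDSW master node. 203.0.113.10 is a placeholder IP.
cat > /tmp/cdsw-zone-fragment <<'EOF'
cdsw.datacloudera.com.    300 IN A 203.0.113.10
*.cdsw.datacloudera.com.  300 IN A 203.0.113.10
EOF
```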
07-25-2017
12:30 PM
Thanks for your question. Standalone R and Python jobs run only on the CDSW edge nodes, where we have more control over dependency management using Docker. However, these jobs can push workloads into the cluster using tools like PySpark, Sparklyr, Impala, and Hive. This allows you to get full dependency management for R and Python in the edge environment while still scaling specific workloads into the cluster. There is not currently a way to run the R and Python jobs themselves under YARN. As for SparkR, we recommend Sparklyr instead, although Sparklyr is not directly supported by Cloudera. I hope that is helpful. Tristan
07-25-2017
10:42 AM
1 Kudo
Sorry for the delay providing additional details. The wildcard DNS needs to work on both the CDSW nodes and on your computer.

For a production installation, you should configure a proper wildcard DNS entry using a domain you control, e.g. company.com, with proper nameservers (e.g. your internal corporate DNS, or a service like Route 53 or GoDaddy). For instance, if you control the domain company.com and its nameservers point to AWS Route 53, then you should configure the wildcard DNS entries as described in the documentation within Route 53. This will resolve the wildcard entries to your master CDSW node both within AWS and on your computer. In a corporate environment, these tasks would typically be done by a network administrator.

If you are only testing CDSW and do not control a domain, the easiest way is to use a service like xip.io, which provides wildcard DNS automatically pointing to an IP. However, this setup should not be used in production since it is unreliable and delegates your DNS configuration to a third party. I hope this is helpful. Best, Tristan
07-10-2017
02:56 PM
In another window you can check the status with "kubectl get pods". The installation process downloads approximately 5GB of image data at this point, so it may take some time if your internet connection is slow. If your proxy is misconfigured, you may run into issues downloading specific images. You can test your proxy configuration by attempting to pull an image manually, as previously mentioned. Let us know if you continue to have issues. Tristan
07-10-2017
12:17 PM
MSharma, can you pull the image manually?

docker pull gcr.io/google_containers/pause-amd64:3.0

There should be no need for authentication. You are likely facing a proxy misconfiguration or a certificate validation error. Best, Tristan
07-10-2017
11:08 AM
Does your node have internet access or a properly configured HTTP(S)_PROXY? This error can occur when Docker cannot download images from Cloudera's Docker registry. You may see additional information using "systemctl status docker" or "journalctl -u docker". Please let us know if you see additional errors or changing your configuration resolves the issue. Tristan
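When the node sits behind a proxy, the Docker daemon needs its own proxy settings, typically via a systemd drop-in. A sketch under assumptions: the proxy URL is a placeholder, and the file is written to /tmp so the sketch runs without root; on a real node it belongs in /etc/systemd/system/docker.service.d/ followed by systemctl daemon-reload and a Docker restart:

```shell
# Sketch: systemd drop-in giving the Docker daemon HTTP(S) proxy settings.
# proxy.example.com is a placeholder. On a real node, write this to
# /etc/systemd/system/docker.service.d/http-proxy.conf and then run:
#   systemctl daemon-reload && systemctl restart docker
DROPIN="/tmp/docker-http-proxy.conf"
cat > "$DROPIN" <<'EOF'
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1"
EOF
```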
07-10-2017
11:00 AM
Could you please give the output of:

kubectl get events
kubectl logs <stuck-pod-id> engine

Tristan
07-06-2017
10:27 AM
The analysis.py file is meant to be run within the CDSW console, not directly from the terminal. See the "Getting Started" guide within the CDSW documentation. Within CDSW, Python consoles are backed by Jupyter kernels, which have the necessary configuration to create plots. Best, Tristan
07-06-2017
10:24 AM
Can you also show the output of cdsw status to ensure the database is running? If the database is running and the web pods still cannot access the database, this is sometimes a sign of existing iptables rules that conflict with the networking setup within CDSW. Does your system have any custom iptables rules? For instance, a NOTRACK rule can break the virtual IP functionality used by Kubernetes services. Thanks, Tristan
07-06-2017
10:20 AM
Hi, Sparklyr is supported by RStudio, so it may be better to ask this question directly to RStudio or in a forum like StackOverflow. However, looking at the code, it appears you are passing a string to copy_to rather than a dataframe. If assetstatuses is a dataframe that is available, you can try copying it with copy_to(sc, assetstatuses) without quotes around assetstatuses. See: http://spark.rstudio.com/reference/sparklyr/latest/copy_to.html Best, Tristan
07-06-2017
10:15 AM
1 Kudo
The <none> indicator is not an issue -- it simply indicates that those nodes are worker nodes and don't have stateful information stored on them. Hanging engines on "ContainerCreating" typically means you have not run "cdsw enable <worker-ip>" on the master node for all your worker nodes. This whitelists the IP of your worker nodes for NFS mounts. If you have not done this, containers can hang waiting for the project mounts to become available when scheduled onto a worker node. Please let me know if running "cdsw enable" for each worker IP resolves this issue. Thanks, Tristan
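The enable step can be scripted once per worker. A minimal sketch: the IPs are placeholders, and the echo makes this a dry run so it can be tried safely; remove the echo on the actual master node to whitelist the workers:

```shell
# Sketch: run "cdsw enable <worker-ip>" once for each worker, from the
# master node. The IPs below are placeholders; the echo makes this a
# dry run (remove it to actually apply).
WORKER_IPS="10.0.0.11 10.0.0.12"
for ip in $WORKER_IPS; do
  echo cdsw enable "$ip"
done
```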
07-06-2017
10:11 AM
You need to configure a wildcard DNS entry to use CDSW. See the documentation here: https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_install.html#set_up_wildcard_dns While some features will work without a wildcard, running engines, accessing the terminal, or viewing the Spark UI will not. Because these URLs are random, you cannot simply add them to your local hosts file. Thanks, Tristan
07-06-2017
10:04 AM
1 Kudo
Hi David, This should be the IP address of the master CDSW node. We will work to clarify this in the documentation. Thanks, Tristan
06-15-2017
08:17 PM
Hi Pollylaw, Have you set JAVA_HOME in the Administration settings? There is a known issue in 1.0.1 with custom JAVA_HOME directories that can lead to this error message. A workaround is to remove JAVA_HOME from Admin > Engines and make sure Java is installed in a location that can be detected by the bigtop-detect-javahome.sh script. This will be fixed in a future release. Please let us know if this workaround does not solve your issue or if this explanation does not match your configuration. Tristan
06-14-2017
03:01 PM
Do you have LDAP or SAML enabled? What version of CDSW are you running?
06-14-2017
03:00 PM
For real installations, you should pull from a repository. This ensures all the nodes in your CDSW cluster have access to the image, not just the node where you built it. Moreover, you should not assume that your Docker image store is persistent across upgrades or in long-running clusters, where we may evict less-used images to free space. By pushing your custom images to a repository, you ensure that images are never deleted due to image eviction policies or other administration tasks.
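The publish step is the standard Docker tag-and-push flow. A sketch with placeholder names (the image and registry names are assumptions, and the echo makes this a dry run; drop it once your registry details are filled in):

```shell
# Sketch: tag a locally built custom engine image and push it to a
# registry every CDSW node can reach. All names are placeholders; the
# echo makes this a dry run.
REGISTRY="registry.example.com/team"
echo docker tag custom-engine:1 "$REGISTRY/custom-engine:1"
echo docker push "$REGISTRY/custom-engine:1"
```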
06-14-2017
02:56 PM
If you have set up an SMTP server for email notifications, you can use the "forgot password" link on the Sign In page. If you have not configured an SMTP server and you have lost your admin password, you will need to either reinstall Cloudera Data Science Workbench or contact Cloudera Support for a workaround.
06-12-2017
08:44 PM
It appears your installation was interrupted. Please try `cdsw reset` followed by `cdsw init`.