Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3030 | 01-26-2018 04:02 AM |
| | 6379 | 12-22-2017 09:18 AM |
| | 3063 | 12-05-2017 06:13 AM |
| | 3321 | 10-16-2017 07:55 AM |
| | 9501 | 10-04-2017 08:08 PM |
02-16-2017 05:50 AM
Yes, you can use Spark and all of the services in the VM; none of them depend on Cloudera Manager at all.
01-15-2017 10:13 AM
No, you definitely do not want to take this directory away from the hdfs user! In general, I'd never change HDFS permissions on key directories like this. Instead, the hdfs superuser needs to create a home directory for your user. This kind of thing happens automatically via Hue.
01-15-2017 02:34 AM
1 Kudo
That's the general error you get when you run as user foo but haven't set up /user/foo in HDFS. The usual way that's done is through Hue, or by syncing with something like Active Directory.
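If neither applies and you need to do it by hand, it's a couple of commands as the HDFS superuser. A minimal sketch, assuming the user is literally named foo:

```bash
# Create the missing HDFS home directory for user "foo" and hand ownership to them
sudo -u hdfs hdfs dfs -mkdir -p /user/foo
sudo -u hdfs hdfs dfs -chown foo:foo /user/foo
```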
01-06-2017 03:26 AM
Generally, you won't be able to run R on your laptop/workstation and connect it remotely to the cluster. It's possible, but it would require more setup and configuration, so I would avoid that deployment for now. Instead, run R on a cluster gateway node. You are also using a standalone master, which isn't supported anyway; you want YARN. Although you should be able to use your own copy of SparkR 1.6 with the cluster, I don't know whether it works, and it's not supported. sparklyr is another option, and at least it is supported by RStudio.
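As a rough illustration, this is the shape of a sparklyr session run on a gateway node against YARN. A sketch only: the spark_home path is an assumption about a parcel-based CDH install, not something I've verified on your cluster.

```r
library(sparklyr)

# Connect through YARN rather than a standalone master.
# spark_home below assumes a parcel-based CDH layout; adjust for yours.
sc <- spark_connect(master = "yarn-client",
                    spark_home = "/opt/cloudera/parcels/CDH/lib/spark")

iris_tbl <- copy_to(sc, iris)   # push a small local data frame to the cluster

spark_disconnect(sc)
```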
01-04-2017 04:30 AM
Generally speaking, you will need connectivity from your laptop to at least one machine in the cluster (the gateway), and some local configuration for sparklyr that indicates where the cluster is. I haven't tried this with sparklyr, but for other R-Hadoop libraries like rhdfs, it means having a local copy of the HADOOP_CONF_DIR files from the cluster. It also probably means having the same version of the Spark binaries locally as on the cluster. This is challenging, so you may be better off running sparklyr directly on the edge/gateway node of the cluster. See https://blog.cloudera.com/blog/2016/09/introducing-sparklyr-an-r-interface-for-apache-spark/ Instead of installing Spark, point it to a non-local master like "yarn-client" to use the cluster.

SparkR is also something you can try to get working. You would probably need an upstream SparkR version that matches the CDH Spark you're using (1.x vs. 2.x), and then just try to run ./bin/sparkR from its distribution, as sketched below. Standalone mode isn't supported.

None of these (SparkR, sparklyr) are supported by Cloudera, so they have no relationship to CM. You should not modify your existing Spark service, and you shouldn't have to.
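If you do experiment with upstream SparkR, it would look roughly like this from the gateway node. The download directory name is hypothetical; match its version to the cluster's Spark (1.x here):

```bash
# Assumes the cluster's Hadoop client configs are already on the gateway node
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Hypothetical upstream Spark 1.6 download, matching a CDH Spark 1.x cluster
cd spark-1.6.3-bin-hadoop2.6
./bin/sparkR --master yarn-client   # go through YARN; standalone mode isn't supported
```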
01-02-2017 05:27 AM
It sounds like you only finished step 2. You need to finish all of the steps.
01-02-2017 01:06 AM
There is no CDH 5.12. Spark 2 is available as a CSD. Please follow the documented steps for installing it, which never include manually copying JARs around: http://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
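For orientation, the CSD route looks roughly like the sketch below; the filename is an assumption for a Spark 2 CSD, and the linked documentation is authoritative. Note that the one JAR involved here is the CSD descriptor for Cloudera Manager, not Spark's own JARs.

```bash
# Hedged sketch: place the CSD (assumed filename) where Cloudera Manager looks for it
sudo cp SPARK2_ON_YARN-2.0.0.cloudera1.jar /opt/cloudera/csd/
sudo chown cloudera-scm:cloudera-scm /opt/cloudera/csd/SPARK2_ON_YARN-2.0.0.cloudera1.jar
sudo service cloudera-scm-server restart   # then add the Spark 2 service through CM
```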
12-16-2016 09:08 AM
It won't be terribly different, since a maintenance release generally contains only a small number of fixes, but yes, you will want to update in general. You will also need the GA version if you want production support.
11-16-2016 03:09 AM
No, the repo I'm referencing is the single Cloudera repository where all artifacts are hosted.
11-16-2016 02:36 AM
1 Kudo
See https://github.com/OryxProject/oryx/tree/master/deploy/bin. In general, you should start from https://github.com/OryxProject/oryx