Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 3444 | 01-26-2018 04:02 AM |
| | 7087 | 12-22-2017 09:18 AM |
| | 3538 | 12-05-2017 06:13 AM |
| | 3855 | 10-16-2017 07:55 AM |
| | 11223 | 10-04-2017 08:08 PM |
04-29-2015 08:57 AM
The root problem in this thread is that you're trying to install Spark manually from packages. Done that way, it takes a lot more work to set up the rest of the environment variables and configuration, and it doesn't appear you have. CDH already sets up Spark for you, and I imagine the manual install is only making the situation more complex, since the Debian packages expect their own custom layout, which isn't CDH's. Don't do this; just use CDH's Spark. You're welcome to do as you like, but it's not supported, and this isn't the place to ask, since you're not using CDH's Spark at all.
04-29-2015 08:41 AM
CDH 5.4 ships Spark 1.3.0, and this issue was marked fixed in 1.3.0, so the fix is included. I'm not sure what else you are looking for or referring to.
04-17-2015 10:05 PM
If you print or log to stdout, the output goes to the stdout of the executor process, wherever that is running. In a YARN-based deployment, you can use "yarn logs ..." to retrieve the executor logs, I believe, or dig through the ResourceManager UI to find the executor process and its logs.
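To make the distinction concrete, here's a minimal Scala sketch; the object name and app name are mine, purely for illustration:

import org.apache.spark.{SparkConf, SparkContext}

object WhereDoesStdoutGo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WhereDoesStdoutGo"))
    // Runs inside the executors: this output lands in each executor's stdout
    sc.parallelize(1 to 10).foreach(x => println(s"processing $x"))
    // Runs in the driver: this output lands in the driver's stdout
    println("driver done")
    sc.stop()
  }
}

Once the application finishes, yarn logs -applicationId <application ID> should print the aggregated container logs, stdout included, assuming YARN log aggregation is enabled.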
04-13-2015 04:19 PM
I suspect it's because the rmr2 integration code is compatible with an older version of HBase than the one that ships in CDH 5.3. The link you cited returns a 404 for me, but it seems you are in fact using the latest rmr2 and building from source, which is the right thing to do. I have installed rmr2 on CDH 5.2 before; there aren't special versions you need to find. I dug out my notes on how I installed several of these libraries, in case they help. Note that I installed them with R CMD INSTALL rather than from within R, and you may of course want newer versions of the libraries than the ones mentioned. Basically you just...
export HADOOP_CMD=`which hadoop`
R
...
library(plyrmr)
and go to it.
HOW TO
Copy the packages rmr2_3.1.0.tar.gz, rhdfs_1.0.8.tar.gz and plyrmr_0.2.0.tar.gz to the nodes at, say, /tmp.
For each node:
export HADOOP_CMD=`which hadoop`
export HADOOP_STREAMING=`ls /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming-*.jar`
As root, install R:
yum install R
This installs version 3.0.2 on my cluster. Run R to install some dependencies:
R --vanilla
Once in R:
install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
"functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
"dplyr", "R.methodsS3", "Hmisc"))
(choose a mirror that's local when you are prompted)
Install packages, back on the command line:
R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz
04-10-2015 08:54 AM
This is just Java's locking library (java.util.concurrent.locks); it's not specific to the project. A read-write lock allows many readers at one time, but only one writer at a time, and no readers while a writer holds the write lock. You have to acquire the write lock to mutate the shared state, and the read lock to read it, but holding the read lock doesn't exclude other readers.
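As a minimal sketch of the pattern in Scala, using java.util.concurrent's ReentrantReadWriteLock (the Counter class is mine for illustration, not from the project):

import java.util.concurrent.locks.ReentrantReadWriteLock

// Illustrative only: a shared counter guarded by a read-write lock.
class Counter {
  private val lock = new ReentrantReadWriteLock()
  private var value = 0

  // Readers share the read lock; any number may hold it at once.
  def get: Int = {
    lock.readLock().lock()
    try value finally lock.readLock().unlock()
  }

  // The write lock is exclusive: it blocks readers and other writers.
  def increment(): Unit = {
    lock.writeLock().lock()
    try value += 1 finally lock.writeLock().unlock()
  }
}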
04-09-2015 03:41 AM
The Spark, Hadoop and Kafka dependencies are 'provided' by the cluster at runtime and should not be included in the app. Other dependencies you must bundle with your app. If they conflict with dependencies that leak in from Spark's classpath, you can usually work around it with the user-classpath-first properties.
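For illustration, this is roughly what it looks like in an sbt build; the artifacts and versions here are placeholders, not a recommendation:

// Hypothetical build.sbt fragment; versions are placeholders.
// Cluster-provided artifacts are marked "provided" so they stay
// out of the application assembly.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"      % "1.3.0" % "provided",
  "org.apache.spark"  %% "spark-streaming" % "1.3.0" % "provided",
  "org.apache.hadoop"  % "hadoop-client"   % "2.6.0" % "provided",
  // Everything else gets bundled into the assembly jar.
  "joda-time"          % "joda-time"       % "2.7"
)

If a bundled dependency still clashes with one that Spark leaks, the spark.executor.userClassPathFirst and spark.driver.userClassPathFirst settings are the usual escape hatch, though they're marked experimental.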
04-08-2015 01:58 AM
1 Kudo
See https://github.com/OryxProject/oryx/tree/master/deploy/bin now, and also https://github.com/OryxProject/oryx/blob/master/framework/oryx-lambda/src/main/java/com/cloudera/oryx/lambda/AbstractSparkLayer.java and its subclasses.
04-03-2015 07:06 AM
Are you trying to set up standalone Master and Worker daemons manually? You should use Cloudera Manager to do this.
03-31-2015 04:03 AM
It sounds like a network configuration problem:

Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hadoop02.mycompany.local/192.168.209.172:37271
03-31-2015 02:12 AM
1 Kudo
No, there's no built-in way to do this. You could do it manually by creating a job that writes new, merged data and then swapping it into place where the old data was.
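If it helps to see the shape of that manual approach, here is a hypothetical sketch of such a job in Spark; the paths, the partition count, and the choice of Spark itself are my assumptions, not something from the question:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch only: read the existing small files, coalesce
// them into fewer partitions, and write a merged copy elsewhere.
// Moving the output into place of the old data is a separate manual
// step (e.g. hadoop fs -mv).
object MergeSmallFiles {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MergeSmallFiles"))
    sc.textFile("/data/input/*")            // placeholder input path
      .coalesce(8)                          // placeholder target partition count
      .saveAsTextFile("/data/input-merged") // placeholder output path
    sc.stop()
  }
}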