Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 3444 | 01-26-2018 04:02 AM |
| | 7087 | 12-22-2017 09:18 AM |
| | 3538 | 12-05-2017 06:13 AM |
| | 3855 | 10-16-2017 07:55 AM |
| | 11223 | 10-04-2017 08:08 PM |
04-29-2015 08:57 AM
The root problem in this thread is that you're trying to install Spark manually from packages. Done that way, it takes a lot more work to set up the rest of the environment variables and configuration, and it doesn't appear you have. CDH already sets up Spark for you, and I imagine the manual install is only making the situation more complex, since the Debian packages expect their own custom layout, which isn't CDH's. Don't do this; just use CDH's Spark. You're welcome to do as you like, but it's not supported, and this isn't the place to ask, since you're not using CDH's Spark at all.
04-29-2015 08:41 AM
CDH 5.4 ships Spark 1.3.0, and this issue was marked fixed in 1.3.0, so the fix is included. I'm not sure what else you are looking for or referring to.
04-17-2015 10:05 PM
If you print or log to stdout, the output goes to the stdout of the executor process, wherever that is running. In a YARN-based deployment, you can use "yarn logs ..." to retrieve the executor logs, I believe, or dig through the ResourceManager UI to find the executor process and its logs.
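To make the distinction concrete, here's a minimal Scala sketch; the object name and app name are mine, purely for illustration:

import org.apache.spark.{SparkConf, SparkContext}

object WhereDoesStdoutGo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WhereDoesStdoutGo"))
    // Runs inside the executors: this output lands in each executor's stdout
    sc.parallelize(1 to 10).foreach(x => println(s"processing $x"))
    // Runs in the driver: this output lands in the driver's stdout
    println("driver done")
    sc.stop()
  }
}

Once the application finishes, yarn logs -applicationId <application ID> should print the aggregated container logs, stdout included, assuming YARN log aggregation is enabled.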
04-13-2015 04:19 PM
I suspect it's because the rmr2 integration code is compatible with an older version of HBase than the one that ships in CDH 5.3. The link you cited returns a 404 for me, but it seems you are in fact using the latest rmr2 and building from source, which is the right thing to do. I have installed rmr2 on CDH 5.2 before; there aren't special versions you need to find. I dug out my notes on how I installed several of these libraries, in case they help. Note that I installed them with R CMD INSTALL rather than from within R, and you may of course want newer versions of the libraries than the ones mentioned. Basically you just...
export HADOOP_CMD=`which hadoop`
R
...
library(plyrmr)
and go to it.
HOW TO
Copy the packages rmr2_3.1.0.tar.gz, rhdfs_1.0.8.tar.gz and plyrmr_0.2.0.tar.gz to the nodes at, say, /tmp.
For each node:
export HADOOP_CMD=`which hadoop`
export HADOOP_STREAMING=`ls /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming-*.jar`
As root, install R:
yum install R
This installs version 3.0.2 on my cluster. Run R to install some dependencies:
R --vanilla
Once in R:
install.packages(c("Rcpp", "RJSONIO", "bitops", "digest",
"functional", "reshape2", "stringr", "plyr", "caTools", "rJava",
"dplyr", "R.methodsS3", "Hmisc"))
(choose a mirror that's local when you are prompted)
Install packages, back on the command line:
R CMD INSTALL /tmp/rmr2_3.1.0.tar.gz
R CMD INSTALL /tmp/rhdfs_1.0.8.tar.gz
R CMD INSTALL /tmp/plyrmr_0.2.0.tar.gz
04-10-2015 08:54 AM
This is just Java's locking library (java.util.concurrent.locks); it's not specific to the project. A read-write lock allows many readers at one time, but only one writer at a time, and no readers while a writer holds the write lock. You have to acquire the write lock to mutate the shared state, and the read lock to read it, but holding the read lock doesn't exclude other readers.
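As a minimal sketch of the pattern in Scala, using java.util.concurrent's ReentrantReadWriteLock (the Counter class is mine for illustration, not from the project):

import java.util.concurrent.locks.ReentrantReadWriteLock

// Illustrative only: a shared counter guarded by a read-write lock.
class Counter {
  private val lock = new ReentrantReadWriteLock()
  private var value = 0

  // Readers share the read lock; any number may hold it at once.
  def get: Int = {
    lock.readLock().lock()
    try value finally lock.readLock().unlock()
  }

  // The write lock is exclusive: it blocks readers and other writers.
  def increment(): Unit = {
    lock.writeLock().lock()
    try value += 1 finally lock.writeLock().unlock()
  }
}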
04-09-2015 03:41 AM
The Spark, Hadoop and Kafka dependencies are 'provided' by the cluster at runtime and should not be included in the app. Other dependencies you must bundle with your app. If they conflict with dependencies that leak in from Spark's classpath, you can usually work around it with the user-classpath-first properties.
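For illustration, this is roughly what it looks like in an sbt build; the artifacts and versions here are placeholders, not a recommendation:

// Hypothetical build.sbt fragment; versions are placeholders.
// Cluster-provided artifacts are marked "provided" so they stay
// out of the application assembly.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"      % "1.3.0" % "provided",
  "org.apache.spark"  %% "spark-streaming" % "1.3.0" % "provided",
  "org.apache.hadoop"  % "hadoop-client"   % "2.6.0" % "provided",
  // Everything else gets bundled into the assembly jar.
  "joda-time"          % "joda-time"       % "2.7"
)

If a bundled dependency still clashes with one that Spark leaks, the spark.executor.userClassPathFirst and spark.driver.userClassPathFirst settings are the usual escape hatch, though they're marked experimental.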
04-08-2015 01:58 AM
1 Kudo
See https://github.com/OryxProject/oryx/tree/master/deploy/bin now, and also https://github.com/OryxProject/oryx/blob/master/framework/oryx-lambda/src/main/java/com/cloudera/oryx/lambda/AbstractSparkLayer.java and its subclasses.
04-03-2015 07:06 AM
Are you trying to set up standalone Master and Worker daemons manually? You should use Cloudera Manager to do this.
03-31-2015 04:03 AM
It sounds like a network configuration problem:

Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hadoop02.mycompany.local/192.168.209.172:37271
03-31-2015 02:12 AM
1 Kudo
No, there's no built-in way to do this. You could do it manually by creating a job that writes new, merged data and then swapping it into place where the old data was.
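If it helps to see the shape of that manual approach, here is a hypothetical sketch of such a job in Spark; the paths, the partition count, and the choice of Spark itself are my assumptions, not something from the question:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch only: read the existing small files, coalesce
// them into fewer partitions, and write a merged copy elsewhere.
// Moving the output into place of the old data is a separate manual
// step (e.g. hadoop fs -mv).
object MergeSmallFiles {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MergeSmallFiles"))
    sc.textFile("/data/input/*")            // placeholder input path
      .coalesce(8)                          // placeholder target partition count
      .saveAsTextFile("/data/input-merged") // placeholder output path
    sc.stop()
  }
}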