Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4976 | 03-09-2016 01:21 AM |
| | 4245 | 03-07-2016 01:52 AM |
| | 13347 | 02-29-2016 04:40 AM |
| | 3966 | 02-22-2016 03:08 PM |
| | 4951 | 01-19-2016 02:13 PM |
04-25-2014
06:15 AM
Meaning, multi-layer neural nets? No, no plans for that. The idea is to support common business use cases, and there is not a lot of broad use for this technique as a classifier over, say, RDF.
04-14-2014
11:47 PM
(By the way, there is a separate forum for Spark: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/bd-p/Spark )

If the error is definitely in the shell / REPL, then I believe you just want to set SPARK_REPL_OPTS:

SPARK_REPL_OPTS="-XX:MaxPermSize=256m" spark-shell

I find this additional setting can help with permgen usage in Scala, which is an issue when running tests too:

SPARK_REPL_OPTS="-XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256m" spark-shell
04-14-2014
12:03 AM
2 Kudos
I would bet this means that the amount of memory you have requested for your executors exceeds the amount of memory that any single worker has. What are these sizes?
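For illustration, here is a minimal sketch of where that request is made on the application side, against the Spark 0.9 API; the master URL, app name, and the "4g" figure are placeholders rather than details from your setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// If the "4g" requested here exceeds the memory any single worker
// advertises to the master, the standalone master can never place an
// executor, and the application's tasks will not be scheduled.
val conf = new SparkConf()
  .setMaster("spark://master:7077")   // placeholder master URL
  .setAppName("MemoryExample")        // placeholder app name
  .set("spark.executor.memory", "4g") // must fit on a single worker
val sc = new SparkContext(conf)
```

The worker side of the comparison is the memory each worker advertises, which you can read off the standalone master's web UI.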
04-12-2014
07:37 AM
No, sbt and Scala are not installed. SBT is a build tool used at development time rather than at runtime; it's a lot like Maven in this respect. You would use SBT (or Maven) on your development machine to create a .jar file, and run that on your cluster.
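As a rough sketch of that workflow, a minimal build.sbt might look like the following; the project name and version numbers are assumptions, not taken from your project:

```scala
// build.sbt -- hypothetical minimal build for a Spark application
name := "my-spark-app"

version := "0.1"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0"
```

Running sbt package on the development machine then produces a jar under target/scala-2.10/, and that jar is what you ship to the cluster; neither sbt nor a separate Scala installation is needed there.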
04-02-2014
10:00 AM
I believe that means you've requested more memory for a task than any worker has available, but people more knowledgeable might be able to confirm or deny that.
03-25-2014
04:29 AM
1 Kudo
Are you on CDH5 beta 2? It already includes Spark. I wonder if its setup of Spark is interfering with whatever you have installed separately, or vice versa. Can you simply use the built-in deployment? It would be easier.
03-25-2014
03:34 AM
The error indicates that mismatched versions of Hive are in use. Not sure how much that helps; I am not familiar with Shark as a user myself. If I'm not wrong, this isn't specific to the Cloudera distribution of Spark, so you may get better answers asking on the general Shark list.
03-25-2014
02:22 AM
It should be possible to use the Hive client to access Shark: https://cwiki.apache.org/confluence/display/Hive/HiveClient

I have not tried it myself, so maybe others can weigh in with better info, like which version of Hive is used with 0.9. I think it is Hive 0.11, from looking at the build.
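For what it's worth, here is a sketch of what that might look like over JDBC, following the HiveClient page above; the host name, the default port 10000, and the Hive 0.11-era driver class are assumptions about the setup:

```scala
import java.sql.DriverManager

// Hypothetical: assumes a Shark server listening on the HiveServer1
// default port 10000, and the Hive JDBC driver on the classpath.
Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive://sharkserver:10000/default", "", "")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SHOW TABLES")
while (rs.next()) {
  println(rs.getString(1))
}
conn.close()
```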
03-24-2014
06:51 AM
1 Kudo
At the moment, CDH5b2 deploys Spark in "standalone" mode: https://spark.apache.org/docs/0.9.0/spark-standalone.html

This simply means Spark manages resources itself rather than delegating to a cluster manager like YARN or Mesos. As an end user, it shouldn't make much difference to you at all; just fire up the shell and go. As I understand it, Spark's YARN integration will be used in the future, once a few kinks are worked out.
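For example, in a standalone application the master URL is about the only place the deployment mode shows up; a sketch, with a placeholder host name:

```scala
import org.apache.spark.SparkContext

// "master" is a placeholder for wherever the standalone master runs;
// 7077 is the standalone master's default port.
val sc = new SparkContext("spark://master:7077", "ExampleApp")
```

From spark-shell, the pre-built sc should already be pointed at the configured master, so there really is nothing extra to do.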
03-11-2014
05:54 AM
1 Kudo
No, they do not. The serving layer must be able to access HDFS; by default it takes its configuration from the Hadoop config on the local machine at /etc/hadoop/conf. If that is hard, I can say more about how to specify the HDFS URL directly. For this reason you can run these on many different machines.
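I won't claim this is exactly how the serving layer wires it up internally, but in generic Hadoop client terms, specifying the HDFS URL directly looks like this; the namenode host and port are placeholders:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical namenode address; setting it here overrides whatever
// (if anything) is present in /etc/hadoop/conf on the local machine.
val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020")
val fs = FileSystem.get(conf)
println(fs.exists(new Path("/user")))
```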