Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4976 | 03-09-2016 01:21 AM |
| | 4245 | 03-07-2016 01:52 AM |
| | 13347 | 02-29-2016 04:40 AM |
| | 3966 | 02-22-2016 03:08 PM |
| | 4951 | 01-19-2016 02:13 PM |
04-25-2014
06:15 AM
Meaning, multi-layer neural nets? No, no plans for that. The idea is to support common business use cases, and there is not a lot of broad use for this technique as a classifier over, say, RDF.
04-14-2014
11:47 PM
(By the way, there is a separate forum for Spark: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/bd-p/Spark )

If the error is definitely in the shell / REPL, then I believe you just want to set SPARK_REPL_OPTS:

SPARK_REPL_OPTS="-XX:MaxPermSize=256m" spark-shell

I find this additional setting can help with permgen usage in Scala, which is an issue when running tests too:

SPARK_REPL_OPTS="-XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256m" spark-shell
04-14-2014
12:03 AM
2 Kudos
I would bet this means that the amount of memory you have requested for your executors exceeds the amount of memory that any single worker has. What are these sizes?
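For illustration, here is a minimal sketch of where that request is made on the application side, against the Spark 0.9 API; the master URL, app name, and the "4g" figure are placeholders rather than details from your setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// If the "4g" requested here exceeds the memory any single worker
// advertises to the master, the standalone master can never place an
// executor, and the application's tasks will not be scheduled.
val conf = new SparkConf()
  .setMaster("spark://master:7077")   // placeholder master URL
  .setAppName("MemoryExample")        // placeholder app name
  .set("spark.executor.memory", "4g") // must fit on a single worker
val sc = new SparkContext(conf)
```

The worker side of the comparison is the memory each worker advertises, which you can read off the standalone master's web UI.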
04-12-2014
07:37 AM
No, sbt and Scala are not installed. SBT is a build tool used at development time rather than at runtime; it's a lot like Maven in this respect. You would use SBT (or Maven) on your development machine to create a .jar file, and run that on your cluster.
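As a rough sketch of that workflow, a minimal build.sbt might look like the following; the project name and version numbers are assumptions, not taken from your project:

```scala
// build.sbt -- hypothetical minimal build for a Spark application
name := "my-spark-app"

version := "0.1"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0"
```

Running sbt package on the development machine then produces a jar under target/scala-2.10/, and that jar is what you ship to the cluster; neither sbt nor a separate Scala installation is needed there.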
04-02-2014
10:00 AM
I believe that means you've requested more memory for a task than any worker has available, but people more knowledgeable might be able to confirm or deny that.
03-25-2014
04:29 AM
1 Kudo
Are you on CDH5 beta 2? It already includes Spark. I wonder if its setup of Spark is interfering with whatever you have installed separately, or vice versa. Can you simply use the built-in deployment? It would be easier.
03-25-2014
03:34 AM
The error indicates that mismatched versions of Hive are in use. Not sure how much that helps; I am not familiar with Shark as a user myself. If I'm not wrong, this isn't specific to the Cloudera distribution of Spark, so you may get better answers asking on the general Shark list.
03-25-2014
02:22 AM
It should be possible to use the Hive client to access Shark: https://cwiki.apache.org/confluence/display/Hive/HiveClient

I have not tried it myself, so maybe others can weigh in with better info, like which version of Hive is used with 0.9. I think it is Hive 0.11, from looking at the build.
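For what it's worth, here is a sketch of what that might look like over JDBC, following the HiveClient page above; the host name, the default port 10000, and the Hive 0.11-era driver class are assumptions about the setup:

```scala
import java.sql.DriverManager

// Hypothetical: assumes a Shark server listening on the HiveServer1
// default port 10000, and the Hive JDBC driver on the classpath.
Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive://sharkserver:10000/default", "", "")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SHOW TABLES")
while (rs.next()) {
  println(rs.getString(1))
}
conn.close()
```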
03-24-2014
06:51 AM
1 Kudo
At the moment, CDH5b2 deploys Spark in "standalone" mode: https://spark.apache.org/docs/0.9.0/spark-standalone.html

This simply means Spark manages resources itself rather than delegating to a cluster manager like YARN or Mesos. As an end user, it shouldn't make much difference to you at all; just fire up the shell and go. As I understand it, Spark's YARN integration will be used in the future, once a few kinks are worked out.
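For example, in a standalone application the master URL is about the only place the deployment mode shows up; a sketch, with a placeholder host name:

```scala
import org.apache.spark.SparkContext

// "master" is a placeholder for wherever the standalone master runs;
// 7077 is the standalone master's default port.
val sc = new SparkContext("spark://master:7077", "ExampleApp")
```

From spark-shell, the pre-built sc should already be pointed at the configured master, so there really is nothing extra to do.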
03-11-2014
05:54 AM
1 Kudo
No, they do not. The serving layer must be able to access HDFS; by default it takes its configuration from the Hadoop config on the local machine at /etc/hadoop/conf. If that is hard, I can say more about how to specify the HDFS URL directly. For this reason you can run these on many different machines.
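I won't claim this is exactly how the serving layer wires it up internally, but in generic Hadoop client terms, specifying the HDFS URL directly looks like this; the namenode host and port are placeholders:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical namenode address; setting it here overrides whatever
// (if anything) is present in /etc/hadoop/conf on the local machine.
val conf = new Configuration()
conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020")
val fs = FileSystem.get(conf)
println(fs.exists(new Path("/user")))
```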