Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3010 | 01-26-2018 04:02 AM
 | 6347 | 12-22-2017 09:18 AM
 | 3044 | 12-05-2017 06:13 AM
 | 3305 | 10-16-2017 07:55 AM
 | 9430 | 10-04-2017 08:08 PM
03-04-2015
03:43 PM
There is no absolute minimum. I would imagine that, for reasonably sized problems, you'd want to allocate 1GB of memory to each of these processes -- meaning at least 1 executor for both batch and streaming, with 1GB of memory, and at least 1 and probably more cores each. The serving layer should probably have 1GB+ of memory too, and will use as many cores as the machine has. This is designed for a cluster of machines, but nothing prevents you from running everything on one machine.
03-02-2015
04:48 AM
1 Kudo
This means that your YARN cluster does not have the amount of resources (CPU, memory) available that the app is asking for. How much is available? The default is pretty modest, though: 2 executors, with 1g of RAM and 8 cores per executor. Maybe that's too many cores to ask for? I could turn down the default. You can change it in your config file:

```
oryx = {
  ...
  speed = {
    streaming = {
      ...
      # Number of executors to start
      num-executors = 2
      # Cores per executor
      executor-cores = 8
      # Memory per executor
      executor-memory = "1g"
      # Heap size for the Speed driver process
      driver-memory = "512m"
      ...
    }
  }
}
```
02-28-2015
10:49 AM
Have you set it to start a generation based on the amount of input received? That could be triggering the new computation. That said, are you sure it only has part of the input? It's possible the zipped file sizes aren't that comparable. Yes, you simply don't have enough memory allocated to your JVM. Your system memory doesn't matter if you haven't let the JVM use much of it. This is in local mode, right? You need to use -Xmx to give it more heap. Yes, it will use different tmp directories for different jobs. That's normal.
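For illustration, a minimal sketch of raising the JVM heap for a local run; the 4g value and the jar name are assumptions, not from this thread:

```shell
# Give the local JVM a larger heap before launching (4g is illustrative).
export JAVA_OPTS="-Xmx4g"
# Then launch as usual, e.g. (jar name hypothetical):
#   java $JAVA_OPTS -jar oryx-computation.jar
```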
02-27-2015
01:32 AM
At model-build time, yes, this is equivalent to a single input with value 3. At runtime, this would have a very slightly different effect as an incremental update, since applying an update of 1 and then 2 is slightly different from applying one update of 3. Ingesting is the same as sending the data points to /pref one after the other. So they are not aggregated at ingest time, no.
02-26-2015
01:28 AM
1 Kudo
Is some of the environment setup only happening in your shell config that is triggered for interactive shells? The problem is fairly clear -- the env is not set up -- and the question is why, but it's not really a Spark issue per se.
02-25-2015
10:37 AM
Whatever user you are running this as doesn't seem to have the PATH or env variables set up. See the first error: hadoop: command not found
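A sketch of the usual fix, assuming a typical install location (the path here is an assumption; point it at wherever `hadoop` actually lives on your system):

```shell
# Put the Hadoop binaries on PATH for the user running the job.
# /usr/lib/hadoop/bin is illustrative; adjust to your install.
export PATH="$PATH:/usr/lib/hadoop/bin"
```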
02-24-2015
12:46 AM
That's strange; master builds in Travis right now: https://travis-ci.org/cloudera/oryx It builds locally for me too. The error is actually from your copy of Maven. Do you have anything unusual set for your classpath or in your Maven install? It's like it can't find the copy of Guava it expects.
02-23-2015
11:28 AM
You're probably beyond my knowledge. But the immediate error is easy enough to understand: it can't find the Hive classes, so something is still wrong there. I see a typo in your path, for example; there are two jars separated by ":=" instead of ":". Is it just that?
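To illustrate the separator issue: classpath entries must be joined by a plain colon, nothing else (the jar paths below are hypothetical):

```shell
# Wrong:  /path/a.jar:=/path/b.jar   (":=" breaks the classpath)
# Right:  entries joined by ':' only (paths illustrative)
CLASSPATH="/opt/hive/lib/hive-exec.jar:/opt/hive/lib/hive-metastore.jar"
export CLASSPATH
```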
02-23-2015
10:58 AM
So, Spark SQL is shipped unchanged from upstream, and as a result it should mostly work as-is. It is not formally supported, as it's still an alpha component. Here in particular, have a look at other threads on this forum. I think the issue is that Spark SQL is not yet compatible with the later version of Hive in CDH, so it's not built with Hive support. Some of it should still work, but you have to add the Hive JARs to the classpath at least.
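As a sketch, one way to put the Hive JARs on the classpath before launching; SPARK_CLASSPATH was the Spark 1.x mechanism for this, but the directory and jar names here are assumptions and your CDH layout will differ:

```shell
# Prepend CDH's Hive JARs to Spark's extra classpath.
# Paths are illustrative; point at your actual CDH Hive lib dir.
HIVE_LIB=/opt/cloudera/parcels/CDH/lib/hive/lib
export SPARK_CLASSPATH="$HIVE_LIB/hive-exec.jar:$HIVE_LIB/hive-metastore.jar"
```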