Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3010 | 01-26-2018 04:02 AM
 | 6347 | 12-22-2017 09:18 AM
 | 3044 | 12-05-2017 06:13 AM
 | 3305 | 10-16-2017 07:55 AM
 | 9430 | 10-04-2017 08:08 PM
03-04-2015
03:43 PM
There is no absolute minimum. I would imagine that, for reasonably sized problems, you'd want to allocate 1GB of memory to each of these processes -- meaning at least 1 executor for both batch and streaming, with 1GB of memory, and at least 1 and probably more cores each. The serving layer should probably have 1GB+ of memory too, and will use as many cores as the machine has. This is designed for a cluster of machines, but nothing prevents you from running everything on one machine.
03-02-2015
04:48 AM
1 Kudo
This means that your YARN cluster does not have the amount of resources (CPU, memory) available that the app is asking for. How much is available? The default is pretty modest, though: 2 executors, with 1g of RAM and 8 cores per executor. Maybe that's too many cores to ask for? I could turn down the default. You can change it in your config file:

```
oryx = {
  ...
  speed = {
    streaming = {
      ...
      # Number of executors to start
      num-executors = 2
      # Cores per executor
      executor-cores = 8
      # Memory per executor
      executor-memory = "1g"
      # Heap size for the Speed driver process
      driver-memory = "512m"
      ...
    }
  }
}
```
02-28-2015
10:49 AM
Have you set it to start a generation based on the amount of input received? That could be triggering the new computation. That said, are you sure it only has part of the input? It's possible the zipped file sizes aren't that comparable. Yes, you simply don't have enough memory allocated to your JVM. Your system memory doesn't matter if you haven't let the JVM use much of it. This is in local mode, right? You need to use -Xmx to give it more heap. Yes, it will use different tmp directories for different jobs. That's normal.
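For illustration, a minimal sketch of raising the JVM heap for a local run; the 4g value and the jar name are assumptions, not from this thread:

```shell
# Give the local JVM a larger heap before launching (4g is illustrative).
export JAVA_OPTS="-Xmx4g"
# Then launch as usual, e.g. (jar name hypothetical):
#   java $JAVA_OPTS -jar oryx-computation.jar
```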
02-27-2015
01:32 AM
At model-build time, yes, this is equivalent to a single input with value 3. At runtime, this would have a very slightly different effect as an incremental update, since applying an update of 1 and then 2 is slightly different from applying one update of 3. Ingesting is the same as sending the data points to /pref one after the other. So they are not aggregated at ingest time, no.
02-26-2015
01:28 AM
1 Kudo
Is some of the environment setup only happening in your shell config that is triggered for interactive shells? The problem is fairly clear -- the env is not set up -- and the question is why, but it's not really a Spark issue per se.
02-25-2015
10:37 AM
Whatever user you are running this as doesn't seem to have the PATH or env variables set up. See the first error: hadoop: command not found
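A sketch of the usual fix, assuming a typical install location (the path here is an assumption; point it at wherever `hadoop` actually lives on your system):

```shell
# Put the Hadoop binaries on PATH for the user running the job.
# /usr/lib/hadoop/bin is illustrative; adjust to your install.
export PATH="$PATH:/usr/lib/hadoop/bin"
```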
02-24-2015
12:46 AM
That's strange; master builds in Travis right now: https://travis-ci.org/cloudera/oryx It builds locally for me too. The error is actually from your copy of Maven. Do you have anything unusual set for your classpath or in your Maven install? It's like it can't find the copy of Guava it expects.
02-23-2015
11:28 AM
You're probably beyond my knowledge. But the immediate error is easy enough to understand: it can't find the Hive classes, so something is still wrong there. I see a typo in your path, for example; there are two jars separated by ":=" instead of ":". Is it just that?
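To illustrate the separator issue: classpath entries must be joined by a plain colon, nothing else (the jar paths below are hypothetical):

```shell
# Wrong:  /path/a.jar:=/path/b.jar   (":=" breaks the classpath)
# Right:  entries joined by ':' only (paths illustrative)
CLASSPATH="/opt/hive/lib/hive-exec.jar:/opt/hive/lib/hive-metastore.jar"
export CLASSPATH
```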
02-23-2015
10:58 AM
So, Spark SQL is shipped unchanged from upstream, and as a result it should mostly work as-is. It is not formally supported, as it's still an alpha component. Here in particular, have a look at other threads on this forum. I think the issue is that Spark SQL is not yet compatible with the later version of Hive in CDH, so it's not built with Hive support. Some of it should still work, but you have to add the Hive JARs to the classpath at least.
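As a sketch, one way to put the Hive JARs on the classpath before launching; SPARK_CLASSPATH was the Spark 1.x mechanism for this, but the directory and jar names here are assumptions and your CDH layout will differ:

```shell
# Prepend CDH's Hive JARs to Spark's extra classpath.
# Paths are illustrative; point at your actual CDH Hive lib dir.
HIVE_LIB=/opt/cloudera/parcels/CDH/lib/hive/lib
export SPARK_CLASSPATH="$HIVE_LIB/hive-exec.jar:$HIVE_LIB/hive-metastore.jar"
```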