Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2289 | 01-26-2018 04:02 AM |
 | 4726 | 12-22-2017 09:18 AM |
 | 2263 | 12-05-2017 06:13 AM |
 | 2539 | 10-16-2017 07:55 AM |
 | 6744 | 10-04-2017 08:08 PM |
03-25-2015
02:55 PM
This generally means you're mixing two versions of Spark somehow. Are you sure your app isn't also trying to bundle Spark? Are you using the CDH Spark, and not your own compiled version?
03-25-2015
09:48 AM
1 Kudo
To add a little color: yes, you can do that, although the CLASSPATH intentionally does not include Hive, since, as I understand it, Spark doesn't work with the later versions of Hive that CDH 5.3 and beyond use. It may still work well enough to do what you need, so have at it. But you may hit some incompatibilities.
03-18-2015
03:51 AM
1 Kudo
Looks good, although I would recommend closing the statement and connection too. Also, you're executing an update for every datum. JDBC has an addBatch / executeBatch interface too, I think? It might be faster.
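For illustration only, here is a minimal sketch of what that batching might look like in plain JDBC, with try/finally closing the statement and connection as suggested above. The connection URL, credentials, table, and column names are all made up for the example.

```scala
import java.sql.DriverManager

object BatchInsertSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in data; in the original code this would be the collection being iterated
    val records = Seq((1L, 0.5), (2L, 1.5), (3L, 2.5))

    // URL, credentials, table and columns are hypothetical
    val conn = DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass")
    try {
      val stmt = conn.prepareStatement("INSERT INTO events (id, value) VALUES (?, ?)")
      try {
        records.foreach { case (id, value) =>
          stmt.setLong(1, id)
          stmt.setDouble(2, value)
          stmt.addBatch()          // queue the row instead of executing it immediately
        }
        stmt.executeBatch()        // one round trip for the whole batch
      } finally {
        stmt.close()               // close the statement...
      }
    } finally {
      conn.close()                 // ...and the connection
    }
  }
}
```

How much faster this is depends on the driver and database, but it at least avoids a round trip per row.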
03-16-2015
04:51 AM
Yes, perfectly possible. It's not specific to Spark Streaming or even Spark; you'd just use foreachPartition to create and execute a SQL statement via JDBC over a batch of records. The code is just normal JDBC code.
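As a rough sketch of that pattern (not from the original thread), assuming a DStream of id/payload pairs and a hypothetical events table: open the connection inside foreachPartition so it is created on the worker, then run ordinary JDBC over the partition's records.

```scala
import java.sql.DriverManager
import org.apache.spark.streaming.dstream.DStream

object StreamToJdbcSketch {
  // Writes each batch of (id, payload) records to a hypothetical "events" table.
  def saveToDatabase(stream: DStream[(Long, String)]): Unit = {
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // Created on the worker, so no connection object is shipped from the driver
        val conn = DriverManager.getConnection("jdbc:postgresql://host/db", "user", "pass")
        try {
          val stmt = conn.prepareStatement("INSERT INTO events (id, payload) VALUES (?, ?)")
          try {
            partition.foreach { case (id, payload) =>
              stmt.setLong(1, id)
              stmt.setString(2, payload)
              stmt.addBatch()
            }
            stmt.executeBatch()
          } finally {
            stmt.close()
          }
        } finally {
          conn.close()
        }
      }
    }
  }
}
```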
03-15-2015
08:43 AM
2 Kudos
Although that thread sounds similar, I don't think it's the same thing. Failing to bind is not a failure to connect to a remote host. It means the local host didn't allow the process to listen on a port. The two most likely explanations are:
- an old process is still listening on that port, or at least, another still-running process is
- you appear to be binding to a non-routable address (192.168.x.x); this might be OK, but it's worth double-checking
03-15-2015
03:29 AM
As you can see, the problem is that the receiver can't bind to its assigned address. Is there any networking-related restriction in place that would prevent this? Is this the port you intended?
03-12-2015
05:47 AM
I don't think there's anything special to know beyond what's documented in the RHadoop subprojects, so it's not something that we ship, support, or document separately. I have set up the RHadoop libraries with CDH and it's straightforward. It's really a set of client-side libraries that you install into *R*, not *Hadoop*. However, to run rmr2 you will need R installed locally on all of your Hadoop cluster nodes, since it will run MapReduce jobs that execute R scripts. I recall that you have to install a bunch of other R packages before installing the rhdfs/rhbase/plyrmr libraries, and I found this in my notes as the set of prerequisites:

    install.packages(c("Rcpp", "RJSONIO", "bitops", "digest", "functional", "reshape2", "stringr", "plyr", "caTools", "rJava", "dplyr", "R.methodsS3", "Hmisc"))
03-11-2015
11:46 AM
In general, NoSuchMethodError in Java means you compiled against one version of something, and executed against a different version. Check your build.
03-11-2015
07:21 AM
You can also just add the Hive jars to your app classpath. The catch here is that Spark doesn't quite support the later version of Hive in CDH. This might work for what you're trying to do, but if you build your own, you're building for a slightly different version of Hive than the one you run here.
03-09-2015
03:02 AM
This is answered a few times already here. Have a look at, for example, http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/I-am-using-a-hive-cotext-in-pyspark-cdh5-3-virtual-box-and-i-get/m-p/24418#U24418 The short answer is that Spark is not entirely compatible with the recent versions of Hive found in CDH, but may still work for a lot of use cases. The Spark bits are still there; you have to add Hive to the classpath yourself.
03-06-2015
12:08 PM
No, it's not something you really install. It's a library. However I think you'll find the binaries already exist on all the nodes in your cluster anyway.
03-06-2015
11:43 AM
In bash for example: unset VARIABLE
03-06-2015
11:34 AM
It's just an env variable; you can always "export VARIABLE=VALUE" in the shell. But this message is not an error. In fact you want to see this if you intend to run on a cluster.
03-06-2015
08:32 AM
Mahout is already shipped with CDH. It's not something you really install; it's a library. Can you say any more about the problem you are facing?
03-05-2015
05:04 AM
1 Kudo
Yes, but it's a member of a class. When the class is instantiated on the remote worker, it is null again. Make the Broadcast a member of the new function you are defining instead.
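A minimal sketch of the difference, with a made-up lookup table: capture the Broadcast in a local val that the closure closes over, rather than reading it from a static member that is only initialized on the driver.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

object BroadcastSketch {
  def run(sc: SparkContext): Unit = {
    // A small lookup table to broadcast; contents are made up
    val lookup: Broadcast[Map[Int, String]] = sc.broadcast(Map(1 -> "a", 2 -> "b"))

    // The closure captures the local val `lookup`, so each task gets the
    // lightweight Broadcast handle and reads the value on the worker.
    val named = sc.parallelize(Seq(1, 2, 3)).map { k =>
      lookup.value.getOrElse(k, "unknown")
    }
    named.collect().foreach(println)
  }
}
```

By contrast, stashing the Broadcast in a static field that is only assigned on the driver leaves it null when the class is loaded fresh on a remote worker, which sounds like what happened here.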
03-05-2015
03:27 AM
What is null? I don't see you using a broadcast variable in a closure here. You just put one in a static member, which isn't going to work.
03-04-2015
03:43 PM
There is no absolute minimum. I would imagine that, for reasonably sized problems, you'd want to allocate 1GB of memory to each of these processes: at least 1 executor each for batch and streaming with 1GB of memory, and at least 1 and probably more cores. The serving layer should probably have 1GB+ of memory too, and will use as many cores as the machine has. This is designed for a cluster of machines, I suppose, but nothing prevents you from running everything on one machine.
03-02-2015
04:48 AM
1 Kudo
This means that your YARN cluster does not have the amount of resources (CPU, memory) available that the app is asking for. How much is available? The defaults are pretty modest, though: 2 executors, with 1g RAM and 8 cores per executor. Maybe that's too many cores to ask for? I could turn down the default. You can change it in your config file:

    oryx = {
      ...
      speed = {
        streaming = {
          ...
          # Number of executors to start
          num-executors = 2
          # Cores per executor
          executor-cores = 8
          # Memory per executor
          executor-memory = "1g"
          # Heap size for the Speed driver process
          driver-memory = "512m"
          ...
        }
      }
    }
02-28-2015
10:49 AM
Have you set it to start a generation based on the amount of input received? That could be triggering the new computation. That said, are you sure it only has part of the input? It's possible the zipped file sizes aren't that comparable.

Yes, you simply don't have enough memory allocated to your JVM. Your system memory doesn't matter if you haven't let the JVM use much of it. This is in local mode, right? You need to use -Xmx to give the JVM more heap.

Yes, it will use different tmp directories for different jobs. That's normal.
02-27-2015
01:32 AM
At model-build time, yes, this is equivalent to a single input with value 3. At runtime, this would have a very slightly different effect as an incremental update, since applying an update of 1 and then 2 is slightly different from applying one update of 3. Ingesting is the same as sending the data points to /pref one after the other, so they are not aggregated at ingest time, no.
02-26-2015
01:28 AM
1 Kudo
Is some of the environment setup only happening in your shell config that is triggered for interactive shells? The problem is fairly clear -- the env isn't set up -- and the question is why, but it's not really a Spark issue per se.
02-25-2015
10:37 AM
Whatever user you are running this as doesn't seem to have the PATH or env variables set up. See the first error: hadoop: command not found
02-24-2015
12:46 AM
That's strange; master builds in Travis right now: https://travis-ci.org/cloudera/oryx It builds locally for me too. The error is actually from your copy of Maven. Do you have anything unusual set for your classpath or in your Maven install? It's as if it can't find the copy of Guava it expects.
02-23-2015
11:28 AM
You're probably beyond my knowledge. But the immediate error is easy enough to understand: it can't find the Hive classes, so something is still wrong there. I see a typo in your path, for example; there are two jars separated by " :=". Is it just that?
02-23-2015
10:58 AM
So, Spark SQL is shipped unchanged from upstream. As a result, it should mostly work as-is. It is not formally supported, as it's still an alpha component. Here in particular, have a look at other threads on this forum. I think the issue is that Spark SQL is not yet compatible with the later version of Hive in CDH, so it's not built with Hive support. Some of it should still work, but you have to add the Hive JARs to the classpath at least.
02-22-2015
08:52 AM
So, your app only has 3 cores from YARN? Then your app can only execute 3 tasks in parallel. I'm not sure how many receivers you are starting, but is it fewer than that? It sounds like you expected much more resource to be available, so I'd go look at your YARN config and what's using the resources, and compare to what Spark is actually requesting.
02-22-2015
08:29 AM
Go to the Spark UI and look at the top of the screen -- click Executors.
02-22-2015
08:11 AM
You usually use --executor-memory to set executor memory, but I don't think it matters. You also generally do not use env variables to configure spark-shell. Although it might be giving the desired results, I'd use standard command-line flags. It sounds like simpler jobs are working. While you request 8 executors, do you actually get them from YARN? Go look at your Executors tab.
02-22-2015
05:04 AM
OK, what I'm interested in is how many executor slots you have. How many machines, how many executors, how many cores per executor? We want to confirm it's at least as many as the number of receivers. What about a simpler test involving a file-based DStream? If that works, then it rules out much except the custom DStream.
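If it helps, here is a bare-bones sketch of such a file-based test; the directory path and batch interval are just placeholders. textFileStream needs no receiver, so it should run even with very few executor cores.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileStreamTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FileStreamTest")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Counts lines in new text files dropped into this directory each batch;
    // the path is a placeholder
    val lines = ssc.textFileStream("hdfs:///tmp/stream-test")
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```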