Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4976 | 03-09-2016 01:21 AM |
| | 4245 | 03-07-2016 01:52 AM |
| | 13347 | 02-29-2016 04:40 AM |
| | 3966 | 02-22-2016 03:08 PM |
| | 4948 | 01-19-2016 02:13 PM |
12-16-2015 02:46 AM
First, you'd have to define what you're trying to "benchmark". I don't think these distributions vary in speed; they differ mainly in the components they bundle around the same core. Comparing them on speed alone is a bit like choosing a car solely by its max RPM, even if that's something that matters to you.
10-09-2015 01:18 AM
One quick question -- are you running on Windows?
09-30-2015 11:46 AM
It's possible to use a static Executor in your code and run multi-threaded operations within each function call, though that may not be efficient. If your goal is simply full utilization of cores, first make sure you have enough executors, with enough cores, running to use your whole cluster. Then make sure your number of partitions is at least that large. At that point each operation can stay single-threaded.
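As a rough sketch (the executor and core counts here are placeholders for whatever your cluster actually provides, as is the input path):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder numbers: size these to what your cluster actually has.
val conf = new SparkConf()
  .setAppName("FullUtilization")
  .set("spark.executor.instances", "4") // 4 executors...
  .set("spark.executor.cores", "4")     // ...with 4 cores each = 16 task slots
val sc = new SparkContext(conf)

// With 16 slots, use at least 16 partitions so every core has work;
// each task can then remain single-threaded.
val data = sc.textFile("hdfs:///path/to/input")
val sized = if (data.partitions.size < 16) data.repartition(16) else data
```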
09-30-2015 09:42 AM
You can't use RDDs inside functions that execute remotely on another RDD, which may be what you're doing. Otherwise, I'm not clear on what you're executing. I suspect you're doing something that doesn't work in general in Spark, but happens to work when executing locally in one JVM.
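To illustrate what I mean (a minimal sketch; the RDD names and data are made up):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("NestedRddExample"))
val rdd   = sc.parallelize(1 to 1000000)
val other = sc.parallelize(Seq(1, 42, 99)) // a small lookup dataset

// Does NOT work in general: 'other' is an RDD referenced inside a function
// that runs remotely on executors, but RDDs only exist on the driver.
// It may appear to work in local mode because everything shares one JVM.
// val bad = rdd.filter(x => other.collect().contains(x))

// Works: bring the small dataset to the driver and broadcast it instead.
val lookup = sc.broadcast(other.collect().toSet)
val good = rdd.filter(x => lookup.value.contains(x))
```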
09-28-2015 09:07 AM
You won't be able to read a local file with this code; you're still trying to read from the classpath. That call would also have to change in order to read a file locally.
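For example (a sketch; the file names are placeholders):

```scala
import scala.io.Source

// Reads from the classpath: this looks inside the jar / classpath entries,
// not the local filesystem.
val fromClasspath = getClass.getResourceAsStream("/config.txt")

// Reads from the local filesystem instead.
val fromLocalFile = Source.fromFile("/local/path/config.txt").mkString
```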
09-27-2015 03:09 AM
Here, you're using your own build of Spark, compiled against an older version of Hive than what's in CDH. That might mostly work, but you're seeing the problems that come from compiling against one version and running against another. I'm afraid you're on your own if you're rolling your own build, but I expect you'll get much closer if you make a build targeting the same Hive version that's in CDH.
09-25-2015 09:47 AM
The relationship of jars and classloaders may not be the same as in local mode, so this may not work as expected. Instead of depending on it, consider either distributing your file via HDFS, or using the --files option with Spark to distribute files to local disk: http://spark.apache.org/docs/latest/running-on-yarn.html
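For instance, something along these lines (the file name is a placeholder, and the spark-submit line in the comment is only indicative):

```scala
import org.apache.spark.SparkFiles
import scala.io.Source

// Assumes the job was submitted with something like:
//   spark-submit --files /local/path/config.txt ... yourApp.jar
// Spark copies the file into each node's local working directory, and
// SparkFiles.get resolves its local path there.
val localPath = SparkFiles.get("config.txt")
val contents = Source.fromFile(localPath).mkString
```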
09-22-2015 06:06 AM
1 Kudo
I remember problems like this with Snappy and HBase: somehow an older version used by HBase took precedence in the app classloader, and then it couldn't load properly because it couldn't see the shared native library in the parent classloader. This may be a manifestation of that. There are certainly cases where the conflict has no resolution: an app and Spark may use mutually incompatible versions of a dependency, and one will break the other as long as the Spark and app classloaders are connected, no matter their ordering. For this toy example, you'd just not set the classpath setting, since it isn't needed. For your app, if neither ordering works, your options are probably to harmonize library versions with Spark, or to shade your copy of the library.
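For reference, the setting in question looks like this (a sketch; both properties default to false, which is what you'd want for the toy example):

```scala
import org.apache.spark.SparkConf

// Only set these to "true" when you genuinely need your app's copy of a
// library to win over Spark's; otherwise leave them at the default.
val conf = new SparkConf()
  .setAppName("ClasspathExample")
  .set("spark.driver.userClassPathFirst", "false")
  .set("spark.executor.userClassPathFirst", "false")
```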
09-22-2015 05:49 AM
Hm, but have you modified classpath.txt? IIRC, the last time I saw this it was some strange problem between the Snappy from HBase and the one used by other things like Spark. Does it work without the userClassPathFirst arg? Just trying to narrow it down. Turning on that flag always leads into problem territory, but this is a simple example with no obvious reason it shouldn't work.
09-22-2015 04:30 AM
That's a different type of conflict. It looks like you somehow have a different version of Snappy in your app classpath. You aren't including Spark/Hadoop in your app jar, right? The Spark assembly only contains Hadoop jars if it's built that way, and in a CDH cluster that's not a good idea, since the cluster already has its own copy of the Hadoop artifacts. It's built as 'hadoop-provided', and the classpath then contains the Hadoop jars and dependencies, plus Spark's. Modifying this means modifying the distribution for all applications; it may or may not work with the rest of CDH, and may or may not work with other apps. These modifications aren't supported, though you can try whatever you want if you're OK with 'voiding the warranty', so to speak. Spark classpath issues are tricky in general, not just in CDH, since Spark uses a lot of libraries and doesn't shade most of them. Yes, you can try shading your own copies as a fall-back if the classpath-first args don't work, but you might first double-check what you're actually pulling in.
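If you do go the shading route, here's a rough sketch using sbt-assembly (this assumes an sbt build; Maven's shade plugin has an equivalent 'relocation' feature, and the package names below are placeholders for whatever dependency actually conflicts):

```scala
// build.sbt fragment; requires the sbt-assembly plugin.
// Relocates your copy of the conflicting library under a new package so it
// can't collide with the version already on the cluster classpath.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.example.conflictinglib.**" -> "shaded.conflictinglib.@1").inAll
)
```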