Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3440 | 01-26-2018 04:02 AM |
| | 7077 | 12-22-2017 09:18 AM |
| | 3536 | 12-05-2017 06:13 AM |
| | 3847 | 10-16-2017 07:55 AM |
| | 11195 | 10-04-2017 08:08 PM |
08-31-2016
12:20 PM
Thanks, that did fix the issue. I got rid of the variable from the code, and I am now able to execute it in cluster mode.
08-26-2016
09:18 AM
It has always been documented in "Known Issues": https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_spark_ki.html

Generally speaking, there aren't differences. Not supported != different. However, some pieces aren't shipped, like the Thrift server and SparkR.

Differences usually crop up when upstream introduces a breaking change that can't be followed in a minor release. For example, the default in CDH is for the "legacy" memory config parameters to be active, so that the default memory configuration doesn't change in 1.6. Sometimes it relates to other things in the platform that can't change; I think the Akka version is (or was) different because other parts of Hadoop needed a different version.

The biggest example of this, IMHO, is Spark Streaming + Kafka. Spark 1.x doesn't support Kafka 0.9+, but CDH 5.7+ had to move to it to get security features. So CDH Spark 1.6 will actually only work with Kafka 0.9+, because the Kafka differences are mutually incompatible. Good in that you can use a recent Kafka, but still a difference!

Most of it, though, consists of warnings about incompatibilities between what Spark happens to support and what CDH ships in other components.
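To illustrate the memory-config point, a minimal Scala sketch (not CDH-specific); spark.memory.useLegacyMode is the upstream Spark 1.6 flag that controls whether the pre-1.6 "legacy" memory settings stay in effect:

```scala
// Minimal sketch: pin the memory manager explicitly so behavior doesn't
// depend on a distribution's default.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("memory-config-example")
  // true  = pre-1.6 "legacy" static memory management
  // false = Spark 1.6 unified memory manager
  .set("spark.memory.useLegacyMode", "true")

val sc = new SparkContext(conf)
```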
08-18-2016
01:59 AM
1 Kudo
max-age-data-hours will cause it to delete data on HDFS that is older than this number of hours. This means that subsequent models will be built on historical data that does not include data older than this time. That's all there is to it.
08-16-2016
08:09 AM
5.5 or 5.7? The title and text disagree. 5.5 would have Spark 1.4, and I am not sure whether SQLContext was exposed as sqlContext by the shell by default like that. It should be in Spark 1.6 (= CDH 5.6+).
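If a shell doesn't predefine it, a minimal sketch of creating one yourself, assuming the shell's built-in SparkContext `sc`:

```scala
// Minimal sketch for a Spark 1.x shell that doesn't predefine sqlContext;
// assumes the shell's built-in SparkContext `sc`.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.createDataFrame(Seq((1, "a"), (2, "b"))).toDF("id", "value")
df.show()
```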
08-07-2016
10:02 PM
The first operation makes each value into a set containing that single value. ++ just adds collections together, combining elements of both sets. This is trying to build up a set of all values for each key. It can be written more simply as "groupByKey" really. Even this code could be more compact and efficient.
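A runnable sketch of both forms in the Spark shell; the sample data is made up for illustration:

```scala
// In the Spark shell, where `sc` is predefined; sample data is hypothetical.
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

// Wrap each value in a one-element Set, then merge the sets per key with ++.
val asSets = pairs.mapValues(v => Set(v)).reduceByKey(_ ++ _)

// The same result, written more simply with groupByKey.
val viaGroup = pairs.groupByKey().mapValues(_.toSet)

asSets.collect()   // e.g. Array((a,Set(1, 2)), (b,Set(3)))
```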
07-23-2016
01:41 PM
I typed spark-shell and got the Scala console.
07-06-2016
08:13 AM
I would advise using ipython's internal debugger, ipdb, which lets you run every statement step by step.

* http://quant-econ.net/py/ipython.html#debugging
* https://docs.python.org/3/library/pdb.html

Finally, regarding the other statements above: when using Anaconda's ipython, remember to set the environment variable PYSPARK_PYTHON to the location of ipython (e.g. /usr/bin/ipython) so PySpark knows where to find it. Good luck.
06-27-2016
09:28 AM
1 Kudo
I have a guess: you need to make each of those things a separate arg tag. I don't know Oozie well myself, but something similar is needed in Maven config files. That is, it may be reading this as a single arg, "-xm mapreduce", rather than two.
06-13-2016
03:46 PM
Yes it would be. The execution of transformations/actions is the same, just the source is different.
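Since I don't know the exact sources in question, a generic Scala sketch of the point, with hypothetical paths:

```scala
// Generic sketch (hypothetical paths): the downstream transformations and
// actions are identical; only where the RDD comes from differs.
import org.apache.spark.rdd.RDD

val fromLocalFile = sc.textFile("file:///tmp/input.txt")
val fromHdfs      = sc.textFile("hdfs:///data/input.txt")

def wordCount(lines: RDD[String]): RDD[(String, Int)] =
  lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)

wordCount(fromLocalFile).take(5)
wordCount(fromHdfs).take(5)
```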