Member since: 09-17-2013
Posts: 63
Kudos Received: 5
Solutions: 0
08-05-2020
10:50 PM
@Abhi07 , as this is an older post, you will have a better chance of getting a resolution by starting a new thread. That will also give you the opportunity to provide details specific to your environment, which could help others give you a more accurate answer to your question.
07-07-2016
07:31 PM
LazyOutputFormat is available for both APIs. Here's the one for the older API: http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/mapred/lib/LazyOutputFormat.html
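As a rough sketch (assuming the older mapred API from the link above, and a hypothetical JobConf), you enable it by wrapping the real output format so that no empty part files are created for tasks that emit no records:

    import org.apache.hadoop.mapred.{JobConf, TextOutputFormat}
    import org.apache.hadoop.mapred.lib.LazyOutputFormat

    val conf = new JobConf()  // hypothetical job configuration
    // Wrap the real output format; output files are only created once a record is actually written.
    LazyOutputFormat.setOutputFormatClass(conf, classOf[TextOutputFormat[_, _]])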
12-18-2015
10:43 PM
Hi @Srinivasarao Daruna, HDP does not support Spark in Standalone mode; you need to run Spark on YARN. When running Spark in YARN cluster mode you can specify the number of executors with the parameter --num-executors 6, which will give you 6 executors. For additional information on running in YARN cluster mode, please see http://spark.apache.org/docs/latest/running-on-yar... Cheers, Andrew
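If you prefer to set it in application code rather than on the spark-submit command line, the underlying configuration key is spark.executor.instances; a minimal sketch in Scala:

    import org.apache.spark.{SparkConf, SparkContext}

    // Equivalent to passing --num-executors 6 to spark-submit (applies to YARN mode only).
    val conf = new SparkConf()
      .setAppName("example")
      .set("spark.executor.instances", "6")
    val sc = new SparkContext(conf)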
12-08-2015
09:50 PM
> As commands in shell scripts are only able to recognize hdfs directories

This is an incorrect assumption. The shell action merely executes the given script file (as a normal process would) and does not care what is inside it. Does your script fail with an error? If so, please post the error.
09-01-2015
12:04 AM
To summarize: you can use Spark SQL (or Hive) to write SQL queries with complex joins. With the core Spark (RDD) API, on the other hand, you can and must describe the execution plan yourself, so each join has to be written out separately.
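A minimal sketch of the contrast, using hypothetical customer/order data (and assuming Spark 2.x or later for SparkSession): the Spark SQL side is a single declarative query, while the RDD side spells out the join step by step.

    import org.apache.spark.sql.SparkSession

    case class Customer(id: Int, name: String)
    case class Order(custId: Int, total: Double)

    val spark = SparkSession.builder().appName("join-styles").master("local[*]").getOrCreate()
    import spark.implicits._

    val customers = Seq(Customer(1, "alice"), Customer(2, "bob")).toDS()
    val orders = Seq(Order(1, 10.0), Order(2, 25.0)).toDS()

    // Spark SQL: the join is one declarative query; Spark plans the execution.
    customers.createOrReplaceTempView("c")
    orders.createOrReplaceTempView("o")
    spark.sql("SELECT c.name, o.total FROM o JOIN c ON o.custId = c.id").show()

    // Core RDD API: each join is written out explicitly as a pair-RDD step.
    val joined = orders.rdd.keyBy(_.custId).join(customers.rdd.keyBy(_.id))
    joined.map { case (_, (o, c)) => (c.name, o.total) }.collect().foreach(println)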
08-26-2015
05:04 AM
This change will not happen. You cannot change the scheduler without restarting the ResourceManager. It is not a job-configurable setting but a server-side setting that is only read on startup. Wilfred
08-18-2015
07:28 PM
How many executors do you have when you run this? I see the same when I run it, because it gets sent to each executor (2 in my case). Wilfred
08-14-2015
07:42 AM
Hi, to create a pair RDD from an RDD, I used the "keyBy" transformation to extract the key from each value:

    val fileC = sc.textFile("hdfs://.../user/.../myfile.txt")
      .keyBy(line => line.substring(5, 13).trim())
      .mapValues(line => (
        line.substring(87, 92).trim(),
        line.substring(99, 112).trim(),
        line.substring(120, 126).trim(),
        line.substring(127, 131).trim()
      ))

"keyBy" gives me a new pair RDD whose key is a substring of my text value. Then the "mapValues" transformation operates like "map" on each value of the pair RDD, not on the keys...
08-06-2015
02:36 AM
I don't think it has to do with functional programming per se, but yes, it's because the function/code being executed has to be sent from the driver to the executors, and so the function object itself must be serializable. It has no relation to security.
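A small, hypothetical illustration: a closure that references a field of a non-serializable object fails when Spark tries to ship it to the executors, while copying the value into a local variable first works.

    // Helper is deliberately not Serializable.
    class Helper(val factor: Int)

    class Job(sc: org.apache.spark.SparkContext) {
      val helper = new Helper(2)

      def run(): Long = {
        val nums = sc.parallelize(1 to 10)
        // nums.map(n => n * helper.factor)  // would fail with "Task not serializable":
        //                                   // the closure captures `this`, and Helper
        //                                   // cannot be serialized for shipping.
        val f = helper.factor                // copy just the Int into a local variable
        nums.map(n => n * f).count()         // this closure only captures the serializable Int
      }
    }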
08-05-2015
11:05 AM
If you call persist() on an RDD, it means that the data in the RDD will be persisted but only later when something causes it to be computed for the first time. It is not immediately evaluated.
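A minimal sketch (the input path is hypothetical): persist() only marks the RDD for caching, the first action materializes and caches it, and later actions reuse the cached data.

    val rdd = sc.textFile("hdfs:///tmp/input.txt").map(_.toUpperCase)  // hypothetical path
    rdd.persist()   // nothing is computed or cached yet
    rdd.count()     // first action: computes the RDD and caches the result
    rdd.count()     // second action: served from the cached data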