Member since: 09-17-2013
Posts: 63
Kudos Received: 5
Solutions: 0
08-05-2020
10:50 PM
@Abhi07 , as this is an older post, you will have a better chance of getting a resolution by starting a new thread. That will also give you the opportunity to provide details specific to your environment, which could help others give you a more accurate answer to your question.
07-07-2016
07:31 PM
LazyOutputFormat is available for both APIs. Here's the one for the older API: http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/mapred/lib/LazyOutputFormat.html
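As a rough sketch (assuming the older mapred API from the link above, and a hypothetical JobConf), you enable it by wrapping the real output format so that no empty part files are created for tasks that emit no records:

    import org.apache.hadoop.mapred.{JobConf, TextOutputFormat}
    import org.apache.hadoop.mapred.lib.LazyOutputFormat

    val conf = new JobConf()  // hypothetical job configuration
    // Wrap the real output format; output files are only created once a record is actually written.
    LazyOutputFormat.setOutputFormatClass(conf, classOf[TextOutputFormat[_, _]])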
12-18-2015
10:43 PM
Hi @Srinivasarao Daruna, HDP does not support Spark in Standalone mode; you need to run Spark on YARN. When running Spark in YARN cluster mode you can specify the number of executors with the parameter --num-executors 6, which will give you 6 executors. For additional information on running in YARN cluster mode, please see http://spark.apache.org/docs/latest/running-on-yar... Cheers, Andrew
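If you prefer to set it in application code rather than on the spark-submit command line, the underlying configuration key is spark.executor.instances; a minimal sketch in Scala:

    import org.apache.spark.{SparkConf, SparkContext}

    // Equivalent to passing --num-executors 6 to spark-submit (applies to YARN mode only).
    val conf = new SparkConf()
      .setAppName("example")
      .set("spark.executor.instances", "6")
    val sc = new SparkContext(conf)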
12-08-2015
09:50 PM
> As commands in shell scripts are only able to recognize hdfs directories

This is an incorrect assumption. The shell action merely executes the given script file (as a normal process would) and does not care what is inside it. Does your script fail with an error? If so, please post the error.
09-01-2015
12:04 AM
To summarize: you can use Spark SQL (or Hive) to write SQL queries with complex joins. With the core Spark (RDD) API, on the other hand, you can and must describe the execution plan yourself, so each join has to be written out separately.
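A minimal sketch of the contrast, using hypothetical customer/order data (and assuming Spark 2.x or later for SparkSession): the Spark SQL side is a single declarative query, while the RDD side spells out the join step by step.

    import org.apache.spark.sql.SparkSession

    case class Customer(id: Int, name: String)
    case class Order(custId: Int, total: Double)

    val spark = SparkSession.builder().appName("join-styles").master("local[*]").getOrCreate()
    import spark.implicits._

    val customers = Seq(Customer(1, "alice"), Customer(2, "bob")).toDS()
    val orders = Seq(Order(1, 10.0), Order(2, 25.0)).toDS()

    // Spark SQL: the join is one declarative query; Spark plans the execution.
    customers.createOrReplaceTempView("c")
    orders.createOrReplaceTempView("o")
    spark.sql("SELECT c.name, o.total FROM o JOIN c ON o.custId = c.id").show()

    // Core RDD API: each join is written out explicitly as a pair-RDD step.
    val joined = orders.rdd.keyBy(_.custId).join(customers.rdd.keyBy(_.id))
    joined.map { case (_, (o, c)) => (c.name, o.total) }.collect().foreach(println)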
08-26-2015
05:04 AM
This change will not happen. You cannot change the scheduler without restarting the ResourceManager. It is not a job-configurable setting but a server-side setting that is only read on startup. Wilfred
08-18-2015
07:28 PM
How many executors do you have when you run this? I see the same when I run it, because it gets sent to each executor (2 in my case). Wilfred
08-14-2015
07:42 AM
Hi, to create a pair RDD from an RDD, I used the "keyBy" transformation to extract the key from each value:

    val fileC = sc.textFile("hdfs://.../user/.../myfile.txt")
      .keyBy(line => line.substring(5, 13).trim())
      .mapValues(line => (
        line.substring(87, 92).trim(),
        line.substring(99, 112).trim(),
        line.substring(120, 126).trim(),
        line.substring(127, 131).trim()
      ))

"keyBy" gives me a new pair RDD whose key is a substring of my text value. Then the "mapValues" transformation operates like "map" on each value of the pair RDD, not on the keys...
08-06-2015
02:36 AM
I don't think it has to do with functional programming per se, but yes, it's because the function/code being executed has to be sent from the driver to the executors, and so the function object itself must be serializable. It has no relation to security.
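A small, hypothetical illustration: a closure that references a field of a non-serializable object fails when Spark tries to ship it to the executors, while copying the value into a local variable first works.

    // Helper is deliberately not Serializable.
    class Helper(val factor: Int)

    class Job(sc: org.apache.spark.SparkContext) {
      val helper = new Helper(2)

      def run(): Long = {
        val nums = sc.parallelize(1 to 10)
        // nums.map(n => n * helper.factor)  // would fail with "Task not serializable":
        //                                   // the closure captures `this`, and Helper
        //                                   // cannot be serialized for shipping.
        val f = helper.factor                // copy just the Int into a local variable
        nums.map(n => n * f).count()         // this closure only captures the serializable Int
      }
    }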
08-05-2015
11:05 AM
If you call persist() on an RDD, it means that the data in the RDD will be persisted but only later when something causes it to be computed for the first time. It is not immediately evaluated.
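A minimal sketch (the input path is hypothetical): persist() only marks the RDD for caching, the first action materializes and caches it, and later actions reuse the cached data.

    val rdd = sc.textFile("hdfs:///tmp/input.txt").map(_.toUpperCase)  // hypothetical path
    rdd.persist()   // nothing is computed or cached yet
    rdd.count()     // first action: computes the RDD and caches the result
    rdd.count()     // second action: served from the cached data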