Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2984 | 01-26-2018 04:02 AM
 | 6290 | 12-22-2017 09:18 AM
 | 3019 | 12-05-2017 06:13 AM
 | 3280 | 10-16-2017 07:55 AM
 | 9308 | 10-04-2017 08:08 PM
09-05-2018
03:33 AM
Thanks for this clarification. I also had the same query regarding memory issues while loading data. Here you cleared up the doubt about loading a file from HDFS. I have a similar question, but the source is a local server or cloud storage, and the data size is larger than the driver memory (let's say 1 GB in this case, where the driver memory is 250 MB). If I run the command val file_rdd = sc.textFile("/path or local or S3"), should Spark load the data, or, as you mentioned above, will it throw an exception? Also, is there a way to print the available driver memory in the terminal? Many thanks, Siddharth Saraf
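For reference, this is a rough sketch of what I am trying from the spark-shell (the path is just a placeholder); it also shows one way to print the driver's JVM heap:

```scala
// Sketch only: run in spark-shell; the path below is a placeholder.
// Print the heap available to the driver JVM.
val rt = Runtime.getRuntime
println(s"Driver max heap:  ${rt.maxMemory / (1024 * 1024)} MB")
println(s"Driver free heap: ${rt.freeMemory / (1024 * 1024)} MB")

// textFile only builds the RDD lazily; nothing is read until an action runs,
// and even then the data stays on the executors unless it is collected.
val file_rdd = sc.textFile("s3a://my-bucket/some/path")
println(file_rdd.count())   // runs on executors; only the count returns to the driver
```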
09-02-2018
01:37 AM
@srowen Are 12 executors really necessary? Surely you just need a total of 12 cores (so you could have 1 executor with 12 cores). Is this what you mean by "Also, 1 core per executor is generally very low."? What happens when you have more cores than Kafka partitions? Will it generally run faster?
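For example, this is the kind of configuration I have in mind (just a sketch, the numbers are made up): 3 executors with 4 cores each also gives 12 total cores, one per Kafka partition if the topic has 12 partitions.

```scala
import org.apache.spark.SparkConf

// Sketch only: 3 executors x 4 cores = 12 total cores.
val conf = new SparkConf()
  .set("spark.executor.instances", "3")
  .set("spark.executor.cores", "4")
```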
07-18-2018
02:56 AM
Your point is flawless. I think the issue here (at least on my side) is that the workbench (which I tested in a bootcamp run by Cloudera a year ago) is pretty good, but it isn't cheap either. For labs, development and all that stuff, it is not affordable for a small company. In my case, my company (a consultancy) needs to be able to develop a new product or service that makes use of ML techniques and would best be developed in a "shared notebook" fashion. The result would probably be sold to the customer together with the workbench, but of course we need to develop it first, with no guarantee of success. Although we are Cloudera resellers, there's no guarantee the customer also wants to buy the CDSW license (maybe a "developer license" would cover this gap). That's why we need to switch to inexpensive software like Zeppelin and Livy to get the job done, at least in the alpha stage. This is my point of view. Take care, O.
04-27-2018
08:05 PM
Hi! I got the same error message, and I solved it by using the latest elasticsearch-spark version for my corresponding Scala version: spark-submit --packages org.elasticsearch:elasticsearch-spark-20_2.11:6.2.4 your_script.py Hope it helps.
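If you are not sure which artifact suffix to pick (for example _2.10 vs _2.11), one quick check (just a suggestion) is to print the Scala and Spark versions from the spark-shell and match the suffix to the Scala version:

```scala
// Run in spark-shell: the Scala version tells you which _2.xx
// elasticsearch-spark artifact suffix to use.
println(scala.util.Properties.versionString)
println(sc.version)
```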
04-24-2018
11:53 AM
Can you expand on this? I am pretty new to Spark, and this is marked as the solution. Also, since dynamicAllocation can handle this, why would a user not want to enable that instead?
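For context, this is roughly the configuration I mean by enabling dynamic allocation (just a sketch; the executor bounds are placeholders):

```scala
import org.apache.spark.SparkConf

// Sketch only: dynamic allocation lets Spark grow and shrink the number of
// executors between the bounds below instead of fixing them up front.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")        // external shuffle service is required
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")
```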
01-26-2018
04:02 AM
1 Kudo
I know these are well-known feature requests, and ones I share. I don't know that they are planned for any particular release, but I am sure they are already tracked as possible features.
01-01-2018
10:30 PM
The files will not be in a specific order. Is this a solution: load all the files into Spark and create a DataFrame out of them, then split this main DataFrame into smaller ones using the delimiter ("...") which is present at the end of each file. Once this is done, map the DataFrames by checking whether the third line of each file contains the words "SEVERE: Error" and group/merge them together. Similarly, follow the approach for the other cases and finally have three separate DataFrames, each exclusive to one case. Is this approach viable, or is there a better way I can follow?
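Roughly, this is what I have in mind (just a sketch, assuming a SparkSession named spark; the path, the "..." delimiter and the third-line check are based on my files and may need adjusting):

```scala
// Sketch only: the input path and the record delimiter are placeholders.
val files = spark.sparkContext.wholeTextFiles("/path/to/input/*")

// Split each file's content into records on the trailing "..." delimiter.
val records = files.flatMap { case (_, content) =>
  content.split("\\.\\.\\.").map(_.trim).filter(_.nonEmpty)
}

// Classify a record by whether its third line contains "SEVERE: Error".
def isSevere(rec: String): Boolean = {
  val lines = rec.split("\n")
  lines.length >= 3 && lines(2).contains("SEVERE: Error")
}

import spark.implicits._
val severeDF = records.filter(isSevere).toDF("record")        // one DataFrame per case
val otherDF  = records.filter(r => !isSevere(r)).toDF("record")
```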
12-27-2017
06:10 AM
Sorry, there was a typo. The code I am trying to run is: df.write.bucketBy(2, "col_name").saveAsTable("table")
12-22-2017
09:18 AM
1 Kudo
This looks like a mismatch between the version of pandas that Spark uses on the driver and whatever is installed on the worker nodes where the executors run.