Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2984 | 01-26-2018 04:02 AM
 | 6290 | 12-22-2017 09:18 AM
 | 3019 | 12-05-2017 06:13 AM
 | 3280 | 10-16-2017 07:55 AM
 | 9308 | 10-04-2017 08:08 PM
09-05-2018
03:33 AM
Thanks for this clarification. I also had the same query regarding memory issues while loading data. Here you cleared up the doubt about loading a file from HDFS. I have a similar question, but the source is a local server or cloud storage, and the data size is larger than the driver memory (let's say 1 GB in this case, where the driver memory is 250 MB). If I run the command val file_rdd = sc.textFile("/path or local or S3"), should Spark load the data, or, as you mentioned above, will it throw an exception? Also, is there a way to print the available driver memory in the terminal? Many thanks, Siddharth Saraf
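For reference, this is a rough sketch of what I am trying from the spark-shell (the path is just a placeholder); it also shows one way to print the driver's JVM heap:

```scala
// Sketch only: run in spark-shell; the path below is a placeholder.
// Print the heap available to the driver JVM.
val rt = Runtime.getRuntime
println(s"Driver max heap:  ${rt.maxMemory / (1024 * 1024)} MB")
println(s"Driver free heap: ${rt.freeMemory / (1024 * 1024)} MB")

// textFile only builds the RDD lazily; nothing is read until an action runs,
// and even then the data stays on the executors unless it is collected.
val file_rdd = sc.textFile("s3a://my-bucket/some/path")
println(file_rdd.count())   // runs on executors; only the count returns to the driver
```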
09-02-2018
01:37 AM
@srowen Are 12 executors really necessary? Surely you just need a total of 12 cores (so you could have 1 executor with 12 cores). Is this what you mean by "Also, 1 core per executor is generally very low."? What happens when you have more cores than Kafka partitions? Will it generally run faster?
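For example, this is the kind of configuration I have in mind (just a sketch, the numbers are made up): 3 executors with 4 cores each also gives 12 total cores, one per Kafka partition if the topic has 12 partitions.

```scala
import org.apache.spark.SparkConf

// Sketch only: 3 executors x 4 cores = 12 total cores.
val conf = new SparkConf()
  .set("spark.executor.instances", "3")
  .set("spark.executor.cores", "4")
```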
07-18-2018
02:56 AM
Your point is flawless. I think the issue here (at least on my side) is that the workbench (which I tested in a bootcamp run by Cloudera a year ago) is pretty good, but it isn't cheap either. For labs, development and all that stuff, it is not affordable for a small company. In my case, my company (a consultancy) needs to be able to develop a new product or service that makes use of ML techniques and would best be developed in a "shared notebook" fashion. The result would probably be sold to the customer together with the workbench, but of course we need to develop it first, with no guarantee of success. Although we are Cloudera resellers, there's no guarantee the customer also wants to buy the CDSW license (maybe a "developer license" would cover this gap). That's why we need to switch to inexpensive software like Zeppelin and Livy to get the job done, at least in the alpha stage. This is my point of view. Take care, O.
04-27-2018
08:05 PM
Hi! I got the same error message, and I solved it by using the latest elasticsearch-spark version for my corresponding Scala version: spark-submit --packages org.elasticsearch:elasticsearch-spark-20_2.11:6.2.4 your_script.py Hope it helps.
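If you are not sure which artifact suffix to pick (for example _2.10 vs _2.11), one quick check (just a suggestion) is to print the Scala and Spark versions from the spark-shell and match the suffix to the Scala version:

```scala
// Run in spark-shell: the Scala version tells you which _2.xx
// elasticsearch-spark artifact suffix to use.
println(scala.util.Properties.versionString)
println(sc.version)
```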
04-24-2018
11:53 AM
Can you expand on this? I am pretty new to Spark, and this is marked as the solution. Also, since dynamicAllocation can handle this, why would a user not want to enable that instead?
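For context, this is roughly the configuration I mean by enabling dynamic allocation (just a sketch; the executor bounds are placeholders):

```scala
import org.apache.spark.SparkConf

// Sketch only: dynamic allocation lets Spark grow and shrink the number of
// executors between the bounds below instead of fixing them up front.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")        // external shuffle service is required
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")
```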
01-26-2018
04:02 AM
1 Kudo
I know these are well-known feature requests, and ones I share. I don't know that they are planned for any particular release, but I am sure they are already tracked as possible features.
01-01-2018
10:30 PM
The files will not be in a specific order. Is this a solution: load all the files into Spark and create a DataFrame out of them, then split this main DataFrame into smaller ones using the delimiter ("...") which is present at the end of each file. Once this is done, map the DataFrames by checking whether the third line of each file contains the words "SEVERE: Error" and group/merge them together. Similarly, follow the approach for the other cases and finally have three separate DataFrames, each exclusive to one case. Is this approach viable, or is there a better way I can follow?
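Roughly, this is what I have in mind (just a sketch, assuming a SparkSession named spark; the path, the "..." delimiter and the third-line check are based on my files and may need adjusting):

```scala
// Sketch only: the input path and the record delimiter are placeholders.
val files = spark.sparkContext.wholeTextFiles("/path/to/input/*")

// Split each file's content into records on the trailing "..." delimiter.
val records = files.flatMap { case (_, content) =>
  content.split("\\.\\.\\.").map(_.trim).filter(_.nonEmpty)
}

// Classify a record by whether its third line contains "SEVERE: Error".
def isSevere(rec: String): Boolean = {
  val lines = rec.split("\n")
  lines.length >= 3 && lines(2).contains("SEVERE: Error")
}

import spark.implicits._
val severeDF = records.filter(isSevere).toDF("record")        // one DataFrame per case
val otherDF  = records.filter(r => !isSevere(r)).toDF("record")
```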
12-27-2017
06:10 AM
Sorry, there was a typo. The code I am trying to run is: df.write.bucketBy(2, "col_name").saveAsTable("table")
12-22-2017
09:18 AM
1 Kudo
This looks like a mismatch between the version of pandas that Spark uses on the driver and whatever is installed on the worker nodes where the executors run.