Member since: 09-17-2014
Posts: 88
Kudos Received: 3
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2676 | 07-15-2015 08:57 PM |
| | 9429 | 07-15-2015 06:32 PM |
09-09-2015
10:36 PM
Start here, then drill further down into classes such as DFSClient and DFSInputStream: https://github.com/cloudera/hadoop-common/blob/cdh5.4.5-release/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java#L294-L303
08-10-2015
05:07 PM
Hi,
As described in the sort-based shuffle design doc (https://issues.apache.org/jira/secure/attachment/12655884/Sort-basedshuffledesign.pdf), each map task should generate one shuffle data file and one index file.
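To make the file count concrete, here is a small sketch of the arithmetic. It assumes the older hash-based shuffle (without file consolidation) writes one file per map task per reduce partition; that is my understanding and not something quoted from the design doc. The function name and the task counts are made up for illustration.

```python
def shuffle_file_counts(num_map_tasks, num_reduce_partitions):
    """Return (sort_based, hash_based) shuffle file counts for one stage."""
    # Sort-based shuffle: one data file plus one index file per map task.
    sort_based = 2 * num_map_tasks
    # Assumption: hash-based shuffle without consolidation writes one file
    # per (map task, reduce partition) pair.
    hash_based = num_map_tasks * num_reduce_partitions
    return sort_based, hash_based


if __name__ == "__main__":
    # Hypothetical stage with 1000 map tasks and 200 reduce partitions.
    sort_based, hash_based = shuffle_file_counts(1000, 200)
    print("sort-based files:", sort_based)
    print("hash-based files:", hash_based)
```

With sort-based shuffle the count does not depend on the number of reduce partitions, which is the main point of the design.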
Regarding your second question, the property to specify the buffer for shuffle data is "spark.shuffle.memoryFraction". This is discussed in more detail in the following Cloudera blog:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
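As a rough sketch of how that property could be passed to a job: the helper below turns a dict of Spark properties into `--conf` arguments for `spark-submit`. The helper name, the script name `my_job.py`, and the value 0.3 are illustrative assumptions, not a recommendation; the right value depends on your workload.

```python
def to_submit_args(conf):
    """Turn a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    # Sort keys so the generated command line is stable between runs.
    for key in sorted(conf):
        args.extend(["--conf", "{}={}".format(key, conf[key])])
    return args


# Illustrative value only; tune it for your own job.
shuffle_conf = {"spark.shuffle.memoryFraction": "0.3"}

if __name__ == "__main__":
    # my_job.py is a hypothetical application script.
    command = ["spark-submit"] + to_submit_args(shuffle_conf) + ["my_job.py"]
    print(" ".join(command))
```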
Regards,
Bjorn
07-27-2015
04:16 AM
The first case is: read - shuffle - persist - count
The second case is: read (from persisted copy) - count
You are right that coalesce does not always shuffle, but it may in this case. It depends on whether you started with more or fewer partitions than you requested. Check the Spark UI to see whether a shuffle actually occurred.
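If it helps to reason about it before opening the UI, here is a plain-Python model (not Spark code). It assumes the RDD-style `coalesce(numPartitions, shuffle)` semantics as I understand them: without the shuffle flag the partition count can only go down, and asking for more partitions leaves the count unchanged. The function name is hypothetical, and the Spark UI is still the way to confirm what your job really did.

```python
def coalesce_outcome(current_partitions, requested_partitions, shuffle=False):
    """Model a coalesce call; return (resulting_partitions, shuffle_happens)."""
    if shuffle:
        # With the shuffle flag set, data is redistributed to exactly the
        # requested number of partitions.
        return requested_partitions, True
    # Assumption: without the flag, partitions are only merged, never split,
    # so the count cannot grow.
    return min(current_partitions, requested_partitions), False


if __name__ == "__main__":
    print(coalesce_outcome(100, 10))
    print(coalesce_outcome(10, 100))
    print(coalesce_outcome(10, 100, shuffle=True))
```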
07-17-2015
12:11 AM
Thanks for the update. I can reproduce the issue, but only when the target partition is empty. As soon as I add some data, compute incremental stats works as expected. So I'm still thinking you are hitting an edge case with an empty partition?
07-16-2015
05:57 AM
I am happy to see that you found your answer. Thanks for sharing it. 🙂
07-15-2015
06:32 PM
Actually, the problem was very aggressive caching that overfilled the spark.yarn.executor.memoryOverhead buffer and, as a consequence, caused an OOM error. I just increased it and everything works now.
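For anyone sizing this, a small sketch of the arithmetic: on YARN the container request is the executor memory plus the overhead. The default used when the overhead is not set explicitly is an assumption here (a fraction of executor memory with a 384 MB floor); the fraction has differed between Spark versions, so check the docs for yours. The function name and the numbers are illustrative.

```python
def yarn_container_request_mb(executor_memory_mb, overhead_mb=None,
                              overhead_factor=0.10, min_overhead_mb=384):
    """Estimate the memory (MB) requested from YARN for one executor."""
    if overhead_mb is None:
        # Assumed default: a fraction of executor memory with a fixed floor.
        # The factor is version-dependent; 0.10 is only a placeholder.
        overhead_mb = max(min_overhead_mb,
                          int(executor_memory_mb * overhead_factor))
    # The container must hold both the JVM heap and the off-heap overhead.
    return executor_memory_mb + overhead_mb


if __name__ == "__main__":
    # Overhead left at the assumed default.
    print(yarn_container_request_mb(2048))
    # Overhead raised explicitly via spark.yarn.executor.memoryOverhead.
    print(yarn_container_request_mb(4096, overhead_mb=1024))
```

If the total exceeds what YARN allows per container, the request fails, so the overhead and the executor memory have to be raised with that limit in mind.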
07-02-2015
05:52 PM
None of Impala's supported file formats are able to store data in sorted order on disk. Therefore the ORDER BY clause in the INSERT does not have any effect. The data is written out in a potentially unsorted order regardless. Best, Henry
06-23-2015
09:40 AM
The QuickStart VM includes example data. If you're looking for a VM that is exclusive to Spark, I don't think you'll find that.