Memory for RDD
Labels: Apache Spark
Created 11-14-2016 09:43 AM
Hi All,

A couple of questions, both related to memory issues but in different situations.

1. sc.textFile(file)

The file is 60 GB, and I don't have enough executor memory to load it. How will Spark behave here?

2. val inputRDD = sc.textFile(file)
val mapRDD = inputRDD.map(_.split(","))
val DFRDD = mapRDD.map(p => Customer(p(0).trim.toInt, p(1), p(2), p(3), p(4))).toDF

At this line, if the cluster doesn't have enough memory, will Spark throw an OutOfMemory error, or purge previous RDDs to move further? (A compilable version of this snippet is sketched below.)
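For reference, a compilable version of the snippet above. The Customer field names and the file path are placeholders, not from the original post:

import org.apache.spark.sql.SparkSession

// Placeholder schema for the five comma-separated columns.
case class Customer(id: Int, name: String, city: String, state: String, zip: String)

val spark = SparkSession.builder.appName("rdd-memory-example").getOrCreate()
val sc = spark.sparkContext
import spark.implicits._  // brings .toDF into scope for RDDs of case classes

val file = "hdfs:///path/to/customers.csv"  // placeholder path

val inputRDD = sc.textFile(file)             // lazy: no data is read yet
val mapRDD = inputRDD.map(_.split(","))
val DFRDD = mapRDD.map(p => Customer(p(0).trim.toInt, p(1), p(2), p(3), p(4))).toDF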
Created 11-14-2016 10:08 AM
@Gobi Subramani Spark keeps the lineage of all previous RDDs in order to compute the next RDD. I'm not sure what "purge" means here, but if no memory is available, Spark will throw an OOM and fail the job. Since an OOM is an Error, the previous RDDs are not recoverable in this case.
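To see the lineage concretely: every RDD records the chain of transformations it was derived from, and RDD.toDebugString prints it. A minimal sketch, assuming the sc and file from the question:

// Transformations only build the lineage graph; nothing runs yet.
val inputRDD = sc.textFile(file)
val mapRDD = inputRDD.map(_.split(","))

// Prints the lineage, e.g. a MapPartitionsRDD chained back to the
// HadoopRDD that reads the file. Spark uses this chain to (re)compute
// partitions when an action such as count() is called.
println(mapRDD.toDebugString)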
Created 11-14-2016 10:29 AM
When an action is performed, inputRDD and mapRDD are evaluated and kept in memory. When evaluating DFRDD, the system knows there is no memory left to perform this transformation. At this point, will Spark remove inputRDD from memory to make room for DFRDD?
Created 11-14-2016 10:59 AM
Under memory pressure, Spark will automatically evict RDD partitions from workers in an LRU manner if no caching or persistence is applied. Depending on the memory available on each worker, LRU eviction happens independently on each worker node.
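If losing evicted partitions to recomputation is a concern, the usual remedy is to persist with a storage level that spills to disk instead of dropping data. A minimal sketch, again assuming the sc and file from the question:

import org.apache.spark.storage.StorageLevel

val mapRDD = sc.textFile(file).map(_.split(","))

// MEMORY_AND_DISK keeps partitions in memory while space allows and
// writes evicted partitions to local disk, so under memory pressure
// they are re-read from disk rather than recomputed from lineage.
mapRDD.persist(StorageLevel.MEMORY_AND_DISK)

mapRDD.count()  // action that materializes (and caches) the partitions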
