- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
In spark, why foreach is designed as an action?
- Labels:
-
Apache Spark
Created ‎04-28-2016 05:50 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This is a newbie question. I understand that spark actions include count(), first(), take(), etc, each triggers execution with some sort of "output", either to console, or to storage. What I don't understand is why foreach is designed as an action?
Created ‎04-28-2016 06:00 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The foreach action in Spark is designed like a forced map (so the "map" action occurs on the executors). Foreach is useful for a couple of operations in Spark. They are required to be used when you want to guarantee an accumulator's value to be correct. In addition, they can be used when you want to move data to an external system, like a database, though typically a foreachPartition is used for that operation.
Created ‎04-29-2016 02:10 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To add to Joe's response, one more thing to note here is that foreach() action doesn't return anything. So if you want to invoke any transformations on your RDDs on executors without returning anything back to driver, you can invoke them using foreach() action which is not normally possible with other actions apart from saveAsxxx() actions.
-Prav
Created ‎04-07-2017 05:38 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You can use RDD.toLocalIterator() to bring the data to the driver (one RDD partition at a time for your low banner ad design cost😞
val array = sc.parallelize(List(1, 2, 3, 4)) for(rec <- array.toLocalIterator) { println(rec) }
