Cluster Mode Overview
Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (also known as the driver program).

Fig 1: Cluster Mode Overview
What is an RDD?
Write programs in terms of transformations on distributed datasets.
- Resilient Distributed Datasets
- Collections of objects spread across a cluster, stored in RAM or on Disk
- Built through parallel transformations
- Automatically rebuilt on failure
- Operations:
- Transformations (e.g. map, filter, groupBy)
- Actions (e.g. count, collect, save)
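The key distinction above is that transformations are lazy (they only describe a new dataset), while actions force the computation to run. A minimal sketch of this evaluation model in plain Python (not PySpark; the `LocalRDD` class is a hypothetical stand-in for illustration):

```python
# A toy stand-in for an RDD: transformations build a pipeline lazily
# via generators; actions consume the pipeline and produce a result.
# Illustrative only -- real RDDs are partitioned across a cluster,
# and are rebuilt from their lineage on failure.
class LocalRDD:
    def __init__(self, data):
        self._data = data  # in real Spark this would be distributed

    # Transformations: return a new LocalRDD; nothing is computed yet.
    def map(self, f):
        return LocalRDD(f(x) for x in self._data)

    def filter(self, pred):
        return LocalRDD(x for x in self._data if pred(x))

    # Actions: force evaluation of the whole pipeline.
    def collect(self):
        return list(self._data)

    def count(self):
        return sum(1 for _ in self._data)

rdd = LocalRDD(range(10))
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
```

Unlike a real RDD, this sketch's generators can only be consumed once; Spark instead records the lineage of transformations so a dataset can be recomputed (or rebuilt after a failure) on demand.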
References
- Cluster Design
  - [Spark] Cluster Mode Overview
- RDD
  - The RDD API by Example
    - Zhen He's page at La Trobe University
    - Current with Spark 1.1.0
    - A helpful introduction to the RDD API
  - [DataBricks, PDF] Spark Tutorial Summit 2013
    - Introductory level talk