Cluster Mode Overview
Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (also known as the driver program).

Fig 1: Cluster Mode Overview
What is an RDD?
Write programs in terms of transformations on distributed datasets.
- Resilient Distributed Datasets
- Collections of objects spread across a cluster, stored in RAM or on Disk
- Built through parallel transformations
- Automatically rebuilt on failure
- Operations:
- Transformations (e.g. map, filter, groupBy)
- Actions (e.g. count, collect, save)
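The key distinction above is that transformations are lazy (they only describe a new dataset), while actions force the computation to run. A minimal sketch of this evaluation model in plain Python (not PySpark; the `LocalRDD` class is a hypothetical stand-in for illustration):

```python
# A toy stand-in for an RDD: transformations build a pipeline lazily
# via generators; actions consume the pipeline and produce a result.
# Illustrative only -- real RDDs are partitioned across a cluster,
# and are rebuilt from their lineage on failure.
class LocalRDD:
    def __init__(self, data):
        self._data = data  # in real Spark this would be distributed

    # Transformations: return a new LocalRDD; nothing is computed yet.
    def map(self, f):
        return LocalRDD(f(x) for x in self._data)

    def filter(self, pred):
        return LocalRDD(x for x in self._data if pred(x))

    # Actions: force evaluation of the whole pipeline.
    def collect(self):
        return list(self._data)

    def count(self):
        return sum(1 for _ in self._data)

rdd = LocalRDD(range(10))
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())  # [0, 4, 16, 36, 64]
```

Unlike a real RDD, this sketch's generators can only be consumed once; Spark instead records the lineage of transformations so a dataset can be recomputed (or rebuilt after a failure) on demand.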
References
- Cluster Design
  - [Spark] Cluster Mode Overview
- RDD
  - The RDD API by Example
    - Zhen He's page at La Trobe University
    - Current with Spark 1.1.0
    - A helpful introduction to the RDD API
  - [DataBricks, PDF] Spark Tutorial Summit 2013
    - Introductory level talk