Wednesday, March 25, 2015

Scala Recipes for Spark

MapReduce, Print Output


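// Read README.md as an RDD of lines (sc is the SparkContext provided by spark-shell)
val f = sc.textFile("README.md")
// Split each line on single spaces and pair every word with a count of 1
val words = f.flatMap(l => l.split(" ")).map(word => (word, 1))
// Sum the counts per word, collect the results to the driver, and print them
words.reduceByKey(_ + _).collect.foreach(println)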

Sample Output:
(use,3)
(Online,1)
(site,,1)
(running,1)
(find,1)
(sc.parallelize(range(1000)).count(),1)
(contains,1)
(project,1)
(you,4)
(Pi,1)
(that,3)
(protocols,1)
(a,9)
(or,3)
(high-level,1)
(name,1)
(Hadoop,,2)
(to,14)
(available,1)
((You,1)
(core,1)
(instance:,1)
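The same MapReduce steps can be reproduced on plain Scala collections, which is handy for checking the logic without a cluster. This is a minimal sketch that assumes a small in-memory list of lines in place of README.md. It uses `groupMapReduce` (Scala 2.13 or later) as a local stand-in for Spark's `reduceByKey`. The object and function names (`WordCountLocal`, `wordCount`, `wordCountNormalized`) are hypothetical. The second function shows one way to avoid tokens such as `site,` and `(You` in the output above: it splits on runs of characters that are neither letters nor digits, then lowercases each word.

```scala
// Hypothetical local analogue of the Spark word count (requires Scala 2.13+).
object WordCountLocal {
  // Mirrors the Spark pipeline: split each line on single spaces,
  // pair every token with 1, then sum the counts per token.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(l => l.split(" "))
      .map(word => (word, 1))
      .groupMapReduce(_._1)(_._2)(_ + _)

  // Variant: split on runs of non-letter, non-digit characters, drop empty
  // tokens, and lowercase, so "site," and "(You" count as "site" and "you".
  def wordCountNormalized(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(l => l.split("[^\\p{L}\\p{N}]+"))
      .filter(_.nonEmpty)
      .map(word => (word.toLowerCase, 1))
      .groupMapReduce(_._1)(_._2)(_ + _)

  def main(args: Array[String]): Unit = {
    // Assumed sample input, not taken from the real README.md
    val lines = Seq("You can find the site, or (You run it", "to the site")
    wordCount(lines).foreach(println)
    wordCountNormalized(lines).foreach(println)
  }
}
```

Spark's `reduceByKey` combines values within each partition before shuffling. The local version does not need that step, because the data never leaves a single collection.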
