Justin Kestelyn from Databricks describes the differences between the Hadoop and Spark processing models in a post on “Cloudera’s blog”:
At its core, Spark provides a general programming model that enables
developers to write applications by composing arbitrary operators, such as
mappers, reducers, joins, group-bys, and filters. […] In addition, Spark keeps track of the data
that each of the operators produces, and enables applications to reliably store this data in memory.
✚ This looks somewhat similar to the Cascading programming model, combined with the ability to keep the working dataset for the current computation in memory.
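To make the idea concrete, here is a rough analogy in plain Python — not Spark’s actual API — of composing filter, group-by, and reduce operators, with an intermediate result explicitly kept in memory so later stages reuse it instead of recomputing:

```python
from collections import defaultdict

# Toy dataset of (key, value) pairs.
records = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]

# "filter" operator: keep values above a threshold.
filtered = [(k, v) for k, v in records if v > 1]

# Keep the intermediate result in memory so later stages can reuse it
# without recomputing -- the caching idea the post describes.
cached = list(filtered)

# "group-by" operator over the cached dataset.
groups = defaultdict(list)
for k, v in cached:
    groups[k].append(v)

# "reduce" operator: sum the values per group.
sums = {k: sum(vs) for k, vs in groups.items()}
print(sums)  # {'b': 6, 'a': 3, 'c': 5}
```

In Spark the same pipeline would be written against distributed datasets, with caching requested explicitly rather than implied by holding a Python list.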
Original title and link: Using Spark for fast in-memory computing