processing stage where we go from our lines to our key-value pairs. At this
point, we can call cache. We then continue with our
execution and finish with wordcounts. Then we call our first action,
collect, on wordcounts. That is going to copy the results
of our computation back to the driver. Now let's say that we have a second
job, where we just want to inspect one element of pairs_RDD. These are
two different jobs. Without cache,
the second job would have triggered execution of the first
stage of the pipeline all over again. Instead, with caching,
pairs_RDD is already in memory, so the second job is going to
be almost instantaneous, because Spark only has to read back from
memory and return the result of the job.
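Running this for real requires Spark itself, but the lazy-evaluation and caching behavior described above can be sketched with a small stdlib-only Python model. The `RDD` class, the `read_lines` source, and the `compute_count` counter below are illustrative stand-ins, not Spark's actual API; the counter tracks how many times the first stage executes:

```python
class RDD:
    """Toy lazy dataset: transformations build thunks, actions run them."""

    def __init__(self, compute):
        self._compute = compute      # thunk that produces the data
        self._cached = None          # filled after cache() + first action
        self._should_cache = False

    def cache(self):
        self._should_cache = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached      # cache hit: no recomputation upstream
        data = self._compute()
        if self._should_cache:
            self._cached = data
        return data

    # Transformations (lazy): nothing runs until an action is called.
    def map(self, f):
        return RDD(lambda: [f(x) for x in self._materialize()])

    def flatMap(self, f):
        return RDD(lambda: [y for x in self._materialize() for y in f(x)])

    def reduceByKey(self, f):
        def run():
            acc = {}
            for k, v in self._materialize():
                acc[k] = f(acc[k], v) if k in acc else v
            return list(acc.items())
        return RDD(run)

    # Actions (eager): trigger execution of the pipeline.
    def collect(self):
        return self._materialize()

    def first(self):
        return self._materialize()[0]


compute_count = 0

def read_lines():
    global compute_count
    compute_count += 1               # counts executions of the first stage
    return ["to be or not to be"]

lines = RDD(read_lines)
words = lines.flatMap(str.split)
pairs_RDD = words.map(lambda w: (w, 1)).cache()
wordcounts = pairs_RDD.reduceByKey(lambda a, b: a + b)

# Job 1: collect on wordcounts runs the whole pipeline and fills the cache.
counts = dict(wordcounts.collect())

# Job 2: inspect one element of pairs_RDD; served straight from memory.
pair = pairs_RDD.first()
```

After both jobs, `compute_count` is still 1: the second job never re-ran the first stage because `pairs_RDD` was held in memory. Removing the `.cache()` call makes the counter reach 2, which is the recomputation the transcript warns about.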