“Nobody ever got fired for buying a cluster” –
A very interesting Microsoft Research paper. Its main points:
- Most common huge jobs are still under 100 GB (yes, including at elephant-friendly Facebook)
- It addresses the “problematic” question of “when and how to distribute with Hadoop”, if at all
- It addresses the cost efficiency (yes, watts and heat count…) of scaling up, even with Hadoop, for jobs under 100 GB