Scaling *up* Hadoop for under-100 GB jobs. “Nobody ever got fired for buying a cluster”: a very interesting Microsoft Research paper implying that most common huge jobs are still under 100 GB (yes, including at the elephant-friendly Facebook). It addresses the “problematic” question of when and how to distribute with Hadoop, if at all, and the cost efficiency […]
Category archives for Hadoop
When Lucene met Hadoop: the Blur project. So expected. Still incubating, but looking good 🙂 Though, in my opinion, only a complete Solr-over-Hadoop solution would close the deal.
Getting real-time query capabilities with Hadoop, using Cloudera Impala. Main advantages: open source, faster than Hive, backed by Cloudera.
A nice, simple way to wrap NLP with Hadoop and Python. Still correct! http://blog.cloudera.com/blog/2010/03/natural-language-processing-with-hadoop-and-python/
Pivotal HD, in short, wraps a Hadoop distribution nicely with full SQL support and easy virtualization and storage integration, courtesy of Pivotal’s parent companies, EMC and VMware. Pivotal HD: http://www.greenplum.com/products/pivotal-hd
HadoopOnAzure is now officially HDInsight! It is integrated as one of the Windows Azure platform services.
Great news for Big Data fans who want to use their existing Azure platforms: Microsoft Embraces Elephant of Open Source (Wired Enterprise | Wired.com) http://lnkd.in/Xb3bpH Sign up for the CTP: http://lnkd.in/8vMf3B (you need to enroll first: http://lnkd.in/3wTxuy)