Don’t overthink Hadoop.

I was watching TV yesterday and flipped past the film Moneyball on one of the movie channels. I had seen the movie before, and I have heard all about its relationship to BI and Big Data. For me, though, the central theme of the movie is disruption as it relates to innovation.

The disruptive power of not really needing high-end hardware to build a supercomputer has the market scrambling to force Hadoop into a model that fits many years of coaching by industry giants. You know the coaching: you have to have super high-end hardware with many layers of backup, redundancy and failsafes. You have to come in on the weekend and neglect your family to “save the day” for some silly website powering someone else’s critical functionality.

For vendors stuck in a rut of selling high-end nodes, the idea of disposable slave nodes, combined with the resiliency that large numbers of those nodes bring, is disruptive to the entire industry selling into modern data-warehouse-powered businesses. Not needing fiber or even 10GbE means networks are smaller, less expensive and closer to disposable. No need for virtualization, you say? How can this be? Virtualization is good for everything, isn’t it? It’s faster, denser and more cost efficient, right? Just ask your vendor. They will tell you all about it.

Don’t even get me started on the eternal battle of share-everything versus share-nothing. I have argued for share-nothing for many years in classic HPC, to no avail. How else can you sell massive network or cluster-based storage? Query the market and you will find no end to the perversion of the original intent of Hadoop: changes to the file system, changes to the replication level, placement of data into traditional databases and placement of MapReduce over the same. Hogwash.

Buy nodes. Lots of them. Cheap ones with zero features for redundancy (no, you don’t need two power supplies or 18 NICs, and you shouldn’t be paying more than $5k MAX). If they break beyond repair, put them in the dumpster (or maybe donate them to a good cause). Don’t overthink Hadoop. Start using it and get educated.

It will be disruptive and cause people to fear change (including in your own company). But at this stage, much like “cloud” a few years ago, if you don’t have a strategy for Hadoop in place you are going to be sitting on your couch in October watching competitive company brand X win the World Series of your field.