I worked for years in HPC, specifically in workload management on tools like PBS Professional and Platform LSF. These schedulers have more options than anyone can humanly comprehend, so when we wanted to understand what one was doing, we usually tested on live systems, tuning options and making changes until we got the desired behavior. It was more art than science at times. No matter the platform, the one thing I always wanted was a scheduling simulator that would let me optimize scheduler behavior scientifically.
If you are new to Hadoop, you may not have heard of YARN. It is part of Hadoop 2.0 and represents the next generation of Hadoop processing, including new daemons, scheduling based on resources (no more slots), and the ability to run programming paradigms other than MapReduce (e.g., MPI, Storm). I think most people have recognized that while MapReduce is great for certain things, there is probably also room at the party for a few other guests.
There is one JIRA in particular that I have been watching, and it has me all worked up. Yes, it's a scheduling simulator. The proposed feature would offer such novel delights as detailed costs per scheduler operation, which would allow tuning of the entire cluster or of individual queues. From my work history I know how hotly debated scheduling policy becomes at companies. It typically turns into a struggle between competing groups jockeying for resources for their projects. This one JIRA would turn that conversation from a political battle into a discussion of simulation results, with real data behind the decision.
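To make that concrete, here is a toy sketch in Python of the kind of question a simulator can settle. It is not the simulator proposed in the JIRA and it uses no Hadoop or YARN APIs. The `Job` fields, the `research` and `prod` queue names, the 4-core cluster, and the two ordering policies are all made up for illustration, and the model is deliberately crude: every job is submitted at time zero and jobs start in strict policy order with no backfill.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    queue: str      # hypothetical queue name, e.g. one per competing group
    submitted: int  # submission order (all jobs arrive at time 0 in this toy)
    cores: int
    runtime: int    # minutes


def simulate(jobs, total_cores, order_key):
    """Start jobs in strict policy order (no backfill); return wait per job."""
    if any(job.cores > total_cores for job in jobs):
        raise ValueError("a job requests more cores than the cluster has")
    running = []  # min-heap of (finish_time, cores_held)
    free, now, waits = total_cores, 0, {}
    for job in sorted(jobs, key=order_key):
        # Advance the clock to the next job completion until this job fits.
        while free < job.cores:
            now, released = heapq.heappop(running)
            free += released
        waits[job.name] = now
        free -= job.cores
        heapq.heappush(running, (now + job.runtime, job.cores))
    return waits


def mean_wait_by_queue(jobs, waits):
    """Average wait time in minutes for each queue."""
    per_queue = defaultdict(list)
    for job in jobs:
        per_queue[job.queue].append(waits[job.name])
    return {queue: sum(w) / len(w) for queue, w in per_queue.items()}


def fifo(job):
    return job.submitted


def shortest_first(job):
    return (job.runtime, job.submitted)


# Made-up workload: two large research jobs, then two small prod jobs.
JOBS = [
    Job("a", "research", 0, cores=4, runtime=60),
    Job("b", "research", 1, cores=4, runtime=60),
    Job("c", "prod", 2, cores=2, runtime=10),
    Job("d", "prod", 3, cores=2, runtime=10),
]

for label, policy in [("fifo", fifo), ("shortest_first", shortest_first)]:
    waits = simulate(JOBS, total_cores=4, order_key=policy)
    print(label, mean_wait_by_queue(JOBS, waits))
```

On this made-up workload, FIFO leaves the prod queue waiting 120 minutes on average while research waits 30; shortest-first drops prod to 0 and raises research to 40. That trade-off is exactly what the competing groups would argue about, except now there are numbers on the table.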
The simulator will probably be one of the first things I play with in Hadoop 2.0. Below is a nice video showing the current state of the project. It has me excited, to say the least. I have many other ideas for bringing HPC scheduling knowledge to Hadoop where applicable, but that will have to wait for another post.