MapR Yarn Log Aggregation

MapR sometimes takes some criticism for being “different” which is sometimes translated as “hard to use” because it doesn’t behave like Hadoop distribution brand X that I know and love. I get that. I also know that all software (like people) has warts. Sometimes those differences are intentional. For the most part this is the case with MapR. Lots of folks misunderstand the method to the madness initially. I can say that most users once armed with some understanding tend to love MapR.

That aside I wanted to point out one such difference. It has to to with Yarn Log Aggregation. I hear repeatedly folks saying they enabled log aggregation but for some reason are still seeing a message from the History Server telling them there is a problem.

What typically is happening in this case is that although folks remembered to turn on log aggregation by setting yarn.log-aggregation-enable to true in yarn-site.xml they for about configure.sh. Running configure.sh is important when installing or changing the configuration of your cluster. It does a handful of things but most importantly it reads xml configuration file changes. There is a log kept for configure.sh under MAPR_HOME/logs/configure.log showing what has been run and when. The other problem seen is that users forget to pass -“HS” to configure.sh. This lets MapR know where the history server is running and on what port. Without which you will likely get error messages similar to the one above. Once completed you should be able to use the History server web interface to drill into logs OR you can use the yarn logs command line.

The configure script is a small MapR nuance but is critical for proper cluster configuration.  Any time you add or remove software or reconfigure things make sure you take a look at configure.sh. You will see it across all the docs for the installation of Hadoop ecosystem products in MapR.