Hadoop Audit Logging

A recent list of customer questions included an interesting one about audit logging in Hadoop, specifically the logging of Hive actions such as table creation and queries. Audit logging was added to Hadoop some time ago, and several JIRAs addressed it, including HIVE-3505 and HIVE-1948, but it wasn't really documented until recently, via HIVE-5988.

Detailed audit logging is enabled through the log4j settings in /etc/hadoop/conf/log4j.properties. Changing the audit logger's level from WARN to INFO turns the logging on.
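As a rough sketch, the WARN-to-INFO change can be scripted. The Python below assumes the audit logger is configured by a property key containing `audit` (for example `hdfs.audit.logger` or `log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit`). Key names and layout differ between distributions and versions, so check your own file first. The function `enable_audit_logging` is hypothetical and only rewrites text; it never touches the file itself.

```python
import re

# Assumption: audit logger lines look like "<key containing 'audit'>=WARN,..."
# e.g. "hdfs.audit.logger=WARN,console". Commented-out lines ("# ...") are skipped
# because the key pattern cannot start with '#'.
AUDIT_LINE = re.compile(r"^(\s*[\w.]*audit[\w.]*\s*=\s*)WARN\b", re.IGNORECASE)


def enable_audit_logging(text: str) -> str:
    """Return log4j.properties text with audit logger levels raised from WARN to INFO."""
    out = []
    for line in text.splitlines(keepends=True):
        # Only the level token right after '=' is replaced; appenders are kept.
        out.append(AUDIT_LINE.sub(r"\1INFO", line))
    return "".join(out)
```

For example, `enable_audit_logging(open("/etc/hadoop/conf/log4j.properties").read())` returns the updated text, which can be reviewed before writing it back. The NameNode will typically need a restart to pick up the change.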

In Hortonworks Data Platform (HDP), this setting is around line 106 of the provided log4j file. Once enabled, the log records the actions that happen in HDFS. Because Hive also works through HDFS, its actions are logged in /var/log/hadoop/hdfs/hdfs-audit.log.

Actions such as create table and show databases are recorded in the audit log. Parsing this log can yield detailed reports on how data was accessed in Hadoop by any technology that relies on HDFS.
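A minimal parsing sketch follows. It assumes each audit entry is a log4j prefix followed by tab-separated `key=value` fields starting at `allowed=` (for example `allowed=true`, `ugi=hive (auth:SIMPLE)`, `cmd=mkdirs`). This layout is common but not guaranteed across Hadoop versions. The helper names `parse_audit_line`, `summarize`, and `print_report` are hypothetical, and the report simply counts entries per user and HDFS command.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def parse_audit_line(line: str) -> Optional[Dict[str, str]]:
    """Parse one hdfs-audit.log line into a dict of its key=value fields.

    Assumes a log4j prefix followed by tab-separated key=value pairs that begin
    at 'allowed='. Returns None for lines that don't look like audit entries.
    """
    start = line.find("allowed=")
    if start == -1:
        return None
    fields = {}
    for part in line[start:].rstrip("\n").split("\t"):
        # Split on the first '=' only, so values such as paths may contain '='.
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(lines: Iterable[str]) -> Counter:
    """Count audit entries per (user, command) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_audit_line(line)
        if entry is None:
            continue
        # ugi often looks like "hive (auth:SIMPLE)"; keep only the user name.
        user = entry.get("ugi", "?").split(" ")[0]
        counts[(user, entry.get("cmd", "?"))] += 1
    return counts


def print_report(path: str = "/var/log/hadoop/hdfs/hdfs-audit.log") -> None:
    """Print a tab-separated user/command/count report, most frequent first."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for (user, cmd), n in summarize(f).most_common():
            print(f"{user}\t{cmd}\t{n}")
```

Calling `print_report()` on a node that uses the HDP default path prints one line per (user, command) pair. Filtering on the `ugi` field, or on the `src` prefix of the Hive warehouse directory, would narrow the report to Hive activity.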