Hadoop Encryption at Rest

At rest, as in not in motion; not REST as in web services. For a long time, the only real answer Hadoop had for encryption at rest was to leverage a third-party tool or to use LUKS for whole-disk encryption. What I see customers asking for these days is really encryption in motion (aka wire encryption), encryption at rest (at the HDFS layer), plus policies that will eliminate data from specific directories in HDFS based upon some business rule. The good news is that as of Hadoop 2.6 we now have HDFS-6134 in play, so there is light at the end of the security tunnel.

The implementation of this new transparent encryption is supported via the normal Hadoop FileSystem Java API, the libhdfs C API, and the WebHDFS (REST) API. The great news is that once it is set up, normal HDFS permissions and ACLs control read and write access, so while there is some administration up front, from a user perspective there is not a terribly large new burden. It also means that third-party integration work should be left largely intact.
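To make that transparency concrete, here is a minimal sketch using the normal FileSystem Java API. It assumes an encryption zone already exists at /secure and that the client has access to the zone's key through the KMS; the path and key setup are my own illustration, not something taken from the HDFS-6134 spec.

```java
// A minimal sketch, assuming an encryption zone already exists at /secure
// and the client has access to the zone's key via the KMS.
// The path and key setup are illustrative, not part of the original post.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TransparentReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Writing into the encryption zone looks exactly like any other HDFS write;
        // encryption happens client-side, transparently.
        Path file = new Path("/secure/report.txt");
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("sensitive payload");
        }

        // Reading is equally transparent: decryption happens under the covers,
        // subject to normal HDFS permissions/ACLs plus KMS key access.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
    }
}
```

Notice there is no encryption-specific code in there at all; that is the whole point of the transparent design.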

There is now a Key Management Server (KMS) used to create and manage the keys for “encryption zones,” which are simply directories in HDFS whose contents are transparently encrypted.
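For illustration, here is a hedged sketch of how an admin might create a zone programmatically. It assumes a key named "zone-key" already exists in the KMS (for example, created with the `hadoop key create` CLI) and that the caller is an HDFS superuser; the key name and path are hypothetical.

```java
// A minimal sketch of creating an encryption zone, assuming a key named
// "zone-key" has already been created in the KMS (e.g. via `hadoop key create`)
// and the caller is an HDFS superuser. Names here are illustrative only.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class CreateZone {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path zone = new Path("/secure");

        // The directory must exist and be empty before it becomes an encryption zone.
        FileSystem fs = FileSystem.get(conf);
        fs.mkdirs(zone);

        // HdfsAdmin exposes the admin-only operations, including encryption zones.
        HdfsAdmin admin = new HdfsAdmin(URI.create(conf.get("fs.defaultFS")), conf);
        admin.createEncryptionZone(zone, "zone-key");
    }
}
```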

So how does all this happen? The design doc describes both the read and write processes. Illustrated here is the read process:

[Figure: HDFS encrypted read process]

So how does one actually use encryption zones? Cloudera has a great docs page on how to create encryption zones based upon the technology used (over HDFS).

Like most newly invented technology, the design doc also calls out some potential issues with the design. While there are some potential vulnerabilities noted in the spec, I would still say this is a massive step in the right direction. I also noted that this first version only uses AES-CTR.
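For context on what AES-CTR looks like, here is a standalone JCE sketch of AES in CTR mode. To be clear, this is not the HDFS code path, just an illustration of the cipher mode: CTR is a stream-cipher construction with no padding and cheap random access to any offset, which is part of why it suits HDFS-style seeks and reads.

```java
// Purely illustrative: AES in CTR mode via the standard JCE, not the HDFS code path.
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class AesCtrDemo {
    public static void main(String[] args) throws Exception {
        // Generate a random 128-bit AES key and a 16-byte IV (initial counter block).
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        byte[] plaintext = "hello encryption zone".getBytes(StandardCharsets.UTF_8);

        // CTR mode needs no padding; ciphertext is the same length as the plaintext.
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] ciphertext = enc.doFinal(plaintext);

        // Decryption with the same key and IV recovers the original bytes.
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] roundTrip = dec.doFinal(ciphertext);

        System.out.println(Arrays.equals(plaintext, roundTrip)); // true
    }
}
```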

It might seem like a small matter, but in the larger context of a security discussion, native encryption at rest is an important part of the Hadoop puzzle.