Hadoop Client Setup for an Azure-Based Edge Node Accessing ADLS

In my work I often need to try unusual things. Today I needed to set up a Hadoop client, aka an “edge node”, that was capable of accessing ADLS in Azure. While I’m sure this is documented somewhere, here is what I did.

Repo Config

To obtain the packages that would allow me to access ADLS, I went to the Hortonworks docs. Since HDInsight 3.6 maps to HDP 2.6, I was fairly sure the Hortonworks repo was the way to go. I used Ubuntu 16.04, so I grabbed the HDP repo file for that release and then followed the Ubuntu installation instructions in the HDP docs.

Next, it’s important to configure the correct core-site.xml. The version that comes from just installing hadoop-client is essentially blank, while the version required to access ADLS needs specific ADLS settings. Here is a copy of my core-site with critical pieces redacted. You also need to generate a service principal and add it to your Azure account in order to complete these values.
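As a rough sketch of what such a file contains (not the actual file from this post), the Python snippet below renders a minimal core-site.xml for service principal (client credential) access. The property names follow the Apache Hadoop hadoop-azure-datalake documentation; older builds used a `dfs.adls.oauth2.` prefix instead, so check the docs for your exact version. Setting `fs.defaultFS` to the store is an assumption about how a bare “adl:///” path gets resolved, and every value is a hypothetical placeholder.

```python
from xml.sax.saxutils import escape

# All values below are hypothetical placeholders, not values from this post.
TENANT_ID = "YOUR_TENANT_ID"
CLIENT_ID = "YOUR_SERVICE_PRINCIPAL_APP_ID"
CLIENT_SECRET = "YOUR_SERVICE_PRINCIPAL_KEY"
ADLS_ACCOUNT = "YOUR_ADLS_ACCOUNT"


def build_core_site(tenant_id, client_id, client_secret, account):
    """Return core-site.xml text with ADLS OAuth2 client-credential settings."""
    properties = {
        # Assumption: making ADLS the default file system lets adl:/// resolve to it.
        "fs.defaultFS": f"adl://{account}.azuredatalakestore.net",
        # File system classes; these may already be set by your distribution's defaults.
        "fs.adl.impl": "org.apache.hadoop.fs.adl.AdlFileSystem",
        "fs.AbstractFileSystem.adl.impl": "org.apache.hadoop.fs.adl.Adl",
        # Service principal credentials (older builds: dfs.adls.oauth2.* names).
        "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
        "fs.adl.oauth2.refresh.url":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "fs.adl.oauth2.client.id": client_id,
        "fs.adl.oauth2.credential": client_secret,
    }
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file so it can be reviewed before copying it into the Hadoop conf dir.
    print(build_core_site(TENANT_ID, CLIENT_ID, CLIENT_SECRET, ADLS_ACCOUNT), end="")
```

Printing rather than writing in place keeps the existing core-site.xml untouched until the output has been reviewed. Because the file holds the service principal secret, it should be readable only by the users who need it.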

Once that is in place, you should be able to use the “adl:///” URI against your Azure ADLS at the command line. The nice part about ADLS is that it is more Hadoop-like than your average Azure Blob store.
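For scripted use, a minimal Python wrapper around the standard `hadoop fs -ls` command might look like the following. The function names are made up for illustration, and the sketch assumes the `hadoop` client is on the PATH and that core-site.xml already points at your store.

```python
import shutil
import subprocess


def adl_ls_command(path="/"):
    """Return the argv for listing a path on the default ADLS store."""
    return ["hadoop", "fs", "-ls", "adl:///" + path.lstrip("/")]


def list_adl(path="/"):
    """Run the listing and return the raw output lines from the Hadoop CLI."""
    if shutil.which("hadoop") is None:
        raise RuntimeError("hadoop client not found on PATH")
    result = subprocess.run(
        adl_ls_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in list_adl("/"):
        print(line)
```

With `check=True`, a failed listing (bad credentials, for example) raises `subprocess.CalledProcessError` instead of silently returning nothing.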

You can now use this by hand and in scripts as needed. You can also access ADLS programmatically via its REST API.
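The REST API is WebHDFS-compatible, so a directory listing is a GET against the store’s `/webhdfs/v1/` endpoint with an Azure AD bearer token. The sketch below only builds the request. The function name is hypothetical, and obtaining the token with your service principal is assumed to happen elsewhere.

```python
import urllib.parse
import urllib.request


def build_liststatus_request(account, path, access_token):
    """Build (but do not send) a WebHDFS-style LISTSTATUS request for ADLS."""
    quoted_path = urllib.parse.quote(path.lstrip("/"))
    url = (
        f"https://{account}.azuredatalakestore.net"
        f"/webhdfs/v1/{quoted_path}?op=LISTSTATUS"
    )
    # The token is an Azure AD OAuth2 access token for the service principal.
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )


if __name__ == "__main__":
    # Placeholder values; replace them with your account name and a real token.
    request = build_liststatus_request("YOUR_ADLS_ACCOUNT", "/", "YOUR_ACCESS_TOKEN")
    print(request.get_method(), request.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen` and parse the JSON body. As with WebHDFS, the entries should appear under `FileStatuses` and then `FileStatus`.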