Single Node Ceph Cluster Sandbox

Well, I had not heard of the Ceph project until recently. I was asked about making something work with the Ceph S3 API. After looking at a few different sites, I was able to create a single-node version of what was supposed to be a 6-node cluster. I wanted to be sure to share the code for this.

After the preparation steps I moved on to the actual installation of Ceph. I found a few packages were missing, both things that I wanted and a few actual requirements.

Now let’s get down to business.
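The gist with my exact commands is not reproduced here, so the following is a rough sketch of the ceph-deploy flow collapsed onto one host. The hostname node1 and the OSD directories are placeholders; adjust to taste.

```bash
# Minimal single-node ceph-deploy sketch; "node1" and the OSD paths are placeholders
ceph-deploy new node1                      # generate the initial ceph.conf and monitor keyring
ceph-deploy install node1                  # install the Ceph packages on the node
ceph-deploy mon create-initial             # bring up the monitor and gather keys

# Directory-backed OSDs are fine for a sandbox; a real cluster would use devices
sudo mkdir -p /var/local/osd0 /var/local/osd1
ceph-deploy osd prepare node1:/var/local/osd0 node1:/var/local/osd1
ceph-deploy osd activate node1:/var/local/osd0 node1:/var/local/osd1

ceph-deploy admin node1                    # push the admin keyring so the ceph CLI works
sudo ceph health                           # should eventually report HEALTH_OK
```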

Probably one of the most important parts was editing ceph.conf as described here, along with the fact that I missed the “1” in the above “ceph-deploy osd activate” command; you have to look closely. To use the S3 endpoint you also have to set up the Ceph Object Gateway (the RADOS gateway, radosgw).
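For reference, these are the kinds of single-node settings involved; they belong in the generated ceph.conf (right after “ceph-deploy new”, before the OSDs and pools exist), and the gateway is one more ceph-deploy call. The values here are illustrative, not gospel.

```bash
# Single-node tweaks for ceph.conf ([global] section), done in the ceph-deploy working directory
cat >> ceph.conf <<'EOF'
osd pool default size = 1
osd crush chooseleaf type = 0
EOF
ceph-deploy --overwrite-conf config push node1   # push the edited conf to the node

# Stand up the RADOS Gateway (radosgw) so the S3 endpoint exists (civetweb on port 7480 by default)
ceph-deploy rgw create node1
```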

Once I had done that, I simply needed a new user in order to make a connection with the S3 command line.
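Creating that user is a one-liner with radosgw-admin; the uid and display name below are just examples. The JSON it prints contains the access and secret keys you hand to any S3 client.

```bash
# Create an S3-capable gateway user; note the access_key/secret_key pair in the output
sudo radosgw-admin user create --uid="testuser" --display-name="Test User"
```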

I then tested the Python program suggested by the docs.
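That script relies on the boto library, so the only extra steps on my side were installing it and running my saved copy (the filename is just whatever I called it locally).

```bash
sudo apt-get install -y python-boto    # the test script from the docs uses boto
python s3test.py                       # s3test.py = the docs' script saved locally, pointed at the gateway
```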

Then on to S3. Of course, you need to install the S3 command line tools first.
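I used s3cmd. Roughly, the install and configuration looked like the following; the important part is pointing the endpoint at the gateway host instead of Amazon (the hostname below is a placeholder).

```bash
sudo apt-get install -y s3cmd
# Interactive config: paste in the access/secret keys from radosgw-admin above.
# Then point the endpoint at the gateway rather than amazonaws.com, e.g. in ~/.s3cfg:
#   host_base   = node1.example.com:7480
#   host_bucket = node1.example.com:7480
#   use_https   = False
s3cmd --configure
```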

As you can see, most calls to the gateway are supported:
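My original output capture is gone, but these are the sorts of calls I exercised (bucket and file names are made up):

```bash
s3cmd mb s3://test-bucket                               # create a bucket
s3cmd ls                                                # list buckets
s3cmd put ./hello.txt s3://test-bucket/                 # upload an object
s3cmd get s3://test-bucket/hello.txt ./hello-copy.txt   # download it again
s3cmd del s3://test-bucket/hello.txt                    # delete the object
```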

 

Hadoop Client Setup for Azure Based Edge Node accessing ADLS

In my work I have the need to try unusual things. Today I needed to set up a Hadoop client, aka an “edge node”, that was capable of accessing ADLS in Azure. While I’m sure this is documented somewhere, here is what I did.

Repo Config

In order to obtain the packages that would allow me to access ADLS, I went to the Hortonworks docs. Based upon the fact that HDInsight 3.6 maps to HDP 2.6, I was fairly sure the Hortonworks repo was the way to go. Today I used Ubuntu 16.04, so I grabbed the appropriate link for the HDP repo file. I was then able to follow the appropriate instructions in the HDP docs for Ubuntu.
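Boiled down, it amounts to dropping the HDP list file into apt and installing the client package. The repo URL below is a placeholder; take the real Ubuntu 16.04 link from the HDP docs (and add the Hortonworks GPG key if the docs call for it).

```bash
# Add the HDP 2.6 repo for Ubuntu 16.04 (URL is a placeholder -- use the one from the HDP docs)
sudo wget -O /etc/apt/sources.list.d/hdp.list "<HDP-2.6-ubuntu16-repo-URL>/hdp.list"
sudo apt-get update

# The umbrella client package; in HDP 2.6 this is where the adl:// filesystem support lives
sudo apt-get install -y hadoop-client
```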

Next it’s important to configure the correct core-site.xml. The version that comes from just installing hadoop-client is essentially blank, and the version required to access ADLS is special. Here is a copy of my core-site with the critical pieces redacted. You also need to generate a service principal and add it to your Azure account in order to complete these values.
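The redacted file itself did not survive the move here, so below is a reconstruction of its shape using the standard hadoop-azure-datalake property names. The account name, application (client) id, key, and tenant-specific token endpoint are all placeholders you fill in from your service principal.

```bash
# Write a minimal core-site.xml for ADLS access (every value below is a placeholder)
sudo tee /etc/hadoop/conf/core-site.xml > /dev/null <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>adl://YOUR_ACCOUNT.azuredatalakestore.net</value>
  </property>
  <property>
    <name>fs.adl.oauth2.access.token.provider.type</name>
    <value>ClientCredential</value>
  </property>
  <property>
    <name>fs.adl.oauth2.client.id</name>
    <value>REDACTED-APPLICATION-ID</value>
  </property>
  <property>
    <name>fs.adl.oauth2.credential</name>
    <value>REDACTED-SERVICE-PRINCIPAL-KEY</value>
  </property>
  <property>
    <name>fs.adl.oauth2.refresh.url</name>
    <value>https://login.microsoftonline.com/REDACTED-TENANT-ID/oauth2/token</value>
  </property>
</configuration>
EOF
```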

Once that is in place you should be able to use an “adl:///” URI against your Azure ADLS at the command line. The nice part about ADLS is that it is more Hadoop-like than your average Azure blob store.
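A quick smoke test from the shell (account name again a placeholder):

```bash
# With fs.defaultFS pointing at the lake, the short form works
hdfs dfs -ls adl:///
# Or fully qualified
hdfs dfs -ls adl://YOUR_ACCOUNT.azuredatalakestore.net/
hdfs dfs -put ./localfile.txt adl:///tmp/
```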

You can now use this by hand and in scripts as needed. You can also access ADLS programmatically via REST API.
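On the REST side, ADLS exposes a WebHDFS-compatible endpoint on the account; a hedged example, assuming you have already obtained an OAuth2 bearer token for the same service principal:

```bash
# List the root of the lake via the WebHDFS-compatible REST API
# ($TOKEN is an OAuth2 access token for the service principal; the account name is a placeholder)
curl -H "Authorization: Bearer $TOKEN" \
  "https://YOUR_ACCOUNT.azuredatalakestore.net/webhdfs/v1/?op=LISTSTATUS"
```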

Automated Ambari Install

I routinely have a need for a single-node install of HDP for integration work. Of course I could script all of that by hand and not use a cluster manager, but then what would I have to blog about? I typically prep a single instance in EC2 with a small script and then punch through the GUI setup by hand. In the interest of time, and as part of a larger set of partial automation projects I am working on, I decided it would be nice to get a single-node install working as hands-free as possible. Here we go:

The Setup
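The original setup gist is not reproduced here, so this is a sketch of the equivalent steps. It assumes a CentOS/RHEL style EC2 instance (swap in apt on Ubuntu), and the Ambari repo URL is a placeholder for whichever release you want.

```bash
# Install and start Ambari server + agent (repo URL is a placeholder -- take it from the Ambari docs)
sudo wget -O /etc/yum.repos.d/ambari.repo "<ambari-repo-URL>/ambari.repo"
sudo yum install -y ambari-server ambari-agent

# Silent setup accepts the defaults (embedded Postgres, default JDK handling)
sudo ambari-server setup -s
sudo ambari-server start
sudo ambari-agent start
```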

This should get you to a clean install of Ambari with no actual Hadoop packages installed.

Blueprint Registration

The balance of the work installs the actual Hadoop packages, driven entirely through Ambari’s REST API:
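Registering the blueprint is a single POST against the local Ambari server; the blueprint name and file name here are just my choices.

```bash
# Register the blueprint under the name "single-node-hdp"
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
  -d @blueprint.json \
  http://localhost:8080/api/v1/blueprints/single-node-hdp
# Topology validation can be skipped by appending ?validate_topology=false to the URL above
```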

Now, in order to run the call above, a blueprint file is needed. I created this simply by installing via the Ambari GUI to get everything the way I wanted it, and then dumping the blueprint to JSON so I could replicate it.
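If you want to do the same, the dump is a single GET against the cluster you built through the GUI (the cluster name here is a placeholder):

```bash
# Export an existing, GUI-built cluster's layout and configs as a reusable blueprint
curl -u admin:admin -H "X-Requested-By: ambari" \
  "http://localhost:8080/api/v1/clusters/MyGuiCluster?format=blueprint" > blueprint.json
```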

I did edit this file slightly by hand, such as reducing the HDFS replication factor. The file that it produced was FAR more detailed than anything I found here or here.

You won’t see any real output when you register the blueprint. There is a process of topology validation that can be disabled, though. You can check, however, to make sure it was accepted. I suggest you do this if you are still testing the blueprint, as it will fail silently if there are errors.
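Checking is just reading the blueprint back; if the registration was rejected it simply will not be there.

```bash
# List all registered blueprints, then read back the one just posted
curl -u admin:admin http://localhost:8080/api/v1/blueprints
curl -u admin:admin http://localhost:8080/api/v1/blueprints/single-node-hdp
```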

Cluster Install

For the next step you need a file that maps hosts to components, which is fairly simple for a single node. You need a hostmapping file.
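For a single node it is about as small as it gets. A sketch, assuming the exported blueprint has one host group named host_group_1 (the name must match whatever your blueprint actually uses); the FQDN is a placeholder that gets swapped out below.

```bash
cat > hostmapping.json <<'EOF'
{
  "blueprint": "single-node-hdp",
  "default_password": "CHANGE_ME",
  "host_groups": [
    {
      "name": "host_group_1",
      "hosts": [
        { "fqdn": "FQDN_PLACEHOLDER" }
      ]
    }
  ]
}
EOF
```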

I typically add a single sed to swap out the fqdn for the internal AWS hostname in my script.
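Something along these lines, using the placeholder from the file above:

```bash
# Replace the placeholder with this instance's internal FQDN (e.g. ip-10-0-0-12.ec2.internal)
sed -i "s/FQDN_PLACEHOLDER/$(hostname -f)/" hostmapping.json
```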

Then submit using the following to kick off the cluster install:
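A hedged version of that call; the cluster name is again your choice.

```bash
# Create the cluster from the blueprint plus the host mapping
curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
  -d @hostmapping.json \
  http://localhost:8080/api/v1/clusters/single-node-cluster
```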

When you are successful there will be some output like this:
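I no longer have the original capture, but an accepted submission comes back as a short JSON body pointing at a request resource, roughly of this shape:

```
{
  "href" : "http://localhost:8080/api/v1/clusters/single-node-cluster/requests/1",
  "Requests" : {
    "id" : 1,
    "status" : "Accepted"
  }
}
```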

Monitoring Progress

At this point you can for sure monitor the progress via RESTful calls like this:
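Using the request id returned above:

```bash
# Overall progress of request 1 (the blueprint-driven install); look at the request
# status and progress fields in the response
curl -u admin:admin \
  http://localhost:8080/api/v1/clusters/single-node-cluster/requests/1

# Per-task detail if something looks stuck
curl -u admin:admin \
  http://localhost:8080/api/v1/clusters/single-node-cluster/requests/1/tasks
```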

But it’s far easier at this point to just open the Ambari web interface on port 8080. You will see your services installing and all the pretty lights turn green, letting you know things are installed and ready to use.

Ambari Services Installing

Troubleshooting etc.

PENDING HOST ASSIGNMENT

While I played with this I did see an issue where it seemed like things launched, but when I checked the Ambari GUI I saw a message in the operations dialog that listed the problem as “PENDING HOST ASSIGNMENT”, which apparently means the Ambari agent was not online and/or “registered”. I only saw this once, and upon trying a clean new install test I didn’t see it again.
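If you do hit it, the two things worth checking are that the agent process is actually running and that the server sees the host as registered:

```bash
sudo ambari-agent status                                 # is the agent process up?
curl -u admin:admin http://localhost:8080/api/v1/hosts   # does Ambari list this host as registered?
```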

Reset Default Password

Don’t be the person who runs Ambari on Amazon with the default password. Just don’t do it.
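You can change it in the GUI, or fold it into the script via the users API; a sketch (the payload details may vary a little by Ambari version, and the new password is obviously yours to pick):

```bash
# Change the built-in admin password immediately after setup
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{ "Users": { "user_name": "admin", "old_password": "admin", "password": "PICK_A_REAL_PASSWORD" } }' \
  http://localhost:8080/api/v1/users/admin
```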

All these steps together should allow you to put together a nice script for launching a single-node test cluster of your own. There are instructions in the Ambari docs for adding nodes to your cluster or starting with a more complex setup in the hostmapping file. Depending upon your needs you could even capture an AMI once you are set up, allowing you to launch a single-node version even faster. The nice part about this script is that you always get the newest HDP release.