The Hortonworks Sandbox has been out for some time now, and a few versions have been released over that time. I use the sandbox for demos, POCs and development work. In the interest of always using the latest and greatest, I have installed several new versions, and a pattern has emerged in the changes I make each time a new version comes out. I thought sharing this list might be useful to others:
1. Snapshot it – I run a Mac Pro, so I use VMware Fusion. The first thing I do right after installing is take a snapshot. I also take snapshots at key points over the working life of the instance, including after installing packages and at the start and end of projects. That way I can make whatever changes I like while experimenting with the sandbox and quickly back them out.
2. Modify the Network – The sandbox used to come preconfigured with two network adapters; this was later reduced to one. How you connect the sandbox is of course up to you, but I prefer to put the primary adapter on a host-only network and add a second adapter for external connections. This lets me not only update the sandbox with new tutorials but also connect to online repos and pull packages down from the command line as needed. I also give the sandbox a static IP and add an entry to my local hosts file so that name resolution stays consistent over time.
3. Install additional packages – I usually have a list of things I like to install. Your list may vary, but some of my favorites include mlocate, R along with the R Hadoop libraries, Mahout, Flume, Elasticsearch and Kibana. If you have other Linux nodes you want to manage as a group, I also recommend pdsh.
4. Swap SSH Keys – I don't like to keep typing passwords, so I always create SSH keys for the root user and exchange them with my host OS public key. Be sure to also change the default root password from “hadoop”.
5. Enable Ambari – There is a small script in root's home directory called “start_ambari.sh” that enables Ambari. Run this script and reboot. Ambari will then be available at http://hostname:8080 with username admin and password admin, while the standard Hue interface remains available at http://hostname.
6. Check for new Tutorials – Click the “About Hortonworks Hue” icon in the upper left-hand corner, then click “Update” on the resulting page. Check this every so often to make sure you have the latest tutorials.
7. Enable NFS to HDFS – This is a little more involved, but it is possible. I will have a blog entry on the main Hortonworks site detailing the steps, and I will link to it here. This lets you mount HDFS as an NFS directory on your local workstation. It isn't really meant for transferring data at scale, but it is another very handy option up to a point.
8. Increase the amount of available memory – This is a no-brainer. Increase the memory allocated to the sandbox to make your life easier. I have 16GB on my laptop, so I have plenty to spare. If you don't, see whether you can host the virtual machine somewhere with more memory available. Administrators often don't mind giving you space to run a small VM like this. Try running the built-in Hadoop benchmarks as you increase the hardware specs and see what happens.
9. Change the Ambari admin password – The default Ambari login is username admin with password admin. Make sure you change this immediately.
10. Add users including HDFS users – It's Linux, so you can simply use “adduser” to add OS-level users. You should also add HDFS users and set quotas for them. The hadoop-create-user.sh script makes adding your Hadoop users simple.
11. Connect Clients – I run a collection of clients, including Talend Open Studio for Big Data, Tableau and Microsoft Excel powered by the Hortonworks ODBC driver. Each of these is fairly involved and probably deserves a blog entry of its own.
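For step 1, taking snapshots can be scripted with `vmrun`, the command-line tool that ships with VMware Fusion. The following is a minimal Python sketch, assuming `vmrun` is on your PATH (on many installs it sits inside the Fusion application bundle, so you may need its full path). The timestamped naming scheme and the `.vmx` path are illustrative choices, not anything the sandbox requires. It only prints the command unless `dry_run=False`.

```python
import shlex
import subprocess
from datetime import datetime

# Assumption: VMware Fusion's vmrun binary is reachable on PATH.
VMRUN = "vmrun"

def snapshot_command(vmx_path, label, when=None):
    """Build a vmrun command that snapshots a Fusion VM under a timestamped name."""
    when = when or datetime.now()
    name = f"{when:%Y%m%d-%H%M}-{label}"  # e.g. 20140102-0304-fresh-install
    return [VMRUN, "-T", "fusion", "snapshot", vmx_path, name]

def take_snapshot(vmx_path, label, dry_run=True):
    """Print the command; actually run it only when dry_run is False."""
    cmd = snapshot_command(vmx_path, label)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if vmrun fails
    return cmd

# Hypothetical path to the sandbox's .vmx file.
take_snapshot("/path/to/sandbox.vmx", "fresh-install")
```

Prefixing names with a timestamp keeps snapshots sorted chronologically in Fusion's snapshot list, which helps when you bookend several projects.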
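For step 2, the hosts file entry is easy to leave stale when the sandbox IP changes between versions. This sketch updates hosts-file text so that a hostname maps to exactly one address, leaving other entries and aliases alone. It works on a string, so you would read `/etc/hosts`, pass its contents through, and write the result back with root privileges. The address and the `sandbox` hostname are placeholders.

```python
import ipaddress

def upsert_hosts_entry(hosts_text, ip, hostname):
    """Return hosts-file text in which `hostname` maps only to `ip`."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    lines = []
    for line in hosts_text.splitlines():
        body, sep, comment = line.partition("#")
        fields = body.split()
        if len(fields) >= 2 and hostname in fields[1:]:
            # Strip the hostname from the existing line but keep any other aliases.
            aliases = [name for name in fields[1:] if name != hostname]
            if not aliases:
                continue  # the line only mapped our hostname; it is re-added below
            line = "\t".join([fields[0], *aliases]) + (f" #{comment}" if sep else "")
        lines.append(line)
    lines.append(f"{ip}\t{hostname}")
    return "\n".join(lines) + "\n"

print(upsert_hosts_entry("127.0.0.1\tlocalhost\n", "192.168.56.101", "sandbox"), end="")
```

Because the function is idempotent, running it again after each new sandbox install simply moves the entry to the new address.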
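For step 4, the key exchange comes down to two standard OpenSSH commands. This sketch only builds them. It assumes the host has `ssh-keygen` and `ssh-copy-id` and that the sandbox answers to the hypothetical hostname `sandbox`. It covers the host-to-sandbox direction; the reverse direction is the same pair of commands run from inside the sandbox. If `ssh-copy-id` is missing on your host, appending the `.pub` file to the remote `~/.ssh/authorized_keys` achieves the same thing.

```python
from pathlib import Path

def ssh_key_commands(host, user="root", key_path="~/.ssh/id_rsa"):
    """Build the commands to create a key pair and install it on user@host."""
    key = str(Path(key_path).expanduser())
    return [
        # RSA key pair with an empty passphrase: convenient for a local sandbox,
        # not something to reuse on shared systems. ssh-keygen asks before
        # overwriting an existing key file.
        ["ssh-keygen", "-t", "rsa", "-b", "4096", "-N", "", "-f", key],
        # Copies the public key into the remote user's ~/.ssh/authorized_keys.
        ["ssh-copy-id", "-i", key + ".pub", f"{user}@{host}"],
    ]

for cmd in ssh_key_commands("sandbox"):
    print(cmd)
```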
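For step 9, the password change can also be scripted against Ambari's REST API. This sketch builds, without sending, a PUT request to `/api/v1/users/<name>` that carries the old and new passwords. Treat the endpoint and JSON body as an assumption based on commonly cited Ambari examples, and check them against the API documentation for your Ambari version. The `sandbox` hostname and the new password are placeholders.

```python
import base64
import json
import urllib.request

def ambari_password_request(host, old_password, new_password, user="admin", port=8080):
    """Build (but do not send) a PUT that changes an Ambari user's password."""
    url = f"http://{host}:{port}/api/v1/users/{user}"
    # Assumed payload shape; verify against your Ambari version's API docs.
    payload = {"Users": {"user_name": user,
                         "old_password": old_password,
                         "password": new_password}}
    # HTTP Basic auth with the current credentials.
    token = base64.b64encode(f"{user}:{old_password}".encode()).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Newer Ambari releases expect this header on state-changing requests.
            "X-Requested-By": "ambari",
            "Content-Type": "application/json",
        },
    )

req = ambari_password_request("sandbox", "admin", "a-better-password")
# To actually send it: urllib.request.urlopen(req)
```

Over plain HTTP the credentials travel in clear text, which is tolerable on a host-only network but not beyond it.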
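For step 10, each new user needs an OS account, an HDFS home directory that the user owns, and a space quota on that directory. This sketch builds those commands and prints them, running them only when `dry_run=False`. It assumes you run it as root on the sandbox, that the `hdfs` superuser account exists, and that a Hadoop 2 style `hdfs` CLI is available (older releases use `hadoop fs` and `hadoop dfsadmin` and may not accept `-mkdir -p`). The user name and the 10g default quota are illustrative. It is a generic alternative to the hadoop-create-user.sh script, not a description of what that script does.

```python
import shlex
import subprocess

def hdfs_user_commands(user, space_quota="10g"):
    """Commands to add an OS user plus an HDFS home directory with a space quota."""
    home = f"/user/{user}"
    as_hdfs = ["sudo", "-u", "hdfs"]  # run HDFS admin commands as the hdfs superuser
    return [
        # useradd is what adduser links to on CentOS-based images.
        ["useradd", user],
        as_hdfs + ["hdfs", "dfs", "-mkdir", "-p", home],
        # Assumes useradd created a matching group (the RHEL/CentOS default).
        as_hdfs + ["hdfs", "dfs", "-chown", f"{user}:{user}", home],
        # Space quotas count raw bytes, i.e. file size times replication factor.
        as_hdfs + ["hdfs", "dfsadmin", "-setSpaceQuota", space_quota, home],
    ]

def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

run_all(hdfs_user_commands("analyst"))
```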
There are probably more things I am forgetting here but this is a good list of the basics that I touch when installing a new version of the Hortonworks Sandbox.