It all started with, “Hey Anthony, since we’re launching in a few weeks we should probably make the Bright Score infrastructure scalable. We might get a few users.”
Anthony, the CTO, wasn’t worried: “No problem. We’ll just use RightScale. They recently released support for SoftLayer (our hosting provider). RightScale also supports Chef, so we can just import our existing Chef scripts and be done.”
This is our ensuing tale of failures and triumphs.
The Bright Score at its simplest takes a job description and a resume and calculates a score. We needed to scale that up to millions of resumes and millions of job descriptions while maintaining low latency. Big data today usually refers to Hadoop, but our data is a variant of a bipartite graph problem, so Hadoop does not work for us (think of the Bright Score as an edge between a job node and a person node). We have more of a traditional batch analytics problem. Since our tasks are largely CPU-limited (resumes and job descriptions are rather sparse), the application scales nearly horizontally: we add capacity by adding nodes.
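To make the "edge between a job node and a person node" picture concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `toy_score` is a stand-in word-overlap measure, not the actual Bright Score, and the names `score_edges` and `partition` are hypothetical. The point is only that each edge can be scored independently, so the edge list can be split across nodes and the results merged.

```python
from itertools import product


def toy_score(job_text, resume_text):
    """Placeholder score (NOT the real Bright Score): word-set overlap, 0-100."""
    job_words = set(job_text.lower().split())
    resume_words = set(resume_text.lower().split())
    if not job_words or not resume_words:
        return 0
    overlap = len(job_words & resume_words)
    union = len(job_words | resume_words)
    return round(100 * overlap / union)


def score_edges(jobs, resumes, pairs):
    """Score each (job_id, resume_id) edge; no edge depends on any other."""
    return {(j, r): toy_score(jobs[j], resumes[r]) for j, r in pairs}


def partition(pairs, n_workers):
    """Split the edge list into n_workers independent chunks."""
    return [pairs[i::n_workers] for i in range(n_workers)]


# Hypothetical sample data.
jobs = {"j1": "python data scientist", "j2": "java backend engineer"}
resumes = {"r1": "data scientist python statistics", "r2": "frontend designer"}

all_pairs = list(product(jobs, resumes))

# Each chunk could be handed to a different node; here they run in a loop.
merged = {}
for chunk in partition(all_pairs, 2):
    merged.update(score_edges(jobs, resumes, chunk))
```

Because the chunks share no state, the merged result is the same as scoring the whole edge list on one machine, which is what makes adding nodes an effective way to add capacity.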
RightScale is built for such problems. You create a Server template that bootstraps nodes using either RightScripts (bash scripts) or Ruby-based Chef scripts. Coincidentally, the Bright Score project already used Chef as its configuration management tool. RightScale upsizes or downsizes your Server Array (a cluster of VM instances that perform common tasks) based on various factors, in our case the CPU load of the servers. Nodes can vote to grow or shrink the Server Array based on user-defined criteria (e.g. CPU usage).
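The voting idea can be sketched in a few lines of Python. This is an illustration of the general technique only, not RightScale's implementation: the function names, the CPU thresholds, and the majority rule are all assumptions chosen for the example.

```python
def node_vote(cpu_percent, grow_above=80.0, shrink_below=20.0):
    """One node's vote based on its own CPU load (thresholds are hypothetical)."""
    if cpu_percent > grow_above:
        return "grow"
    if cpu_percent < shrink_below:
        return "shrink"
    return "hold"


def array_decision(cpu_readings, decision_threshold=0.5):
    """Resize only if more than decision_threshold of the nodes agree."""
    votes = [node_vote(cpu) for cpu in cpu_readings]
    if not votes:
        return "hold"
    grow_fraction = votes.count("grow") / len(votes)
    shrink_fraction = votes.count("shrink") / len(votes)
    if grow_fraction > decision_threshold:
        return "grow"
    if shrink_fraction > decision_threshold:
        return "shrink"
    return "hold"


# Three of four nodes are busy, so the array as a whole votes to grow.
decision = array_decision([90.0, 95.0, 85.0, 10.0])
```

Requiring agreement from a share of the nodes, rather than reacting to a single hot machine, keeps one noisy node from resizing the whole array.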
Although this migration eventually worked, we learned a couple of lessons that we’re documenting here for those who follow in our footsteps:
- RightScale Chef works well with a large number of operating systems and hosting providers, as long as they are Ubuntu and Amazon Web Services. The nodes that we wanted to clone were Fedora 16. While building the Bright Score, we tried various flavors of CentOS and Fedora, but for the particular combination of packages we preferred, Fedora 16 was the only OS that worked seamlessly. We were already running Fedora 16 at SoftLayer, so surely it would be trivial to spin up a Fedora 16 node using RightScale and manage it using Chef … wrong. The RightScale/Chef combination only works if you have a specially built “RightImage”: a server image with special packages that enable it to communicate with RightScale. The RightImage contains the operating system as well as Ruby and a dated version of Chef Solo (version 0.09 of Chef, which is deprecated). The RightImages available at SoftLayer were both limited and rather old (Ubuntu 10.04 and CentOS 5.5), and did not support our stack (which is known to work on Ubuntu 11.10 or later, or Fedora 15 or later). We tried to build a Fedora 16 RightImage, but after a week we abandoned the attempt and moved our efforts to Amazon. At Amazon, we were greeted with hundreds of available RightImages, and we were up and running within hours. This is a classic chicken-and-egg problem: so many people use RightScale with AWS that it works better there.
- RightScale Chef is only vaguely related to Opscode/open source Chef. The scripts look the same, but the way they are used is fundamentally different. The workflow we were taught by the Jedi Chef Master was to use Knife to upload Chef scripts to the Chef server and then, again using Knife, bootstrap a new node from the Chef server with a run list and the desired environment. The last recipe would add a cron job that runs chef-client at regular intervals. RightScale Chef gets rid of many fundamental concepts that we rely on, like environments, data bags and attributes. Here are the key differences:
  - Chef recipes can only be loaded into RightScale via GitHub, as far as we can tell. You hook up your GitHub repository to RightScale, click a button, and all your cookbooks are imported.
  - Recipes are only run automatically at boot time.
  - The “Run List” is created manually using the web site (this is rather tedious).
  - Recipes can be run individually at later times using the dashboard.
  - There is no way to have RightScale nodes “phone home” by calling chef-client.
  - Knife is not needed except to create the metadata file.
  - That metadata file matters, though: we discovered after a day or two of puzzlement that metadata.json needs to be present and correct, and that each sub-recipe needs to be listed in it.
  - Chef attributes are replaced by RightScale inputs that can be set in the browser. This introduces a dependency in the Chef recipe on RightScale-specific recipes, which means that recipes using attributes NEED TO BE REWRITTEN FOR RIGHTSCALE.
  - We were forced to move our data bag items into RightScale inputs.
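Since a missing metadata.json entry cost a day or two of puzzlement, a small pre-import check may be worth having. The Python sketch below is a guess at what such a check could look like, not a RightScale or Chef tool. It assumes the usual cookbook layout (a `recipes/` directory of `.rb` files next to `metadata.json`) and assumes recipes are declared under a `"recipes"` key as `cookbook::recipe`; the function name `missing_recipes` is hypothetical.

```python
import json
from pathlib import Path


def missing_recipes(cookbook_dir):
    """Return recipes found on disk that metadata.json does not declare."""
    cookbook = Path(cookbook_dir)
    metadata = json.loads((cookbook / "metadata.json").read_text())
    name = metadata["name"]
    declared = set(metadata.get("recipes", {}))

    missing = []
    for recipe_file in sorted((cookbook / "recipes").glob("*.rb")):
        recipe = recipe_file.stem
        accepted = {f"{name}::{recipe}"}
        if recipe == "default":
            # Assumption: the default recipe may be declared by cookbook name alone.
            accepted.add(name)
        if not accepted & declared:
            missing.append(f"{name}::{recipe}")
    return missing
```

Running `missing_recipes` on each cookbook directory before pushing to GitHub would return an empty list when every recipe file is declared, and the names of any undeclared sub-recipes otherwise.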
If none of this is news to you, you need to come work for us. These are the things that happen when a Scientist (Bright Score 33 for that position) and a CTO try to do a DevOps job.
David Hardtke and Anthony Duerr