At Bright, we were addicted to AWS spot instances.
Step 1: We admitted we were powerless over our addiction — that our lives had become unmanageable.
For many of our dynamic workloads, we use AWS autoscaling groups to allocate computing resources. Tasks that need to scale include resume parsing, job parsing, and our Bright Score calculations. For these autoscaling groups, we have been requesting spot instances with bids above typical market pricing. We also use spot instances rather than reserved instances for some of our EMR (Elastic MapReduce) jobs.
This worked well for over a year because, for many EC2 instance types, spot market pricing was remarkably stable. For smaller (low-memory) instance types the market price has always fluctuated, but about a year ago we noticed that for larger (high-memory) instance types the spot price barely moved, which made spot instances functionally equivalent to on-demand instances at a fraction of the cost.
We stumbled upon this realization when one of our data scientists started a data processing task that made the entire office computing infrastructure unusable. His task required lots of disk space and lots of memory, and it ran on a virtualized node on a shared server. After shrieks of rage from the developers who were using the same physical server, we quickly bid on the largest possible spot instance (m2.4xlarge, with 68.4 GB of memory) and moved his analysis tasks there. In Northern California at the time, the spot market price for this server was $0.23 per hour. The on-demand price for these instances was over $2 (I can’t remember the exact price), so using spot instances saved us roughly a factor of ten.
We started referring to this node as the SUPERNODE, and it was super in one respect — it never went away. Spot nodes can be taken away at any time, but this particular spot instance stayed up for almost one year with a $0.50 bid. The spot price for m2.4xlarge instances was a metronomic $0.23 per hour for something like 9 months, so our bid was always above market price.
And then, on August 26, the SUPERNODE died.
The spot price spiked to $2. By that time, oddly, the on-demand price for that instance type was $1.84. The spot market price was above the on-demand price, an economically irrational outcome. A spot instance is equivalent to an on-demand instance, but with the additional risk that the node can disappear at any time. That risk should come with a discount: spot pricing should be lower than on-demand pricing.
The AWS spot instance market is essentially a second-price-style auction: some number of nodes are up for bid, and everyone with a bid above the clearing price gets nodes allocated at the clearing price. Trained economists tell me this is among the most efficient types of auction. Economists, however, love rational market participants.
Recently, in some AWS spot instance markets (a market is a particular instance type in a particular availability zone), there have been large numbers of irrational actors (or, as we refer to them privately, idiots). People are bidding hourly rates far greater than the on-demand price for the same EC2 nodes. These bids are rational if you are guaranteed to be the only idiot in the market but, as one can painfully learn, two competing idiots ruin this strategy. In Northern Virginia this week, for instance, someone is bidding $2.50/hour each day starting at 10 AM GMT for c1.medium instances in one of the availability zones. This is about 20 times the on-demand price for this instance ($0.12/hour).
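To make the "one idiot is fine, two idiots are not" point concrete, here is a toy sketch of the clearing mechanism described above. It is not AWS's actual algorithm, which is not public. It assumes that the clearing price is set by the highest losing bid (the usual second-price convention), that there is a hypothetical floor price when capacity exceeds demand, and that every bidder wants exactly one node. The function name `clear_market` and all the bid values are made up for illustration.

```python
from typing import List, Tuple


def clear_market(bids: List[float], capacity: int, floor: float = 0.0) -> Tuple[float, List[float]]:
    """Toy uniform-price auction (an assumption, not AWS's real mechanism).

    The `capacity` highest bids win, and every winner pays the same
    clearing price: the highest losing bid, or `floor` if nobody loses.
    Returns (clearing_price, winning_bids).
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    # Bids below the floor never qualify; rank the rest from high to low.
    ranked = sorted((b for b in bids if b >= floor), reverse=True)
    winners = ranked[:capacity]
    losers = ranked[capacity:]
    # The best losing bid sets the price everyone pays.
    price = losers[0] if losers else floor
    return price, winners


if __name__ == "__main__":
    # One very high bidder: the price is set by someone else's sane bid,
    # so our $0.50 bid keeps its node and pays well under $0.50.
    price, winners = clear_market([2.50, 0.50, 0.30, 0.25], capacity=2, floor=0.23)
    print(f"one high bidder:  price ${price:.2f}, winners {winners}")

    # Two very high bidders competing for one node: the loser's bid sets
    # the price, and our $0.50 bid is priced out.
    price, winners = clear_market([2.50, 2.50, 0.50], capacity=1, floor=0.23)
    print(f"two high bidders: price ${price:.2f}, winners {winners}")
```

Under these assumptions, a lone oversized bid costs its owner nothing extra, because somebody else's bid sets the price. As soon as a second oversized bid loses the auction, it becomes the price, and everyone with a sane bid loses their nodes.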
This seems to be a recent pattern in many spot instance markets. The m2.2xlarge market in Northern Virginia destabilized in August. On September 23, somebody started bidding $20/hour for m2.4xlarge instances.
Our sysops people have decided that depending on the rationality of the AWS spot instance market is too risky and have moved some of our work onto on-demand, reserved, and other instance types. This, of course, adds cost and operational complexity.
Dear idiots — you are ruining a good thing for everyone.