AWS continues its rapid growth thanks to its impressive capabilities and scale; however, cost is still one of the major challenges to enterprise adoption today. EC2 in particular offers many options in terms of size, capacity and pricing model: instances come in multiple families (e.g., T2, C3, M4), each available through the on-demand, reserved and Spot purchasing options. One important way to control these costs and continually optimize the footprint is to run blended clusters composed of a mix of these purchasing options.
According to Amazon, Spot Instances can provide savings of up to 90% compared to on-demand prices. Which instance types offer the best savings depends heavily on the region. As you can see below, a c1.medium has much greater savings potential in the two ap-southeast regions than anywhere else.
However, choosing between on-demand, reserved and Spot is still not straightforward, because each option affects workload availability differently. For example, a blended on-demand/Spot cluster needs load-balancing strategies and failover mechanisms that preserve service uptime and performance.
Blended Cluster Challenges
Although using blended cloud clusters can create great cost efficiencies, they come with complexity and challenges.
The ‘fear of Spot’ is driven by resource instability. Whenever the Spot price rises above the bid price, the Spot Instance is terminated. To keep Spot Instances running, users must keep raising their bids as the Spot price climbs, which can end with a user paying more for a Spot Instance than for an equivalent on-demand instance. In addition, Spot prices differ across regions, and there is no easy way to find the best price in the best region at any given moment. Together, these characteristics create complexity and uncertainty that make many users unwilling to consider Spot at all, and so they miss a powerful way to improve efficiency.
The complexity increases when you must decide where to ‘put the needle’ among on-demand, Spot, and reserved instances. All three options provide value, but choosing the best option, or combination of options, at any point in time requires continuous, detailed price comparisons and intelligent, automated decision-making.
Three Ways to Mitigate the Complexity
1. Using Auto-Scaling Groups
In this method we use two auto-scaling groups: one for on-demand and another for Spot Instances. Both groups are attached to the same ELB. Since the ELB distributes requests across all registered instances, attaching both groups ensures that requests are served by on-demand and Spot Instances alike. Each auto-scaling group has two scaling policies, one to increase and one to decrease the group size. The policies are triggered by CloudWatch alarms on the CPUUtilization metric, and each group scales horizontally (out/in) when its thresholds are crossed.
For example, we set the Spot auto-scaling group to scale out at 70% CPU utilization and scale in at 30%, and the on-demand group to scale out at 80% and scale in at 20%. Whenever load rises, the Spot group's lower scale-out threshold (70%) gives it priority, and if no Spot Instances are available, the on-demand group expands once utilization passes 80%. Similarly, when load falls, the Spot group's higher scale-in threshold (30%) means it is scaled in first.
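The threshold scheme above can be sketched in Python. This is a minimal illustration, not a production setup: the group names are hypothetical, and the alarm `Period` (300 s) and `EvaluationPeriods` (2) are assumed values. `triggered_actions` shows which alarms fire at a given average CPU level, and `alarm_params` builds keyword arguments for CloudWatch's `put_metric_alarm`, where `policy_arn` is assumed to be the `PolicyARN` returned by the Auto Scaling `put_scaling_policy` call for the matching policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupThresholds:
    name: str               # hypothetical auto-scaling group name
    scale_out_above: float  # average CPUUtilization (%) that triggers the increase policy
    scale_in_below: float   # average CPUUtilization (%) that triggers the decrease policy


# Spot is the "inner" band: it scales out first (70%) and scales in first (30%).
SPOT = GroupThresholds("spot-asg", scale_out_above=70.0, scale_in_below=30.0)
# On-demand is the "outer" band and only reacts at 80% / 20%.
ON_DEMAND = GroupThresholds("on-demand-asg", scale_out_above=80.0, scale_in_below=20.0)


def triggered_actions(cpu, groups=(SPOT, ON_DEMAND)):
    """Return the (group, action) pairs whose alarms would fire at this CPU level."""
    actions = []
    for group in groups:
        if cpu > group.scale_out_above:
            actions.append((group.name, "scale_out"))
        elif cpu < group.scale_in_below:
            actions.append((group.name, "scale_in"))
    return actions


def alarm_params(group, direction, policy_arn):
    """Keyword arguments for CloudWatch put_metric_alarm, one alarm per scaling policy."""
    scale_out = direction == "scale_out"
    return {
        "AlarmName": f"{group.name}-cpu-{direction}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group.name}],
        "Statistic": "Average",
        "Period": 300,           # assumed: 5-minute datapoints
        "EvaluationPeriods": 2,  # assumed: two consecutive breaches
        "Threshold": group.scale_out_above if scale_out else group.scale_in_below,
        "ComparisonOperator": "GreaterThanThreshold" if scale_out else "LessThanThreshold",
        "AlarmActions": [policy_arn],
    }
```

At 75% CPU only the Spot group's scale-out alarm fires; at 85% both groups scale out. In a live account the dictionary would be passed as `boto3.client("cloudwatch").put_metric_alarm(**alarm_params(SPOT, "scale_out", arn))`.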
2. Structured Bidding Process
The following steps illustrate one solution: a dynamic blended cluster composed of on-demand and Spot Instances. Note that this is only a prototype that should be adjusted and automated to fit your own cloud deployment requirements.
- In the first step of this scenario, we identify the availability zone and the Spot prices for that availability zone by parsing the output of “ec2-describe-spot-price-history”.
- Next, from that output we pick the maximum Spot price for the chosen availability zone.
- Then we set our bid price 20% above that Spot price: bid_price = spot_price + 0.20 × spot_price. For example, if the Spot price is $0.010, the bid price will be $0.012.
- Using this bid price, we create a Spot auto-scaling group and attach it to the same ELB (as described in method 1 above).
- We keep running “ec2-describe-spot-price-history” periodically so that the maximum Spot price stays up to date. Based on the latest maximum Spot price, we check the following:
  - If the bid_price is less than 20% above the maximum spot_price (for example, the maximum spot_price has risen to $0.011 while our bid_price is still $0.012), we compute a new bid price (bid_price = $0.011 + 0.20 × $0.011 = $0.0132) and update the auto-scaling group with it. Because launch configurations cannot be modified, this means creating a new launch configuration with the new bid and attaching it to the group.
  - If the maximum spot_price is within 20% of the on_demand_price (for example, an on-demand price of $0.014 and a Spot price of $0.012, a gap of about 14%), we immediately scale in 50% of our Spot Instances and launch the same number of on-demand instances.
  - If the gap shrinks further, say to less than 10% (for example, $0.014 on-demand versus $0.013 Spot), we immediately scale in 75% of our Spot Instances and launch the same number of on-demand instances.
  - If the maximum Spot price rises above the on-demand price, we set the min_size and desired capacity of the Spot auto-scaling group to 0. This moves all instances to on-demand, so we never pay more for Spot Instances than for on-demand ones.
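The bidding steps above can be condensed into a pure decision function. This is a sketch under stated assumptions, not the author's implementation: it reads the dictionary shape that boto3's `describe_spot_price_history` returns (the same DescribeSpotPriceHistory data that `ec2-describe-spot-price-history` prints), it measures the bid cushion relative to the Spot price and the on-demand/Spot gap relative to the on-demand price (the text does not say which base to use), it checks the "Spot above on-demand" rule first, and it rounds instance counts down. `Decimal` keeps the price arithmetic exact.

```python
from dataclasses import dataclass
from decimal import Decimal

MARKUP = Decimal("0.20")         # bid 20% above the maximum Spot price
HALF_SWAP_GAP = Decimal("0.20")  # on-demand/Spot gap below which 50% of Spot is swapped
MOST_SWAP_GAP = Decimal("0.10")  # gap below which 75% of Spot is swapped


def max_spot_price_by_az(response):
    """Highest SpotPrice per availability zone in a DescribeSpotPriceHistory-style dict."""
    maxima = {}
    for entry in response["SpotPriceHistory"]:
        zone = entry["AvailabilityZone"]
        price = Decimal(entry["SpotPrice"])  # SpotPrice is a string, e.g. "0.011000"
        if zone not in maxima or price > maxima[zone]:
            maxima[zone] = price
    return maxima


def bid_for(spot_price):
    """bid_price = spot_price + 20% of spot_price."""
    return spot_price * (1 + MARKUP)


@dataclass
class Decision:
    bid_price: Decimal      # bid to configure on the Spot auto-scaling group
    swap_fraction: Decimal  # share of Spot instances to replace with on-demand
    drain_spot: bool        # True: set the Spot group's min/desired size to 0


def evaluate(bid_price, spot_price, on_demand_price):
    """Apply the checks from the list to the latest maximum Spot price."""
    # Spot now costs more than on-demand: move everything to on-demand.
    if spot_price > on_demand_price:
        return Decision(bid_price, Decimal(1), True)
    # Bid cushion under 20% of the Spot price: re-bid at spot_price + 20%.
    if (bid_price - spot_price) / spot_price < MARKUP:
        bid_price = bid_for(spot_price)
    gap = (on_demand_price - spot_price) / on_demand_price
    if gap < MOST_SWAP_GAP:
        return Decision(bid_price, Decimal("0.75"), False)
    if gap < HALF_SWAP_GAP:
        return Decision(bid_price, Decimal("0.50"), False)
    return Decision(bid_price, Decimal(0), False)


def instances_to_swap(spot_count, decision):
    """Number of Spot instances to replace with on-demand ones (rounded down)."""
    return int(spot_count * decision.swap_fraction)
```

With the list's numbers, a $0.012 bid against a $0.011 Spot price yields a new bid of $0.0132, and a $0.012 Spot price against $0.014 on-demand yields a 50% swap. A scheduler would call `evaluate` each time fresh price history arrives and then apply the decision to the two auto-scaling groups.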
3. Using Amazon Spot Fleet
By creating a Spot Fleet request, we can add multiple launch specifications that vary the instance type, AMI, availability zone or subnet. Each launch specification describes a Spot Instance pool, and the fleet launches instances from these pools according to its allocation strategy. With the diversified allocation strategy, the fleet spreads the request evenly across the pools, which reduces the impact of a sudden Spot price increase in any single pool.
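A rough sketch of the diversified idea, assuming each pool is identified by a hypothetical (instance type, subnet) pair: `diversified_split` spreads a target capacity as evenly as possible across pools to illustrate the concept (it is not AWS's internal algorithm), and `spot_fleet_config` builds a `SpotFleetRequestConfig` dictionary with one launch specification per pool. The IAM role ARN, AMI ID and subnet IDs are placeholders.

```python
def diversified_split(target_capacity, pools):
    """Spread target capacity as evenly as possible across Spot pools (illustrative only)."""
    if not pools:
        raise ValueError("at least one Spot pool is required")
    base, extra = divmod(target_capacity, len(pools))
    # The first `extra` pools each take one leftover unit of capacity.
    return {pool: base + (1 if i < extra else 0) for i, pool in enumerate(pools)}


def spot_fleet_config(iam_fleet_role, max_price, target_capacity, pools, image_id):
    """Build a SpotFleetRequestConfig dict with one launch specification per pool."""
    return {
        "IamFleetRole": iam_fleet_role,
        "AllocationStrategy": "diversified",
        "SpotPrice": max_price,  # maximum price as a string, e.g. "0.012"
        "TargetCapacity": target_capacity,
        "LaunchSpecifications": [
            {"ImageId": image_id, "InstanceType": instance_type, "SubnetId": subnet_id}
            for instance_type, subnet_id in pools
        ],
    }
```

For ten instances over three pools, the split is 4/3/3, so losing any one pool to a price spike removes at most 40% of the capacity. The config would be submitted with `boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)`.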
While blended clusters on AWS offer valuable cost optimization, there is still no straightforward way to automate their operation. You can write your own scripts that track resource usage over time and apply intelligent rules to place the needle between the cost you are willing to pay and the uptime that your environment or specific workloads require.
At Spotinst, we have developed an innovative solution for Amazon customers to bid on unused capacity. Using advanced machine learning, our unique prediction algorithm chooses the most effective EC2 instance option for Amazon customers, ensuring reliability and stability while saving customers an average of 75% on cloud operating costs. The Spotinst platform automates switching between server capacities, and provides cloud capacity without interruptions.