Better Utilize Infrastructure and Reduce Costs With Multiple Instance Types on AWS

Organizations have been looking at moving their applications and infrastructure to the cloud for years, and cost is usually both a burden and a barrier. Cloud providers like AWS offer a variety of instance families, such as the M, C, R, and T families, that vary in price according to their attributes. As a quick recap, instances differ in RAM, CPU, and disk resources, and some feature powerful GPUs or solid-state drives. They are available under different purchasing options: On-Demand, Reserved, and Spot.

When seeking ways to reduce cloud spend, many companies settle for low-effort solutions that focus on reporting and notifications, for fear of disrupting production environments. Leveraging Spot Instances can provide savings of up to 90% compared to On-Demand prices, but because they involve a certain level of risk and complexity, some companies choose to steer clear (and we can’t really blame them; you have to protect your apps). Spot Instances can be the holy grail of cloud savings, though, giving you the availability you need for a fraction of the cost, but you have to do it right. The rapidly growing Demandbase is a great example of a company that knows how to leverage Spot without compromise.

An essential part of using Spot Instances is diversifying workloads across Multiple (or “Mixed”) Instance Types. This concept is crucial to maximizing savings and, most importantly, availability and continuity when using Spot Instances. In this post, I will go over the challenges of using Multiple Instance Type clusters in production as part of a digital application (web, mobile, and API) and explain, based on our experience, what can be done to solve them.

Challenges with Multiple Instance Type Clusters

Although clusters with Multiple Instance Types can be more cost-efficient, they come with complexity and scaling challenges. The general fear of using Spot Instances is driven by resource instability and price fluctuations: the cloud provider can terminate instances with only a short warning. The complexity increases when you have to decide among On-Demand, Spot, and Reserved Instances. Intelligent, automated decision-making is needed to select which option or combination of options is best at any point in time.

Cloud providers usually offer an auto-scaling feature that can scale instances based on metrics such as CPU and network utilization. This sounds great in theory, but when you are using Multiple Instance Types you face a new problem, because each instance type has different resources.

Let’s take a look at using Mixed Instance Types on AWS as an example:

Using Multiple Instance Types results in diverse hardware

  • A C3 large instance has 2 vCPUs with 3.75 GB of RAM.
  • A C3 extra-large instance has 4 vCPUs with 7.5 GB of RAM.

Utilization is different for each instance type

  • 50% utilization of 2 CPUs represents a very different amount of work than 50% utilization of a machine with double the CPU count.
  • The memory utilization percentage will also differ across Multiple Instance Types.
  • The differences are especially pronounced when both instances run behind the same load balancer and serve the same number of requests per second.
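
To put numbers on this, here is a minimal sketch in Python. The request rate and the CPU cost per request are made-up values chosen for illustration; only the vCPU counts come from the C3 specs above.

```python
# vCPU counts taken from the C3 specs listed above.
VCPUS = {"c3.large": 2, "c3.xlarge": 4}


def cpu_utilization(instance_type: str,
                    requests_per_second: float,
                    cpu_seconds_per_request: float) -> float:
    """Return the CPU utilization (0-100) of one instance under a given load."""
    # Number of vCPUs kept fully busy by this request rate.
    busy_vcpus = requests_per_second * cpu_seconds_per_request
    return min(100.0, 100.0 * busy_vcpus / VCPUS[instance_type])


# Hypothetical load: the load balancer sends every instance 100 requests/s,
# and each request costs 10 ms of CPU time.
for instance_type in VCPUS:
    print(instance_type, cpu_utilization(instance_type, 100, 0.010))
# c3.large 50.0
# c3.xlarge 25.0
```

The same traffic shows up as 50% on the smaller instance and 25% on the larger one, so a single percentage says little about how much work the group is doing.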

Target Tracking Scaling Policies

  • If you have a target tracking scaling policy in place to keep the average aggregate CPU utilization at 50 percent, the same utilization value will represent a different amount of load on instances with different CPUs and core counts.
  • Users face the challenge of instances being scaled up or down prematurely based on inconsistent hardware metrics.
  • One way to manage auto-scaling for Multiple Instance Types is to create an additional Auto Scaling Group, but this only adds complexity and management overhead.
  • More hardware used means higher costs.

Some instances are left underutilized while new ones are spun up, resulting in an inefficient and expensive group.
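
The premature scale-out can be illustrated with a small Python sketch. It assumes the aggregate metric is a plain average of per-instance CPU percentages and compares it with an average weighted by vCPU count; the utilization figures and the 50 percent target are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_type: str
    vcpus: int
    cpu_percent: float  # utilization as reported by the instance itself


def simple_average(instances):
    """Plain average of per-instance percentages (assumed aggregate metric)."""
    return sum(i.cpu_percent for i in instances) / len(instances)


def vcpu_weighted_average(instances):
    """Share of all vCPUs in the group that are actually busy."""
    busy = sum(i.vcpus * i.cpu_percent / 100 for i in instances)
    total = sum(i.vcpus for i in instances)
    return 100 * busy / total


# Hypothetical group: one small, hot instance and two larger, cooler ones.
group = [
    Instance("c3.large", 2, 90.0),
    Instance("c3.xlarge", 4, 35.0),
    Instance("c3.xlarge", 4, 35.0),
]

TARGET = 50.0
print(simple_average(group) > TARGET)         # True: policy would scale out
print(vcpu_weighted_average(group) > TARGET)  # False: under half the vCPUs are busy
```

In this made-up group the plain average sits above the target even though less than half of the total vCPU capacity is in use, which is how underutilized instances end up next to freshly launched ones.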

There has to be a better way.

The Spotinst Solution

Our approach is to leverage excess cloud capacity to reduce infrastructure compute costs by 80% or more by managing the lifecycle of Spot Instances with Elastigroup. Elastigroup reduces costs by using machine learning to analyze the constantly fluctuating Spot market. Each Spot market receives a score, a feature we call Spot Market Scoring.

Spot Market Scoring is a unique feature that helps you choose the best Spot markets. The scoring section provides a visual aid showing the number of separate Spot markets available based on the number of Availability Zones and Spot types selected. The scale goes from 0 to 100, where 0 is a non-functional market and 100 is a market that provides the best price and longevity for the Spot Instance. When you use Multiple Instance Types along with multiple Availability Zones, you are presented with more savings opportunities in the Spot market. With the introduction to the Spot market out of the way, let’s explore how it works technically.
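
As a rough illustration of why more instance types and Availability Zones mean more options, the sketch below treats each (Availability Zone, instance type) pair as one Spot market. The zone names and instance types are example inputs, and the function name is hypothetical; it is not part of any Elastigroup API.

```python
from itertools import product


def spot_markets(availability_zones, instance_types):
    """Enumerate every (Availability Zone, instance type) combination."""
    return list(product(availability_zones, instance_types))


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

# One instance type gives one market per zone.
print(len(spot_markets(zones, ["c3.large"])))               # 3

# Adding a second type doubles the number of markets to choose from.
print(len(spot_markets(zones, ["c3.large", "c3.xlarge"])))  # 6
```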

Elastigroup detects an unstable Spot market based on patterns in past interruptions and real-time market analytics. It acts to rebalance capacity up to 15 minutes ahead of time to ensure 100% availability. If no other Spot markets are available, Elastigroup falls back from the Spot Instance to an On-Demand instance. The old Spot Instance is then drained and terminated.
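
The replace-or-fall-back decision described above could be sketched roughly as follows. This is only an illustration of the idea, not Elastigroup's actual logic: the scores, the minimum score of 50, and the function name are all assumptions.

```python
def choose_replacement(market_scores, unstable_market, min_score=50):
    """Decide where to launch a replacement for an instance in an unstable market.

    market_scores maps (availability_zone, instance_type) to a 0-100 score.
    min_score is a hypothetical cutoff below which a market is not used.
    """
    healthy = {
        market: score
        for market, score in market_scores.items()
        if market != unstable_market and score >= min_score
    }
    if healthy:
        # Prefer the best-scored Spot market that is still healthy.
        return ("spot", max(healthy, key=healthy.get))
    # No usable Spot market: fall back to On-Demand for the same zone and type.
    return ("on-demand", unstable_market)


# Hypothetical scores for three markets.
scores = {
    ("us-east-1a", "c3.large"): 20,
    ("us-east-1a", "c3.xlarge"): 85,
    ("us-east-1b", "c3.large"): 60,
}
print(choose_replacement(scores, ("us-east-1a", "c3.large")))
# ('spot', ('us-east-1a', 'c3.xlarge'))
```

Once the replacement is running, the old Spot Instance would be drained and terminated, as described above.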

Elastigroup also has target scaling policies that can help manage and scale Multiple Instance Types. They are essentially auto-scaling policies that track a specified metric relative to a desired target value and automatically adjust an Elastigroup to meet this target. Here are the metrics available for target scaling policies:

  • Average CPU Utilization
  • Average Network In
  • Average Network Out

Just assign a metric that best describes the load of your application, and set a target value. From here, Elastigroup takes over and manages the capacity for you. Elastigroup dynamically creates scaling policies in real time to ensure that your metric stays at (or close to) the specified target value. Instances are added when demand is high, and unused resources are released when demand is low. Keeping your cluster performing optimally reduces your compute cost, which is Spotinst’s main goal.
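
To give a feel for how tracking a target value translates into capacity, here is a minimal sketch that uses a simple proportional rule. The rule, the size limits, and the function name are assumptions made for illustration; the post does not describe the exact calculation Elastigroup performs.

```python
import math


def desired_capacity(current_capacity, metric_value, target_value,
                     min_size=1, max_size=100):
    """Estimate how many instances keep the metric near its target.

    Assumes the load spreads evenly, so the metric scales inversely
    with the number of instances (a simplification).
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = current_capacity * metric_value / target_value
    # Round up so the metric lands at or below the target, then clamp.
    return max(min_size, min(max_size, math.ceil(raw)))


# 10 instances averaging 75% CPU against a 50% target: scale out.
print(desired_capacity(10, 75, 50))  # 15

# 10 instances averaging 20% CPU against a 50% target: scale in.
print(desired_capacity(10, 20, 50))  # 4
```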

Conclusion

While using a cluster of Multiple Instance Types can optimize costs and availability, cloud providers still offer no straightforward way to automate its operation. Spotinst Elastigroup helps manage your workloads on Spot Instances and significantly reduces costs. With Spot Market Scoring in Elastigroup, you can choose the best Spot markets for the workload and see which instances will work best in each Availability Zone. Elastigroup’s target scaling policies make your life easier by taking your assigned metrics and managing instance scaling for you to maintain optimal cluster performance and reduce compute costs.