Our Co-Author for this article:
F1rstOffer is a growing shopping app that helps consumers find the best deals across the web. To get the job done, it relies on Elasticsearch as its NoSQL database. To deliver an excellent customer experience, F1rstOffer uses huge, expensive EC2 instance types such as c3.4xlarge, c3.8xlarge, g2.8xlarge, cc2.8xlarge, and r3.8xlarge. As their EC2 spend skyrocketed, they wished they could use affordable Spot instances, but could not risk the availability issues that come with Spot pricing. That all changed when they started using Spotinst… but first, some background.
“Our shoppers need super-fast results all the time. With Spotinst, our huge EC2 instances cost us 80% less without any impact on availability.” Tal Shemesh, Founder and CTO of F1rstOffer
Sharding and replication – an extra, but necessary expense
One of the most powerful features of Elasticsearch is its ability to scale horizontally in several ways: routing, sharding, and time- or pattern-based index creation and querying. Because an index can contain more data than a single instance can hold, NoSQL databases commonly subdivide indexes into smaller pieces, a technique known as sharding. Sharding lets operations be distributed across nodes, which improves both scalability and performance. To ensure high availability, shards are also replicated, so that a failure in the network or cloud does not cost you a shard.
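To make the replication argument concrete, here is a minimal Python sketch. It is a toy model, not Elasticsearch's real allocator, and the names `allocate` and `surviving_shards` are hypothetical. It places every shard's primary and replica copies on distinct nodes (a rule Elasticsearch also enforces) and then checks which shards still have a copy after one node fails.

```python
def allocate(num_primaries: int, num_replicas: int, nodes: list[str]) -> dict[int, list[str]]:
    """Toy allocator: put each shard's primary and replicas on distinct nodes."""
    copies = num_replicas + 1  # one primary plus its replicas
    if copies > len(nodes):
        raise ValueError("need at least num_replicas + 1 nodes")
    allocation = {}
    for shard in range(num_primaries):
        # Round-robin starting node per shard; consecutive offsets never repeat
        # a node because copies <= len(nodes).
        allocation[shard] = [nodes[(shard + i) % len(nodes)] for i in range(copies)]
    return allocation


def surviving_shards(allocation: dict[int, list[str]], failed_node: str) -> set[int]:
    """Shards that still have at least one copy after failed_node is lost."""
    return {
        shard
        for shard, holders in allocation.items()
        if any(node != failed_node for node in holders)
    }


nodes = ["node-a", "node-b", "node-c"]
print(sorted(surviving_shards(allocate(5, 1, nodes), "node-b")))  # [0, 1, 2, 3, 4]
print(sorted(surviving_shards(allocate(5, 0, nodes), "node-b")))  # [0, 2, 3]
```

With one replica, every shard survives the loss of node-b; with no replicas, shards 1 and 4 are gone for good.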
Elasticsearch provides both features, and F1rstOffer relies on both. To support them, F1rstOffer ran numerous large On-Demand instances (e.g. g2.8xlarge) and faced a large monthly bill. While looking for a way to cut costs without sacrificing high availability, F1rstOffer discovered Spotinst.
Highly available and affordable is no longer a dream
Spotinst enables AWS customers to run any workload on Spot instances with full availability. Once you have configured your EC2 capacity requirements, Spotinst intelligently distributes your workload among different Spot markets (e.g. different instance types and AZs), choosing markets that are unlikely to fail at the same time.
Spotinst identifies, typically 4-7 minutes in advance, when a Spot instance is about to be terminated, and spins up replacement instances, whether Spot, On-Demand, or available Reserved instances. The end result for F1rstOffer was highly available EC2 capacity, zero data loss, and an 80% reduction in EC2 costs over the last 3 months.
How it works
Before creating an Elasticsearch cluster, here are the important things to consider:
1. Replication factor – https://www.elastic.co/guide/en/elasticsearch/guide/current/replica-shards.html
2. Shard allocation awareness – https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html
3. Number of eligible Spot instance types (selected in Spotinst)
4. Number of eligible Availability Zones
Multiplying #3 by #4 gives the total number of eligible compute pools (markets) that Elastigroup can launch instances from.
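- Ideally, the number of markets should equal the number of Elasticsearch nodes, so that each node runs in its own market.
- The replication factor should equal the maximum number of nodes per market in your Elastigroup configuration, so that losing an entire market still leaves at least one copy of every shard.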
Storing more than 500GB of data per node is possible, but in that case it is better to run a hybrid On-Demand/Spot cluster using #2 (shard allocation awareness).
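The arithmetic in this checklist can be sketched in a few lines of Python. `plan_cluster` is a hypothetical helper, not a Spotinst or Elasticsearch API, and it assumes Elastigroup spreads nodes as evenly as possible across markets. The replica rule follows from counting copies: losing one market removes at most `max_nodes_per_market` nodes, and since Elasticsearch keeps each copy of a shard on a different node, that many replicas (that many plus one copies per shard) guarantees at least one copy survives.

```python
import math


def plan_cluster(spot_types: int, azs: int, es_nodes: int) -> dict:
    """Apply the checklist's sizing rules (hypothetical helper, not a Spotinst API)."""
    if es_nodes < 2:
        raise ValueError("replication needs at least two Elasticsearch nodes")
    markets = spot_types * azs  # each (instance type, AZ) pair is a separate Spot market
    if markets < 2:
        raise ValueError("a single market cannot survive its own failure")
    # Assumes nodes are spread as evenly as possible across the markets.
    max_nodes_per_market = math.ceil(es_nodes / markets)
    return {
        "markets": markets,
        "one_node_per_market": es_nodes <= markets,  # the ideal case from the checklist
        "max_nodes_per_market": max_nodes_per_market,
        # Losing one market removes at most max_nodes_per_market copies of any shard,
        # so this many replicas leaves at least one copy of every shard.
        "number_of_replicas": max_nodes_per_market,
    }


print(plan_cluster(spot_types=3, azs=2, es_nodes=12))
# {'markets': 6, 'one_node_per_market': False, 'max_nodes_per_market': 2, 'number_of_replicas': 2}
```

Adding instance types or AZs until there are at least as many markets as nodes brings the required replica count down to one.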
In your Spotinst console:
- Define the relevant large instance types for your Elastigroup (e.g. r3.8xlarge). The more types, the better.
- Define your desired AZs. Once again, the more, the merrier.
Spotinst’s algorithm then chooses the most durable, non-correlated markets (different AZs, instance types, etc.) to ensure data integrity and availability.
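The following does not model how Spotinst actually scores markets; it is only a toy Python illustration of the spreading idea, with hypothetical helper names (`spread_nodes`, `max_per_market`) and example instance types and AZs. It enumerates every (instance type, AZ) market and fills each one before reusing any, which keeps the number of nodes exposed to any single market failure as small as possible.

```python
from collections import Counter
from itertools import product


def spread_nodes(instance_types: list[str], azs: list[str], es_nodes: int) -> list[tuple[str, str]]:
    """Toy placement: round-robin nodes over all (instance type, AZ) markets."""
    markets = list(product(instance_types, azs))  # every type/AZ combination is a market
    return [markets[i % len(markets)] for i in range(es_nodes)]


def max_per_market(placement: list[tuple[str, str]]) -> int:
    """Largest number of nodes sharing one market, i.e. the worst single-market loss."""
    return max(Counter(placement).values())


placement = spread_nodes(["r3.8xlarge", "c3.8xlarge"], ["us-east-1a", "us-east-1b"], 6)
for node, market in enumerate(placement):
    print(node, market)
print("max nodes per market:", max_per_market(placement))  # max nodes per market: 2
```

With four markets and six nodes, two markets carry two nodes each, which per the checklist calls for two replicas; adding a third AZ would bring that down to one.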
An incredibly economical Elasticsearch setup
If you want to take extra precautions against Spot market failure, Elasticsearch has a useful feature called shard allocation awareness. You can launch both Spot and On-Demand instances into the Elasticsearch cluster and give each group its own rack_id (e.g. rack_id=1 for Spot instances and rack_id=2 for On-Demand instances), so that primary and replica copies are distributed between the Spot and On-Demand nodes. With at least one replica per shard, this ensures that at least one complete copy of the data is stored on On-Demand nodes.
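As a minimal sketch, the Python below generates the configuration this setup needs: an `elasticsearch.yml` line per node (`node.rack_id` on Elasticsearch 2.x and earlier, `node.attr.rack_id` from 5.0 on) and a JSON body for `PUT _cluster/settings` that enables awareness on `rack_id`. The helper names are hypothetical, and the code only builds the settings; it does not contact a cluster. Forced awareness is optional: it stops Elasticsearch from piling every copy onto the On-Demand nodes when the Spot rack disappears, at the cost of leaving those replicas unassigned until Spot capacity returns.

```python
import json


def node_attribute_line(rack_id: str, es_major_version: int) -> str:
    """elasticsearch.yml line tagging a node with its rack (1 = Spot, 2 = On-Demand here)."""
    # Elasticsearch 5.0 moved custom node attributes under the node.attr.* namespace.
    key = "node.attr.rack_id" if es_major_version >= 5 else "node.rack_id"
    return f"{key}: {rack_id}"


def awareness_cluster_settings(racks: tuple[str, ...] = ("1", "2"), force: bool = True) -> dict:
    """Body for PUT _cluster/settings enabling shard allocation awareness on rack_id."""
    persistent = {"cluster.routing.allocation.awareness.attributes": "rack_id"}
    if force:
        # Forced awareness: if one rack vanishes, do not reallocate its copies onto the other.
        persistent["cluster.routing.allocation.awareness.force.rack_id.values"] = ",".join(racks)
    return {"persistent": persistent}


if __name__ == "__main__":
    print(node_attribute_line("1", 2))  # add to elasticsearch.yml on Spot nodes
    print(node_attribute_line("2", 2))  # add to elasticsearch.yml on On-Demand nodes
    print(json.dumps(awareness_cluster_settings(), indent=2))
```

Each index still needs `number_of_replicas` of at least 1; with two racks, that places one full copy on each side.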
A final tip
We recommend installing the kopf plugin, a great tool for managing and performing operations on your Elasticsearch cluster.