How To Run Your Elasticsearch Cluster on Spot Instances

Tenpa Kunga
ebs
elasticsearch
spot

Elasticsearch is a powerful distributed search and analytics engine designed for scalability, reliability, and ease of management. When running Elasticsearch, costs can escalate quickly given the amount of processing power and memory that each node requires. Let’s walk through how you can run your nodes safely on EC2 Spot instances using the Spotinst Elastigroup service. For this tutorial I will be using our new Hot EBS Migration feature, which allows you to create a pool of EBS volumes that are dynamically attached to instances in your Elastigroup. Hot EBS Migration supports both Spot and On-Demand instances and works across multiple availability zones.

I will be using the Elastic.co blog post on configuring an Elasticsearch cluster on AWS for reference: https://www.elastic.co/blog/running-elasticsearch-on-aws

Master Node

  1. In your EC2 console, provision a new instance using the most recent Amazon Linux AMI (write down the AMI ID for later).
  2. Choose m4.2xlarge as the instance type.
  3. For instance details, I recommend using AWS SSM for instance management. If you do not have an IAM role for SSM, you can easily create one by following this guide: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/systems-manager.html
  4. Expand Advanced Details and enter the user data script below. This script will install the SSM agent (optional), format and mount your EBS volume, and install and configure the Elasticsearch package.
    #!/bin/bash
    cd /tmp
    curl https://amazon-ssm-us-west-2.s3.amazonaws.com/latest/linux_amd64/amazon-ssm-agent.rpm -o amazon-ssm-agent.rpm
    yum install -y amazon-ssm-agent.rpm
    sudo mkdir /media/elasticsearchvolume
    
    # Determine instance id and instance lifecycle via aws cli command
    INSTANCEID="$(curl http://169.254.169.254/latest/meta-data/instance-id)"
    export AWS_DEFAULT_REGION=us-west-2
    RACK="$(aws ec2 describe-instances --instance-ids $INSTANCEID --query 'Reservations[0].Instances[*].[InstanceLifecycle]' --output text)"
    sudo mkfs -t ext4 /dev/xvdb
    sudo mount /dev/xvdb /media/elasticsearchvolume/
    sudo sh -c "echo '/dev/xvdb /media/elasticsearchvolume ext4 defaults,nofail 0 0' >> /etc/fstab"
    sudo rpm -i https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.3.3/elasticsearch-2.3.3.rpm
    sleep 2
    sudo chown elasticsearch: /media/elasticsearchvolume
    sudo chkconfig --add elasticsearch
    sleep 2
    cd /usr/share/elasticsearch/
    yes | sudo bin/plugin install cloud-aws
    sleep 2
    sudo sh -c "echo 'ES_HEAP_SIZE=10g' >> /etc/sysconfig/elasticsearch"
    sudo sh -c "echo 'MAX_LOCKED_MEMORY=unlimited' >> /etc/sysconfig/elasticsearch"
    PRIVATEIP="$(curl http://instance-data/latest/meta-data/local-ipv4)"
    sudo sh -c "echo 'cluster.name : esonaws' >> /etc/elasticsearch/elasticsearch.yml"
    sudo sh -c "echo 'bootstrap.mlockall : true' >> /etc/elasticsearch/elasticsearch.yml"
    
    sleep 2
    sudo sh -c "echo 'discovery.zen.ping.unicast.hosts : [\""$PRIVATEIP"\"]' >> /etc/elasticsearch/elasticsearch.yml"
    sudo sh -c "echo 'network.host : [\"127.0.0.1\",\""$PRIVATEIP"\"]' >> /etc/elasticsearch/elasticsearch.yml"
    sudo sh -c "echo 'path.data : /media/elasticsearchvolume' >> /etc/elasticsearch/elasticsearch.yml"
    sudo sh -c "echo 'node.rack_id : "$RACK"' >> /etc/elasticsearch/elasticsearch.yml"
    sudo sh -c "echo 'cluster.routing.allocation.awareness.attributes: rack_id' >> /etc/elasticsearch/elasticsearch.yml"
    sudo chown elasticsearch: /media/elasticsearchvolume
    sudo service elasticsearch start
    sleep 2
    
    
  5. Add a new volume for your data. I used /dev/sdb and a 23 GB GP2 volume, but you can customize this as you see fit.
  6. Add a descriptive Name tag for your master node.
  7. Create a new Security group that will allow TCP 9200 and TCP 9300 for internal traffic and SSH for external management.
  8. Launch your instance and SSH into the instance once it is up and running.
  9. Use curl to make an API request to check the status of your new master. You should see a status of “green”. Refer to the Elasticsearch blog post for more information: https://www.elastic.co/blog/running-elasticsearch-on-aws
  10. Make note of the private IP of the master you just created for use later when creating our Elasticsearch nodes.
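The health check in step 9 can also be scripted; here is a minimal sketch of pulling the status field out of the `_cluster/health` response. The JSON response below is a hard-coded sample so the parsing can be shown; on a live node you would pipe the curl output in instead.

```shell
# On the master, the live check would be:
#   curl -s 'http://localhost:9200/_cluster/health?pretty'
# Hard-coded sample response (assumed shape) for illustration:
RESPONSE='{"cluster_name":"esonaws","status":"green","number_of_nodes":1}'
# Extract the "status" value with sed
STATUS=$(echo "$RESPONSE" | sed 's/.*"status":"\([a-z]*\)".*/\1/')
echo "cluster status: $STATUS"
```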

Shard Allocation Awareness

If Elasticsearch is aware of the physical configuration of your servers, it can ensure that the primary shard and its replica shards are spread across different physical servers, racks, or zones, minimizing the risk of losing all shard copies at the same time. In our case, we will define two logical racks: one for On-Demand instances and the other for Spot instances. If you look closely at the user data script, you will see that we use the AWS CLI to determine this information and update the elasticsearch.yml configuration file on each server accordingly. It is recommended that you set local configurations like the rack ID via the elasticsearch.yml file and cluster-wide configurations via the API.

When an instance is launched, our start-up script is aware of the instance life-cycle (either Spot or On-Demand), and will assign the rack id accordingly. 
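As written, the script stores the raw `InstanceLifecycle` value from `aws ec2 describe-instances` in `node.rack_id`: Spot instances return `spot`, while On-Demand instances have no `InstanceLifecycle` field, which `--output text` prints as `None`. If you prefer friendlier rack names, a small helper like the one below could normalize the value first. The `lifecycle_to_rack` function is a hypothetical addition, not part of the original script.

```shell
# Hypothetical helper: map the raw InstanceLifecycle value to a
# logical rack name. "spot" stays "spot"; anything else (including
# "None" for On-Demand instances) becomes "on-demand".
lifecycle_to_rack() {
  case "$1" in
    spot) echo "spot" ;;
    *)    echo "on-demand" ;;
  esac
}

RACK="$(lifecycle_to_rack "None")"
echo "rack_id: $RACK"
```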

Hot EBS Migration

Before we jump into creating our cluster of Elasticsearch nodes, let’s first create the EBS volumes that we’ll be using for data storage.

  1. Open the EC2 console and click on “Volumes”
  2. Let’s create four GP2 volumes. Pick the size that you like in GiB and click create
  3. Copy the volume IDs of the volumes you just created into a text editor for later.

 

Creating the Elasticsearch Cluster in an Elastigroup

  1. Open the Spotinst console and browse to Elastigroups. Click on the “Create” button to start the wizard.
  2. Enter a descriptive name for your Elasticsearch Cluster and choose the same region as the master node that you created earlier.
  3. Change the cluster to 50% Spot, select 100 seconds as your draining timeout, and set your target/min/max to 3.
  4. On the Compute page, select the same VPC and AMI id that you used for your master.
  5. Select three availability zones and select m3.2xlarge, m4.2xlarge, and m4.4xlarge. The market scoring will update accordingly (scoring will vary).
  6. If you are using SSM make sure to use the SSM role you created earlier. You can use the same key pair as your master if you like here.
  7. Under Hot EBS Migration, copy and paste the EBS volume IDs that you created earlier. Press Tab after pasting each ID so that it becomes selected.
  8. Copy and paste the following user data script into the user data section below. This script will install Elasticsearch, mount your EBS data volumes, and configure the Elasticsearch service. Be sure to update the private IP of the master node as noted in the script. After the initial deployment, update your Elastigroup configuration and remove the line that formats the data volume so that replacement instances do not wipe it; a roll is not required after updating the Elastigroup.
    #!/bin/bash
    cd /tmp
    curl https://amazon-ssm-us-west-2.s3.amazonaws.com/latest/linux_amd64/amazon-ssm-agent.rpm -o amazon-ssm-agent.rpm
    yum install -y amazon-ssm-agent.rpm
    sudo mkdir /media/elasticsearchvolume
    
    # Determine instance id and instance lifecycle via aws cli command
    INSTANCEID="$(curl http://169.254.169.254/latest/meta-data/instance-id)"
    export AWS_DEFAULT_REGION=us-west-2
    RACK="$(aws ec2 describe-instances --instance-ids $INSTANCEID --query 'Reservations[0].Instances[*].[InstanceLifecycle]' --output text)"
    
    # BE SURE TO REMOVE THE LINE BELOW AFTER INITIAL DEPLOYMENT TO ENSURE DATA VOLUME IS NOT WIPED
    sudo mkfs -t ext4 /dev/xvdb
    sudo mount /dev/xvdb /media/elasticsearchvolume/
    sudo sh -c "echo '/dev/xvdb /media/elasticsearchvolume ext4 defaults,nofail 0 0' >> /etc/fstab"
    sudo rpm -i https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.3.3/elasticsearch-2.3.3.rpm
    sleep 2
    sudo chown elasticsearch: /media/elasticsearchvolume
    sudo chkconfig --add elasticsearch
    sleep 2
    cd /usr/share/elasticsearch/
    yes | sudo bin/plugin install cloud-aws
    sleep 2
    sudo sh -c "echo 'ES_HEAP_SIZE=10g' >> /etc/sysconfig/elasticsearch"
    sudo sh -c "echo 'MAX_LOCKED_MEMORY=unlimited' >> /etc/sysconfig/elasticsearch"
    PRIVATEIP="$(curl http://instance-data/latest/meta-data/local-ipv4)"
    sudo sh -c "echo 'cluster.name : esonaws' >> /etc/elasticsearch/elasticsearch.yml"
    sudo sh -c "echo 'bootstrap.mlockall : true' >> /etc/elasticsearch/elasticsearch.yml"
    sleep 2
    
    # You will need to type in the private ip of your Master node below
    sudo sh -c "echo 'discovery.zen.ping.unicast.hosts : [\"IP ADDRESS OF MASTER NODE ABOVE\"]' >> /etc/elasticsearch/elasticsearch.yml"
    sudo sh -c "echo 'network.host : [\"127.0.0.1\",\""$PRIVATEIP"\"]' >> /etc/elasticsearch/elasticsearch.yml"
    sudo sh -c "echo 'path.data : /media/elasticsearchvolume' >> /etc/elasticsearch/elasticsearch.yml"
    sudo sh -c "echo 'node.rack_id : "$RACK"' >> /etc/elasticsearch/elasticsearch.yml"
    sudo sh -c "echo 'cluster.routing.allocation.awareness.attributes: rack_id' >> /etc/elasticsearch/elasticsearch.yml"
    sudo chown elasticsearch: /media/elasticsearchvolume
    sudo service elasticsearch start
    sleep 2
    
  9. Add a Name tag for your instances in the tag section to the right.
  10. At the bottom right, click Next to go to the scaling page.
  11. Since we are not using scaling policies here, click Next again and review the JSON output. Click the Create button on the bottom right when ready.

Running Elasticsearch

Now that we have installed and configured everything, let’s make sure Elasticsearch is up and running and our new nodes are healthy. Let’s run the same API request as we did earlier to check the status of our cluster. You should now see four nodes: the master plus the three data nodes launched in the Elastigroup.

As you can see, we do not yet have any data in Elasticsearch since we have no indices, and therefore no shards. Let’s load a sample dataset, which will bring our document count to 1,000.

wget https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true -O accounts.json
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@accounts.json"
curl 'localhost:9200/_cat/indices?v'
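The last command prints one row per index. A sketch of confirming the document count from that output, using a sample `_cat/indices` row, since the exact store sizes will vary:

```shell
# Sample row (assumed values) from `curl 'localhost:9200/_cat/indices?v'`
# after the bulk load. Columns in Elasticsearch 2.x:
#   health status index pri rep docs.count docs.deleted store.size pri.store.size
LINE="green open bank 5 1 1000 0 950.1kb 475.0kb"
# docs.count is the sixth whitespace-separated field
DOCS=$(echo "$LINE" | awk '{print $6}')
echo "documents in bank index: $DOCS"
```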


Great, now that we have some documents, let’s check on the number of shards again.


Great, we now have some data loaded: 5 primary shards and 5 replicas, bringing the total to 10 active shards. Since we have four data nodes, let’s add an additional replica for our “bank” index via the API.

curl -XPUT 'localhost:9200/bank/_settings' -d'
{
  "number_of_replicas": 2
}'

We now have 5 primary shards and 10 replicas, bringing the total count to 15 active shards.
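The shard arithmetic generalizes: total active shards = primaries × (1 + replicas). A quick sanity check for our settings:

```shell
# Active shard count for the "bank" index after setting number_of_replicas to 2:
# 5 primaries, each with 2 replicas, gives 5 * (1 + 2) = 15 shards.
PRIMARIES=5
REPLICAS=2
TOTAL=$((PRIMARIES * (1 + REPLICAS)))
echo "active shards: $TOTAL"
```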

We now have a fully redundant Elasticsearch cluster running on a blend of 50% Spot and 50% On-Demand instances! In case of hardware failures or Spot interruptions, you can rest assured that the Elastigroup will automatically attach the existing EBS volumes to the replacement instances for you.

Failover Testing

Now let’s remove an instance from the cluster to simulate a Spot interruption. Go into your Elastigroup configuration in the console and detach one of the Spot instances in the cluster.

If we run an API call against the cluster, we can see that we have lost some of our replica shards due to the Spot interruption.
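The lost replicas show up in the `unassigned_shards` field of `_cluster/health`. A sketch of extracting it, with a sample response hard-coded (the exact count and status depend on your cluster):

```shell
# Live check would be: curl -s 'localhost:9200/_cluster/health'
# Hard-coded sample (assumed shape) after detaching a Spot node:
RESPONSE='{"cluster_name":"esonaws","status":"yellow","unassigned_shards":5}'
# Extract the unassigned_shards count with sed
UNASSIGNED=$(echo "$RESPONSE" | sed 's/.*"unassigned_shards":\([0-9]*\).*/\1/')
echo "unassigned shards: $UNASSIGNED"
```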


Now let’s wait for the replacement Spot instance to come online. The startup script that we defined in user data will install and configure the server automatically, and since we are using Hot EBS Migration, the data volume will automatically be attached to the new instance. Once the replacement instance is up and running, we can query the API again to see the status of the cluster.


Now that the replacement Spot instance is up and running, we can see that our replica shards are back online thanks to the Hot EBS Migration feature and the bootstrap configuration that we created in user data!

Conclusion

We hope you enjoyed this tutorial on how to get started with using Spot instances safely with your Elasticsearch clusters. This is a basic example, but you can easily apply the ideas here and create a much larger and more sophisticated cluster while saving money on your EC2 spend by using the Spotinst service.