In my last post, I talked about the AWS Elasticsearch Service. In this post, I would like to discuss how to manage retention for the Elasticsearch Service. Read on to learn more!
Retention
The AWS Elasticsearch Service makes it really easy to stand up an Elasticsearch cluster fronted by Kibana. What it does not offer in the UI (neither the AWS console nor the Kibana UI) is the ability to manage storage retention. By default, Elasticsearch will keep ingesting data until it runs out of disk space. Unless you configure retention properly from day one, you will eventually hit the point where logs are no longer being ingested due to disk space issues.
To manage retention, you need to work with the Elasticsearch API directly. The good news is that the process is pretty easy. For example, to get a sorted list of all the indices and how much space each one is consuming, you can run:
$ > curl -X GET "${VPC_ENDPOINT}/_cat/indices?v" | sort
Then you can delete the indices you no longer want with:
$ > for I in $(curl -X GET "${VPC_ENDPOINT}/_cat/indices?v" | grep ${DATE_TO_ERASE} | awk '{ print $3 }'); do \
curl -X DELETE "${VPC_ENDPOINT}/$I"; \
done
Lambda
While the above approach works for deleting old indices manually, you will likely want an automated process to handle this. If you already have a process for running code on a schedule, you can leverage that; alternatively, you could use the AWS Lambda service. As it turns out, AWS has documentation on managing retention for the Elasticsearch Service via Lambda.
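As a rough idea of what such a function could look like, here is a minimal Python sketch that deletes date-stamped indices older than a configurable number of days. The endpoint, index prefix, and retention window are hypothetical environment variables you would set on the function, and it assumes the domain's access policy allows unsigned requests from within the VPC; if your domain requires IAM authentication, the requests would also need to be signed (for example with requests_aws4auth), as shown in the AWS documentation.

import os
from datetime import datetime, timedelta

import requests  # assumes 'requests' is bundled in the Lambda deployment package

# Hypothetical environment variables; adjust to your domain and index naming scheme.
ES_ENDPOINT = os.environ["ES_ENDPOINT"]  # e.g. the VPC endpoint of your domain
INDEX_PREFIX = os.environ.get("INDEX_PREFIX", "logstash-")
RETENTION_DAYS = int(os.environ.get("RETENTION_DAYS", "14"))


def handler(event, context):
    """Delete indices older than RETENTION_DAYS, assuming names like '<prefix>YYYY.MM.DD'."""
    cutoff = datetime.utcnow() - timedelta(days=RETENTION_DAYS)

    # Ask Elasticsearch for all index names, one per line.
    resp = requests.get(f"{ES_ENDPOINT}/_cat/indices?h=index")
    resp.raise_for_status()

    for index in resp.text.split():
        if not index.startswith(INDEX_PREFIX):
            continue
        try:
            index_date = datetime.strptime(index[len(INDEX_PREFIX):], "%Y.%m.%d")
        except ValueError:
            continue  # skip indices that do not follow the date naming scheme
        if index_date < cutoff:
            # Same DELETE call as the curl loop above, just automated.
            requests.delete(f"{ES_ENDPOINT}/{index}").raise_for_status()

You would then trigger the function on a schedule, for example once a day via a CloudWatch Events rule.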
Summary
While it is easy to get started with the AWS Elasticsearch Service, be aware that not everything is provided out of the box. In particular, every Elasticsearch Service domain should have an automated retention process in place. Code examples of how to achieve this are available, and if desired, you can leverage AWS Lambda to run the code required to manage retention on a schedule.