
How to Delete Indexes in Elasticsearch

This blog covers the different ways to delete indexes in Elasticsearch and the considerations involved, including data loss, partial deletions and cluster performance, using the example of a social networking site application.


Deleting an index in Elasticsearch means removing an entire set of documents and the structure (schema) associated with them from the Elasticsearch cluster. An index in Elasticsearch is a collection of documents that are related in some way; think of it as a table in a relational database. Each document within an index is a record that contains data in JSON format. When we delete an index, we remove all these documents and the metadata associated with the index, such as mappings (which define fields and data types) and settings (which configure behavior of the index).

In this blog, we’ll cover the different ways that you can delete indexes in Elasticsearch and the considerations involved using the example of a social networking site.

Deleting Indexes for a Social Networking Site

Let’s take the example of a social networking site like LinkedIn. As the number of users and posts on the site grows, so do the demands on the Elasticsearch infrastructure. The original configuration of the Elasticsearch index may not meet current or future demands, requiring an index to be deleted and a new one created. Let’s look at two frequent scenarios in Elasticsearch in more depth: running out of shard space and changing replica shard counts.

Running out of Shard Space

Each shard in an Elasticsearch index is essentially a self-contained index. There's a practical limit to the amount of data each shard can store efficiently. If we've underestimated the volume of data and shards reach their capacity limits, we need to create a new index with a higher number of primary shards and migrate data.
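The migration described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the create index API (PUT /<index>) and the reindex API (POST /_reindex), and the index name posts_v2, the shard counts and the `session` parameter (any object with requests.Session-style put and post methods) are hypothetical choices made for illustration.

```python
def new_index_settings(primary_shards, replicas):
    """Build the body for the create index API (PUT /<index>)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def reindex_body(source_index, dest_index):
    """Build the body for the reindex API (POST /_reindex)."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def migrate(session, base_url, source_index, dest_index, primary_shards, replicas=1):
    """Create dest_index with a new shard count, then copy documents into it."""
    # Create the destination first so it gets the new primary shard count.
    resp = session.put(
        f"{base_url}/{dest_index}",
        json=new_index_settings(primary_shards, replicas),
    )
    resp.raise_for_status()
    # Copy every document from the old index into the new one.
    resp = session.post(
        f"{base_url}/_reindex",
        json=reindex_body(source_index, dest_index),
    )
    resp.raise_for_status()
    return resp.json()
```

With the requests library installed, a call might look like migrate(requests.Session(), "http://localhost:9200", "posts", "posts_v2", primary_shards=6). The old index is deliberately left in place so that the copy can be checked before anything is deleted.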

Changing Replica Shard Count

In Elasticsearch, the number of replica shards affects both the availability and the search performance of the cluster. As usage grows on the social networking site, read performance suffers, so we need to increase the number of replica shards. Strictly speaking, the replica count is a dynamic setting that can be updated on a live index; in this walkthrough we apply it as part of a reindexing operation, which creates a new index with the desired configuration and then deletes the old one.

In the following sections, we’ll describe the different APIs available to delete indexes in Elasticsearch and best practices to ensure cluster performance during reindexing operations.

Delete Index API

The Delete Index API in Elasticsearch allows us to remove an entire index, including all of its documents, mappings, settings, and any associated metadata. This operation is irreversible, meaning once an index is deleted, there is no way to recover it unless you have a snapshot to restore from.

Let’s take the example of the social networking site. When we were a young startup without much usage or adoption, we set up a single replica shard per primary shard to save on costs. As the startup grew over time, we wanted to improve availability and moved to two replica shards per primary shard. To make this change, we deleted the existing index and created a new index.

      import requests

      # Elasticsearch server URL
      elasticsearch_url = "http://localhost:9200"

      # Specify the index name you want to delete
      index_name = "posts"

      # Construct the URL for the Delete Index API
      delete_index_url = f"{elasticsearch_url}/{index_name}"

      # Send the DELETE request to delete the index
      response = requests.delete(delete_index_url)

      # Check the response status
      if response.status_code == 200:
          print(f"Index '{index_name}' successfully deleted.")
      else:
          print(f"Failed to delete index. Response code: {response.status_code}, Response content: {response.text}")
    

Delete Alias API

In Elasticsearch, an alias is a way to associate one or more indices with a logical name. We can use aliases to create an abstraction over indexes. This simplifies the management of indices, especially in scenarios where the index structure may change over time.

With the Delete Index API alone, we would have needed to manually change the application code so that it used a new index, such as posts_v2. This manual switching of indexes in the application code is error-prone: if we miss an index reference in any part of the code, the entire platform could be negatively impacted. Referencing an alias instead, and managing it with the alias APIs, ensures that the application always points to the current index for the posts.

      import requests

      # Elasticsearch server URL
      elasticsearch_url = "http://localhost:9200"

      # Index and alias names
      index_name = "posts"
      alias_name = "postsAlias"

      # Create an index
      create_index_url = f"{elasticsearch_url}/{index_name}"
      requests.put(create_index_url)

      # Create an alias for the index
      create_alias_url = f"{elasticsearch_url}/_aliases"
      alias_definition = {
          "actions": [
            {"add": {"index": index_name, "alias": alias_name}}
          ]
      }
      requests.post(create_alias_url, json=alias_definition)

      print(f"Index '{index_name}' created, and alias '{alias_name}' set up.")
    

The code sample above creates an alias “postsAlias” that points to the index “posts”.

Use the following code to delete the alias:

      import requests

      # Elasticsearch server URL
      elasticsearch_url = "http://localhost:9200"

      # Index and alias names (the alias is removed from this index)
      index_name = "posts"
      alias_name = "postsAlias"

      # Construct the URL for the Delete Alias API: /<index>/_alias/<alias>
      delete_alias_url = f"{elasticsearch_url}/{index_name}/_alias/{alias_name}"

      # Send the DELETE request to remove the alias
      response = requests.delete(delete_alias_url)

      # Check the response status
      if response.status_code == 200:
          print(f"Alias '{alias_name}' successfully deleted.")
      else:
          print(f"Failed to delete alias. Response code: {response.status_code}, Response content: {response.text}")
    

Deleting an alias does not delete the underlying index or any data within it. It simply removes the association between the alias and the index. In situations where we want to delete an entire index, we would use the Delete Index API.

Best practices when deleting indexes

Deleting indexes in Elasticsearch is tricky and calls for careful consideration, because the operation is irreversible. When we delete an index, we delete every single document along with the index configuration. In the next sections, we’ll share best practices to make index deletions in Elasticsearch safer and simpler.

Snapshot API

Backups ensure that we can recover data, providing a safety net in the case of accidental deletions or other data recovery scenarios. In Elasticsearch, backups can be automated or periodically scheduled, ensuring that the data is consistently backed up. Using the create snapshot API, we can set different backup frequencies depending on the criticality of the data and how often it is updated.

      PUT /_snapshot/backup_repo/posts_user_snapshot_1710586167
      {
        "indices": "posts,users",
        "ignore_unavailable": true,
        "include_global_state": false
      }
    

After executing the create snapshot API call above, Elasticsearch creates a snapshot named posts_user_snapshot_1710586167, containing data from the indices posts and users, and stores it in the repository named backup_repo.
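The same snapshot request can be issued from Python, in line with the earlier examples. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, `session` stands for any object with requests.Session-style put and post methods, and the repository backup_repo is assumed to be registered already. The restore helper assumes the restore snapshot API (POST /_snapshot/<repository>/<snapshot>/_restore).

```python
import time


def snapshot_name(prefix, now=None):
    """Append a Unix timestamp to the prefix, e.g. posts_user_snapshot_1710586167."""
    timestamp = int(time.time() if now is None else now)
    return f"{prefix}_{timestamp}"


def snapshot_body(indices):
    """Build the create snapshot body used in the request above."""
    return {
        "indices": ",".join(indices),
        "ignore_unavailable": True,
        "include_global_state": False,
    }


def create_snapshot(session, base_url, repo, name, indices):
    """Take a snapshot and block until it has finished."""
    resp = session.put(
        f"{base_url}/_snapshot/{repo}/{name}",
        params={"wait_for_completion": "true"},
        json=snapshot_body(indices),
    )
    resp.raise_for_status()
    return resp.json()


def restore_snapshot(session, base_url, repo, name, indices):
    """Restore the given indices from an existing snapshot."""
    resp = session.post(
        f"{base_url}/_snapshot/{repo}/{name}/_restore",
        json={"indices": ",".join(indices)},
    )
    resp.raise_for_status()
    return resp.json()
```

Taking a snapshot with create_snapshot immediately before calling the Delete Index API gives a restore point for exactly the data that is about to be removed.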

Using Aliases

Aliases allow us to use logical or symbolic names that can be associated with one or more physical indices. Instead of referencing specific index names directly in application code, aliases can be used, which remain constant even when underlying indices are deleted or recreated.

Since aliases can be reassigned and swapped easily, they allow you to perform deletion operations without downtime. By swapping aliases from old indexes to new ones, applications can continue to query data seamlessly while the old indexes are deleted.
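The alias swap described above can be expressed as a single request to the _aliases endpoint, whose actions are applied atomically. This is a minimal sketch: the function names and the index name posts_v2 are hypothetical, and `session` is assumed to be any object with a requests.Session-style post method.

```python
def alias_swap_actions(alias, old_index, new_index, delete_old=False):
    """Build an _aliases body that moves `alias` from old_index to new_index."""
    actions = [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]
    if delete_old:
        # remove_index deletes the old index as part of the same request.
        actions.append({"remove_index": {"index": old_index}})
    return {"actions": actions}


def swap_alias(session, base_url, alias, old_index, new_index, delete_old=False):
    """Send the swap so readers of the alias never see a gap."""
    resp = session.post(
        f"{base_url}/_aliases",
        json=alias_swap_actions(alias, old_index, new_index, delete_old),
    )
    resp.raise_for_status()
    return resp.json()
```

For example, swap_alias(session, "http://localhost:9200", "postsAlias", "posts", "posts_v2") repoints the alias while leaving the old index in place; passing delete_old=True also removes the old index, which is irreversible.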

Index Lifecycle Management Policies

For our social networking site, we are building search across all users, posts, jobs and more. We want to be able to retrieve any job, post, etc. quickly regardless of its age. In this scenario, index lifecycle management policies are not useful.

In contrast, if we wanted to log all of the events on the social networking site, we might want searches over recent events to be faster than searches over older data. In these cases, we can use index lifecycle management (ILM) policies to automatically move data through hot, warm, cold and delete phases. This can reduce the need for manual index management tasks.

In addition to storage tiering, ILM policies can define index rollovers that trigger once an index reaches a specific data size. This can help ensure consistent shard sizes even as the amount of data accumulates over time.
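As an illustration, an ILM policy for the event-logging case might look like the sketch below. The thresholds (50gb, 30d, 90d), the policy name and the helper names are assumptions chosen for the example rather than recommendations, and `session` stands for any object with a requests.Session-style put method.

```python
def ilm_policy_body(rollover_shard_size="50gb", rollover_max_age="30d", delete_after="90d"):
    """Build a policy that rolls over in the hot phase and deletes old indices."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": rollover_shard_size,
                            "max_age": rollover_max_age,
                        }
                    }
                },
                "delete": {
                    # min_age is measured from rollover for rolled-over indices.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


def put_ilm_policy(session, base_url, policy_name, body):
    """Create or update the policy via PUT /_ilm/policy/<name>."""
    resp = session.put(f"{base_url}/_ilm/policy/{policy_name}", json=body)
    resp.raise_for_status()
    return resp.json()
```

A policy only takes effect on indices that reference it, typically through an index template, so the call above is one piece of the setup rather than the whole of it.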

Issues with index deletions

In this section, we highlight the potential issues when deleting indexes, including the impact on cluster performance, accidental deletions, data loss and partial deletions, and how to minimize them.

Cluster Performance

Elasticsearch is a distributed system with data spread across multiple nodes. Index deletion requires coordination across nodes to ensure that all data associated with an index is fully deleted. Depending on the size of the index, deletion can cause a temporary surge in disk I/O, CPU usage, network traffic and memory consumption. This surge can slow other cluster operations, leading to slower search responses, delayed indexing or, in extreme cases, timeouts. To limit the impact on user experience, plan index deletions for periods of low activity, monitor cluster performance while they run, and provision additional resources if needed. Under the hood, Elasticsearch tries to minimize the disruption by deleting the index over a period of time, which means that deleting an index does not immediately free up disk space or other resources.

Security

Deleting an index is not an operation to take lightly. It can impact the performance and availability of the system. That’s why it's advised to use Role-Based Access Controls (RBAC) to restrict which Elasticsearch users can perform deletion operations.
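One way to apply this is a role for the application that can read and write documents but cannot delete indexes. The sketch below assumes that Elasticsearch security features are enabled and uses the create role API (PUT /_security/role/<name>); the role name, index pattern and helper names are hypothetical.

```python
def app_role_body(index_patterns):
    """Grant document read/write on the given indices, without delete_index."""
    return {
        "indices": [
            {
                "names": list(index_patterns),
                # "write" covers indexing, updating and deleting documents;
                # deleting an index needs the separate "delete_index" privilege.
                "privileges": ["read", "write"],
            }
        ]
    }


def put_role(session, base_url, role_name, body):
    """Create or update the role via the security API."""
    resp = session.put(f"{base_url}/_security/role/{role_name}", json=body)
    resp.raise_for_status()
    return resp.json()
```

Index deletion can then be reserved for a small set of administrative users or automation accounts that hold the delete_index privilege.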

Data Loss

Index deletions in Elasticsearch cause irreversible data loss. It’s advised to guard against this by versioning indexes, automating backups, using RBAC policies and configuring index aliases.
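A further guard against accidental deletion is to refuse wildcard or catch-all targets. The sketch below shows two complementary pieces: a cluster settings body that enables action.destructive_requires_name, and a hypothetical client-side check that could run before any call to the Delete Index API. The function names are assumptions made for illustration.

```python
def require_explicit_names_body():
    """Body for PUT /_cluster/settings: destructive calls must name indices."""
    return {"persistent": {"action.destructive_requires_name": True}}


def check_explicit_index_name(index_name):
    """Reject empty, wildcard, multi-index or _all targets before deleting."""
    if not index_name:
        raise ValueError("index name must not be empty")
    if index_name == "_all" or "*" in index_name or "," in index_name:
        raise ValueError(f"refusing to delete non-explicit target: {index_name!r}")
    return index_name
```

Calling check_explicit_index_name("posts") returns the name unchanged, while a target such as "posts*" raises a ValueError before any request is sent.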

Partial Deletions

Failed node communication during index deletions may result in only partial deletion of the index, which in turn can cause inaccurate analytics and duplicate data. The root cause is often one of the following:

  • Network issues, such as partitions or disruptions that hinder communication between nodes
  • Cluster health issues, such as a cluster in the ‘red’ or ‘yellow’ state, indicating that not all primary or replica shards are fully operational
  • Security access controls, such as configurations and RBAC policies that prevent nodes from communicating
  • Hardware failures, such as the system being overburdened on CPU, memory or disk I/O

It can be a time-consuming process to determine the root cause of the partial deletion, delete the incomplete data and restart the reindexing process.
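A simple check after each deletion can surface these problems early. The following is a minimal sketch that polls the index exists API (HEAD /<index>, which returns 200 if the index exists and 404 if it does not); the helper names, retry count and delay are assumptions, and `session` is any object with a requests.Session-style head method.

```python
import time


def index_exists(session, base_url, index):
    """Return True if HEAD /<index> reports the index, False on 404."""
    resp = session.head(f"{base_url}/{index}")
    if resp.status_code == 200:
        return True
    if resp.status_code == 404:
        return False
    # Any other status (e.g. 401, 503) is surfaced rather than guessed at.
    resp.raise_for_status()
    raise RuntimeError(f"unexpected status {resp.status_code}")


def confirm_deleted(session, base_url, index, attempts=5, delay_seconds=2.0):
    """Poll until the index is gone; return False if it is still visible."""
    for attempt in range(attempts):
        if not index_exists(session, base_url, index):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

If confirm_deleted returns False, it is worth checking cluster health and node connectivity before retrying the deletion or starting the next reindexing step.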

Conclusion

As a search application sees usage and data grow over time, it requires indexes to be deleted and recreated, a process known as reindexing in Elasticsearch. There are several tools like aliases, index lifecycle management policies and RBAC that can be used to simplify and automate the reindexing process.

That said, deleting indexes in Elasticsearch is a resource-intensive operation for clusters and can result in cluster performance degradation, partial deletions and data loss. If you are frequently going through the motions of index deletions in Elasticsearch, then you may want to explore Rockset as an alternative. Every Rockset index is designed to scale up to hundreds of terabytes without ever needing to reindex the data. You can try Rockset on your own data and queries with a free trial and $300 in credits.