Python Script for Zero-Downtime Elasticsearch Reindexing

You need to migrate a live search or log index onto a new mapping or shard layout without dropping a single write and without a search outage — this page is the complete elasticsearch-py v8 async driver that does it, checkpointed so it resumes exactly where a crash left off.

Where this fits in the reindex lifecycle

This driver is the executable core of the batch reindex workflow: the parent page describes the migration as a five-state machine (Provision → SubmitReindex → Polling → Validate → AliasSwap), and everything below implements the copy-and-cutover stages of that machine with explicit state persistence. Rather than delegate the whole copy to the server-side _reindex task, this script drives the read/write loop from the client so it can checkpoint after every batch, survive network partitions or an OOM kill mid-run, and re-attach at the exact document it stopped on. The source index stays fully readable and writable throughout — writes keep flowing to a write alias while documents stream to a freshly provisioned target, and the swap to the new index happens in one atomic cluster-state update. This is the pattern to reach for when a plain server-side reindex is too coarse to observe or too fragile to resume.

Because the target is provisioned with an ILM policy attached before the first document lands, the new index starts aging under retention from document one, and its shards honor the hot-warm-cold architecture tier routing the moment they are allocated.

Prerequisites

Elasticsearch 8.x with the source reachable through a write alias, so application writes are never pinned to a concrete index name.
elasticsearch-py v8+ installed (elasticsearch>=8.0,<9.0); the async client AsyncElasticsearch drives the loop.
A target index (or template) created with explicit mappings, number_of_replicas: 0 and refresh_interval: -1 for the copy window, and index.lifecycle.name plus a rollover_alias attached up front.
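A service account scoped by RBAC with read on the source and create_doc (or write) plus manage on the target pattern; manage alone does not grant document writes. The alias swap in Phase 4 additionally needs manage on the source index.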
Writable local disk (or an external key-value store) for the JSON checkpoint file.

Phase 1: Pre-Migration Validation & ILM Alignment

Before initiating data movement, verify that the target index template matches the source schema exactly, including dynamic_templates, keyword vs text splits, and numeric precision. Mapping mismatches during the copy manifest as mapper_parsing_exception or silent field coercion, which corrupts downstream aggregations. Pull a 100-document sample from a single shard and check its fields against the target mapping:

GET /source-index/_search?size=100&preference=_shards:0&filter_path=hits.hits._source
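
One way to make the parity check mechanical is to flatten both mappings and diff the field types. The sketch below is a hypothetical helper, not part of the driver. It assumes you have already fetched each index's `mappings` object (for example with `client.indices.get_mapping`), and it compares only `properties` and multi-fields, so `dynamic_templates` still need a separate comparison.

```python
from typing import Any, Optional


def flatten_mapping(properties: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Flatten a mapping's `properties` tree into {dotted.path: type}."""
    flat: dict[str, str] = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        # Object fields usually omit "type"; record them as "object".
        flat[path] = spec.get("type", "object")
        # Recurse into sub-objects and multi-fields (e.g. text plus keyword).
        for key in ("properties", "fields"):
            if key in spec:
                flat.update(flatten_mapping(spec[key], prefix=f"{path}."))
    return flat


def mapping_diff(source: dict[str, Any],
                 target: dict[str, Any]) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {path: (source_type, target_type)} for every field that differs."""
    src = flatten_mapping(source.get("properties", {}))
    dst = flatten_mapping(target.get("properties", {}))
    return {
        path: (src.get(path), dst.get(path))
        for path in sorted(src.keys() | dst.keys())
        if src.get(path) != dst.get(path)
    }
```

An empty result means the two field trees agree on type; a `None` on either side marks a field that exists in only one index.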

Once schema parity is confirmed, attach the target ILM policy but explicitly set index.lifecycle.rollover_alias so a premature rollover does not fire during the migration window. Verify policy attachment and phase progression before any document moves:

GET /target-index/_ilm/explain

Expected Output:

{
  "indices": {
    "target-index-000001": {
      "index": "target-index-000001",
      "managed": true,
      "policy": "log-retention-90d",
      "lifecycle_date_millis": 1718492100000,
      "phase": "new",
      "action": "complete",
      "step": "complete"
    }
  }
}

If the policy shows phase: "hot" with action: "rollover" prematurely, halt and reset the alias configuration — a target that rolls over mid-copy fragments your checkpoint across two backing indices. Aligning retention windows against migration throughput is covered in Designing Batch Reindex Workflows.
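
The pre-copy check on the `_ilm/explain` response can be automated along these lines. This is a minimal sketch under stated assumptions: `ilm_problems` is a hypothetical helper name, it takes the already-parsed response body, and it only looks at the `managed`, `policy`, `phase`, and `action` fields shown in the expected output above.

```python
from typing import Any


def ilm_problems(explain: dict[str, Any], expected_policy: str) -> list[str]:
    """Return the problems found in an _ilm/explain body (empty list = OK to copy)."""
    problems: list[str] = []
    for name, info in explain.get("indices", {}).items():
        if not info.get("managed"):
            problems.append(f"{name}: not managed by ILM")
            continue
        if info.get("policy") != expected_policy:
            problems.append(f"{name}: unexpected policy {info.get('policy')!r}")
        # A rollover action before the copy starts is the failure mode to halt on.
        if info.get("action") == "rollover":
            problems.append(f"{name}: rollover active in phase {info.get('phase')!r}")
    return problems
```

The same function should not be reused after the swap, where `hot` with `rollover` is the expected state.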

Phase 2: Cluster State Diagnostics & Allocation Control

Align the copy’s batch window with the target cluster’s write capacity. Monitor the write thread pool for rejected counts and throttle when queue depth climbs — the full calibration methodology lives in Optimizing Reindex Thresholds & Bulk Sizes, and the specific batch-size math is in Tuning Bulk Request Size for High-Throughput Reindexing.
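
As a rough illustration of throttling on write-pool rejections, the sketch below sums the cumulative `rejected` counter from a node stats body and adjusts the pause between batches. The function names, the doubling and halving policy, and the 30-second ceiling are assumptions made for this example; the 0.5-second floor matches the fixed sleep the driver uses. The stats body is assumed to come from something like `client.nodes.stats(metric="thread_pool")`.

```python
from typing import Any


def total_write_rejections(node_stats: dict[str, Any]) -> int:
    """Sum the write pool's cumulative `rejected` counter across all nodes."""
    return sum(
        node.get("thread_pool", {}).get("write", {}).get("rejected", 0)
        for node in node_stats.get("nodes", {}).values()
    )


def next_delay(current: float, prev_rejected: int, now_rejected: int,
               floor: float = 0.5, ceiling: float = 30.0) -> float:
    """Double the pause when new rejections appeared, otherwise decay to the floor."""
    if now_rejected > prev_rejected:
        return min(current * 2, ceiling)
    return max(current / 2, floor)
```

Because `rejected` only ever grows, the comparison is against the previous sample rather than against zero.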

Restrict allocation to primaries cluster-wide during the initial load so replica allocation does not compete for I/O while documents stream in (note: cluster.routing.allocation.enable is a global, cluster-level allocation control, not a target-specific or rebalancing setting):

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

Confirm cluster stability before proceeding:

GET _cluster/health?filter_path=status,active_shards,initializing_shards,relocating_shards

Expected Output:

{
  "status": "green",
  "active_shards": 142,
  "initializing_shards": 0,
  "relocating_shards": 0
}

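A yellow status means at least one replica shard is unassigned; red means at least one primary is. Relocation is reported separately in relocating_shards and does not change the status color. Do not proceed until status is green and relocating_shards is 0.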
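
The gate can be expressed as a small predicate over the health body. This is a sketch, and `safe_to_start` is a hypothetical name; it also insists on `initializing_shards` being 0, which the text does not require but the expected output shows.

```python
from typing import Any


def safe_to_start(health: dict[str, Any]) -> bool:
    """True only when the cluster is green with no shards moving or initializing."""
    return (
        health.get("status") == "green"
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0  # assumption: stricter than the text
    )
```

A caller would typically poll this in a loop with a timeout rather than fail on the first non-green sample.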

Phase 3: Core Execution Engine (Python v8 Async)

The driver below leverages the elasticsearch-py v8 async client, a Point-In-Time with search_after sorting on the _shard_doc tiebreaker for stable deep pagination, and a filesystem-backed JSON checkpoint. The PIT keeps the view consistent across refreshes and enables an exact resume point after a network partition or an OOM kill. Sorting on _id is rejected by default (no fielddata on _id); _shard_doc is the efficient, stable sort designed for PIT plus search_after.

import asyncio
import json
import logging
from pathlib import Path
from typing import Any, Optional

from elasticsearch import AsyncElasticsearch
from elasticsearch import ApiError, ConnectionTimeout

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger("reindex_engine")

CHECKPOINT_FILE = Path("reindex_checkpoint.json")
BATCH_SIZE = 5000
MAX_RETRIES = 3


def load_checkpoint() -> Optional[list[Any]]:
    # Resume from the last persisted `sort` values, or None on a fresh run.
    if CHECKPOINT_FILE.exists():
        return json.loads(CHECKPOINT_FILE.read_text())
    return None


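def save_checkpoint(sort_values: list[Any]) -> None:
    # Write to a temp file and rename it, so a crash never leaves half-written JSON.
    tmp = CHECKPOINT_FILE.with_name(CHECKPOINT_FILE.name + ".tmp")
    tmp.write_text(json.dumps(sort_values))
    tmp.replace(CHECKPOINT_FILE)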


async def fetch_batch(client: AsyncElasticsearch, pit_id: str,
                      sort_values: Optional[list[Any]]) -> tuple[list[dict[str, Any]], str]:
    # No `index=` here: the PIT already pins the source index(es). `_shard_doc`
    # is the stable tiebreaker for deep pagination over a Point-In-Time.
    params: dict[str, Any] = {
        "size": BATCH_SIZE,
        "query": {"match_all": {}},
        "pit": {"id": pit_id, "keep_alive": "5m"},
        "sort": [{"_shard_doc": "asc"}],
    }
    if sort_values:
        params["search_after"] = sort_values
    # Per-request transport options go through .options() in elasticsearch-py v8.
    resp = await client.options(request_timeout=30).search(**params)
    # Elasticsearch may return a new PIT id; always carry the latest one forward.
    return resp["hits"]["hits"], resp.get("pit_id", pit_id)


async def bulk_index(client: AsyncElasticsearch, target_index: str,
                     docs: list[dict[str, Any]]) -> int:
    # client.bulk expects an NDJSON-style operations list: an action header
    # followed by its source document, NOT helper-style {"_index", "_source"} dicts.
    operations: list[dict[str, Any]] = []
    for doc in docs:
        # op_type "create" makes a re-run idempotent: ids already written come
        # back as 409 conflicts instead of being overwritten.
        operations.append({"create": {"_index": target_index, "_id": doc["_id"]}})
        operations.append(doc["_source"])

    for attempt in range(MAX_RETRIES):
        try:
            resp = await client.options(request_timeout=60).bulk(operations=operations)
        except (ConnectionTimeout, ApiError) as exc:
            logger.warning("Bulk attempt %d failed: %s", attempt + 1, exc)
            await asyncio.sleep(2 ** attempt)  # exponential backoff
            continue
        # Each item is {"create": {"status": ..., "error"?: ...}}.
        results = [next(iter(it.values())) for it in resp.get("items", [])]
        failed = [r for r in results if r.get("error")]
        # A 409 on a re-run means the document was already copied. Anything else
        # (for example a mapping error) is fatal and must stop the run.
        fatal = [r for r in failed if r.get("status") != 409]
        if fatal:
            raise RuntimeError(f"{len(fatal)} document(s) failed; first error: {fatal[0]['error']}")
        if failed:
            logger.info("%d document(s) already present in target (409), skipped.", len(failed))
        return len(results) - len(failed)
    # Raise rather than return, so the caller never checkpoints past a lost batch.
    raise RuntimeError(f"Bulk indexing failed after {MAX_RETRIES} attempts.")


async def run_reindex(client: AsyncElasticsearch, source_index: str, target_index: str) -> None:
    checkpoint = load_checkpoint()
    processed = 0

    # Open a Point-In-Time so deep pagination stays consistent across refreshes.
    pit = await client.open_point_in_time(index=source_index, keep_alive="5m")
    pit_id = pit["id"]
    try:
        while True:
            batch, pit_id = await fetch_batch(client, pit_id, checkpoint)
            if not batch:
                logger.info("No more documents. Copy stage complete.")
                break

            # bulk_index raises on failure, so the checkpoint only advances
            # once the batch is confirmed written.
            await bulk_index(client, target_index, batch)
            processed += len(batch)
            checkpoint = batch[-1]["sort"]  # last hit's sort values = resume cursor
            save_checkpoint(checkpoint)
            logger.info("Checkpoint saved at sort=%s. Total processed: %d", checkpoint, processed)

            # Backpressure throttle: widen this if the write pool starts rejecting.
            await asyncio.sleep(0.5)
    finally:
        await client.close_point_in_time(id=pit_id)


async def main() -> None:
    client = AsyncElasticsearch("https://es-cluster:9200", api_key="YOUR_API_KEY", verify_certs=True)
    try:
        await run_reindex(client, "source-logs-2023.10", "target-logs-2023.10-reindexed")
    finally:
        await client.close()


if __name__ == "__main__":
    asyncio.run(main())

Phase 4: Atomic Alias Swap & Post-Migration Verification

Once the driver logs that the copy stage is complete (the PIT returned an empty batch), prepare the target for promotion before touching the alias. Set cluster.routing.allocation.enable back to all, because replicas cannot be assigned while it is primaries; then restore production number_of_replicas and refresh_interval on the target and wait for green. Promoting an unreplicated index risks losing freshly migrated data to a single node failure. The _aliases request is applied atomically, so queries see no downtime: the alias never resolves to both indices or to neither.

POST /_aliases
{
  "actions": [
    { "remove": { "index": "source-index", "alias": "app-search-alias" } },
    { "add":    { "index": "target-index",  "alias": "app-search-alias", "is_write_index": true } }
  ]
}
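
If the swap is issued from the same Python process, the request body can be built by a small helper so the remove and add actions always travel together. This is an illustrative sketch; `build_alias_swap` is a hypothetical name, and the guard against identical index names is an added assumption.

```python
from typing import Any


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict[str, Any]:
    """Build a single _aliases body that moves `alias` from old_index to new_index."""
    if old_index == new_index:
        raise ValueError("old_index and new_index must differ")
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }
```

With the v8 client the actions list would be passed as `await client.indices.update_aliases(actions=body["actions"])`, which sends both actions in one request.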

Immediately verify routing and the ILM handoff:

GET /_cat/aliases/app-search-alias?v
GET /target-index/_ilm/explain

Expected _ilm/explain Output Post-Swap:

{
  "indices": {
    "target-index": {
      "index": "target-index",
      "managed": true,
      "policy": "log-retention-90d",
      "phase": "hot",
      "action": "rollover",
      "step": "check-rollover-ready"
    }
  }
}

Then confirm full shard allocation is enabled, setting it now if it was left at primaries, so replicas distribute:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}

A freshly promoted index serves cold caches and can spike search latency on first query; pre-loading it is covered in Cache Warming Strategies for New Indices. Pre-warming queries that aggregate or sort on high-cardinality keyword fields immediately after allocation resumes build global ordinals and warm the filesystem page cache before real traffic lands.

Phase 5: Incident Recovery & Escalation Paths

When the copy stalls or cluster health degrades, run the diagnostic and recovery sequence below. Do not rely on automatic retries for stuck allocation states.

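Identify stuck tasks. The client-driven copy creates no _reindex task of its own, so this query surfaces any server-side _reindex submitted elsewhere in the workflow: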

GET _tasks?actions=*reindex&detailed=true&human

Cancel a stuck task:
```
POST /_tasks/<task_id>/_cancel
```

Force a manual reroute for an unassigned primary. If a primary shard remains UNASSIGNED due to node failure or corruption, allocate a stale primary. This accepts potential data loss for that shard but restores availability:

POST _cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "target-index",
        "shard": 0,
        "node": "es-data-node-03",
        "accept_data_loss": true
      }
    }
  ]
}

Automated Python recovery hook. Integrate a watchdog that polls _cluster/health every 15 seconds. If status stays red for more than 60 seconds, pause the driver, flush the checkpoint, and page on-call. Cancel pending asyncio bulk tasks cleanly so the checkpoint file is never left half-written.
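
A minimal sketch of such a watchdog follows, using the 15-second poll and 60-second red threshold from the paragraph above. The class and function names are hypothetical, the driver is assumed to check the `pause` event between batches, and paging on-call is left out because it depends on your alerting stack.

```python
import asyncio
import time
from typing import Any, Callable, Optional


class RedWatchdog:
    """Tracks how long cluster health has been continuously red."""

    def __init__(self, red_limit_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.red_limit_s = red_limit_s
        self._clock = clock
        self._red_since: Optional[float] = None

    def observe(self, status: str) -> bool:
        """Record one health sample; return True when the driver should pause."""
        if status != "red":
            self._red_since = None  # any non-red sample resets the timer
            return False
        now = self._clock()
        if self._red_since is None:
            self._red_since = now
        return now - self._red_since > self.red_limit_s


async def watch(client: Any, watchdog: RedWatchdog, pause: asyncio.Event,
                interval_s: float = 15.0) -> None:
    """Poll _cluster/health and set `pause` once red has persisted too long."""
    while not pause.is_set():
        health = await client.cluster.health(filter_path="status")
        if watchdog.observe(health["status"]):
            pause.set()  # assumption: the copy loop checks this between batches
            return
        await asyncio.sleep(interval_s)
```

Pausing between batches, rather than cancelling a bulk call in flight, keeps the last saved checkpoint consistent with what was actually written.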

Escalation protocol:

If _cluster/reroute fails with NO_VALID_SHARD_COPY, engage the storage team for disk-level diagnostics.
If mapper_parsing_exception persists post-recovery, halt the migration, revert the alias to the source, and reconcile schema drift before resuming.
Maintain audit logs of all _cluster/settings mutations and checkpoint writes for compliance review.

Gotchas and edge cases

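PIT expiry mid-run. Every batch renews keep_alive: "5m", but a stall longer than that (a long GC pause, a paused debugger) expires the PIT and the next search throws. On resume the script opens a fresh PIT and continues from the checkpoint, with two caveats. First, _shard_doc values combine the shard's position in the PIT with Lucene's internal doc ID and are only guaranteed constant within one PIT, so after segment merges a saved cursor may not line up exactly with the new PIT; compare source and target document counts after a resumed run rather than trusting the cursor alone. Second, documents written to the source after the PIT in use was opened are not guaranteed to be in this copy. Block source writes for the final catch-up batch, or run a second delta pass filtered on @timestamp.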
_shard_doc requires a PIT. The _shard_doc sort field only exists inside a Point-In-Time context. Running the same query without a pit block is rejected at request validation with "[_shard_doc] sort field cannot be used without [point in time]". Never drop the pit block from fetch_batch.
Idempotency depends on stable _id. Re-running after a crash is safe only because op_type: create skips ids already written. Source ids are stable even when Elasticsearch auto-generated them, but if you let the target assign new ones, a re-run duplicates every document from the last checkpoint forward. Copy doc["_id"] explicitly, as the script does.
409 conflicts are noise on a re-run, not failure. After a crash the first replayed batch will report version_conflict_engine_exception for the documents already copied. Distinguish those expected 409s from mapper_parsing_exception (fatal) before treating the bulk response as an error — routing genuine collisions is covered in Resolving Document Conflicts During Reindex.
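
For the delta pass mentioned under PIT expiry, the query could be built along these lines. This is a sketch under assumptions: `delta_pass_query` is a hypothetical helper, `@timestamp` is assumed to reflect when each document was written, and the one-minute overlap is an arbitrary safety margin. It does not capture updates or deletes to documents that were already copied.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def delta_pass_query(pit_opened_at: datetime,
                     overlap: timedelta = timedelta(minutes=1)) -> dict[str, Any]:
    """Build a range query for documents written since the first pass's PIT opened."""
    if pit_opened_at.tzinfo is None:
        raise ValueError("pit_opened_at must be timezone-aware")
    # Start slightly before the PIT opened; op_type create absorbs the overlap.
    start = (pit_opened_at - overlap).astimezone(timezone.utc)
    return {"range": {"@timestamp": {"gte": start.isoformat()}}}
```

The returned dict would replace the `match_all` query in `fetch_batch` for the second pass, with a fresh PIT and an empty checkpoint.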

Frequently Asked Questions

Why drive the copy from Python instead of using the server-side _reindex API?

Server-side _reindex is a single opaque task: you can throttle and poll it, but you cannot checkpoint inside it, so a mid-run failure restarts from zero unless you filtered the query yourself. Driving the read/write loop from the client lets you persist the exact search_after cursor after every batch, resume at that document after a crash, and inspect or transform each hit before it is written. For a straightforward mapping copy, the batch _reindex workflow is simpler; reach for this driver when resumability and per-batch observability matter more than raw simplicity.

How does the checkpoint make the run resumable?

After each successful bulk batch the script writes the last hit's sort values (the _shard_doc cursor) to reindex_checkpoint.json. On startup it reads that file and passes the values as search_after, so pagination continues from the next document. Paired with op_type: create, replaying the final in-flight batch is harmless: already-written ids are skipped as conflicts and only the missing documents are indexed.

Why sort on _shard_doc rather than _id or @timestamp?

_id has no doc values, and fielddata on it is disabled by default in 8.x, so sorting on it is rejected. A business field like @timestamp is not guaranteed unique, so it cannot serve as a stable tiebreaker for deep pagination. _shard_doc is an implicit, monotonically increasing, per-shard tiebreaker that exists specifically for Point-In-Time plus search_after; within a single PIT it gives stable, gap-free paging across the entire index at the lowest cost.

Does the alias swap really cause zero downtime?

Yes, because a single POST /_aliases request with both a remove and an add action is applied atomically in one cluster-state update. There is no instant where app-search-alias resolves to both indices or to none. The only precondition is that the target is already green with replicas restored before the swap, so promotion never exposes an unreplicated or still-recovering index to live traffic.

Related

Designing Batch Reindex Workflows — the parent state machine this driver implements.
Tuning Bulk Request Size for High-Throughput Reindexing — sizing BATCH_SIZE against write-pool capacity.
Resolving Document Conflicts During Reindex — routing the 409s a resumed run produces instead of dropping them.
Tracking Reindex Progress & Performance — wiring the driver’s per-batch counters into monitoring.
Monitoring ILM Execution & Error States — diagnosing a target that stalls after cutover.
