Designing Batch Reindex Workflows

Production Elasticsearch environments rarely tolerate a single monolithic _reindex call fired from a console. Migrating data during mapping evolution, resharding, or a storage-tier change has to run as a controlled, observable pipeline, not a blind bulk copy that can exhaust thread pools, trip circuit breakers, or leave a half-populated target behind after a network blip. A batch reindex workflow solves this by treating the migration as a deterministic state machine: it provisions a target with the correct lifecycle policy attached, chunks the source into bounded scroll windows, copies documents idempotently under an explicit throttle, validates parity, and only then swaps the write alias in one atomic cluster-state update. This page is part of the automated reindexing pipelines and workflows topic and focuses on the design of the batch stage itself: how to make each run resumable, bounded, and safe to re-execute.

Prerequisites

Confirm the following before you provision a target or submit a reindex task:

Elasticsearch 8.x cluster with the _reindex and _tasks APIs reachable from the automation host.
elasticsearch-py v8+ installed with the async extra (elasticsearch[async]>=8.0,<9.0), because the async client (AsyncElasticsearch) used for the polling loop below depends on aiohttp.
Dedicated data-tier node attributes in place so target shards can land on the intended tier — see the hot-warm-cold architecture for the routing model.
An index template for the target pattern that attaches an ILM policy and a rollover_alias, so the destination is governed from its first document.
Write access to a service account scoped by RBAC with manage on the target pattern and read on the source.
A measured baseline of sustained write throughput under peak query load, so requests_per_second can be set relative to real capacity rather than guessed.

Architecture: The Batch Reindex State Machine

A batch reindex is a short, well-bounded state machine. The source stays readable, a fresh target is provisioned with write-optimized settings and the lifecycle policy attached before any document moves, the copy runs throttled and idempotently, counts are validated, and only a passing validation authorizes the alias swap. A failure never mutates the alias — it routes to a bounded retry so a transient rejection does not corrupt target state. This is how the batch stage slots into the parent pipeline’s larger cutover sequence.

Because each state has an explicit success and failure edge, the workflow is resumable: a crash between SubmitReindex and Polling leaves a task id you can re-attach to, and op_type: create guarantees that re-running the copy skips documents already written. The ILM policy attached at Provision means the target starts aging correctly the moment data lands, rather than inheriting default settings that silently bypass retention.
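The success and failure edges described above can be sketched as a small transition table. This is a minimal illustration, not the pipeline's actual driver: only Provision, SubmitReindex, and Polling are named in the text, so the remaining state labels, the retry budget, and the choice to re-submit the copy after a failed poll or validation are assumptions.

```python
from enum import Enum, auto


class State(Enum):
    PROVISION = auto()
    SUBMIT_REINDEX = auto()
    POLLING = auto()
    VALIDATE = auto()      # hypothetical label
    SWAP_ALIAS = auto()    # hypothetical label
    DONE = auto()          # hypothetical terminal state
    FAILED = auto()        # hypothetical terminal state


# Success edge for every non-terminal state.
ON_SUCCESS = {
    State.PROVISION: State.SUBMIT_REINDEX,
    State.SUBMIT_REINDEX: State.POLLING,
    State.POLLING: State.VALIDATE,
    State.VALIDATE: State.SWAP_ALIAS,
    State.SWAP_ALIAS: State.DONE,
}

# Failure edge: no failure path ever reaches SWAP_ALIAS, so the alias is
# only touched after a passing validation. A failed poll or validation
# re-submits the copy, which is safe because the copy is idempotent.
ON_FAILURE = {
    State.PROVISION: State.PROVISION,
    State.SUBMIT_REINDEX: State.SUBMIT_REINDEX,
    State.POLLING: State.SUBMIT_REINDEX,
    State.VALIDATE: State.SUBMIT_REINDEX,
    State.SWAP_ALIAS: State.SWAP_ALIAS,
}

MAX_RETRIES = 3  # assumed retry budget for the whole run


def step(state: State, ok: bool, retries: int) -> tuple[State, int]:
    """Return the next state and the updated retry count."""
    if state in (State.DONE, State.FAILED):
        return state, retries
    if ok:
        return ON_SUCCESS[state], retries
    if retries >= MAX_RETRIES:
        return State.FAILED, retries
    return ON_FAILURE[state], retries + 1
```

The retry counter is deliberately not reset on success, so a run that keeps alternating between submit and a failing poll still terminates.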

Configuration Reference: Provisioning an ILM-Aware Target

Reindex success is decided before the first document is copied. The target must be created with explicit mappings, a deliberate shard count, and a lifecycle policy — never allowed to spring into existence from a default template during ingestion. The template below binds allocation to the hot tier and attaches the rollover alias and policy so the destination is lifecycle-managed from document one.

PUT _index_template/reindex_target_template
{
  "index_patterns": ["logs-v2-*"],
  "template": {
    "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 0,
      "refresh_interval": "-1",
      "index.routing.allocation.include._tier_preference": "data_hot",
      "index.lifecycle.name": "logs-retention-90d",
      "index.lifecycle.rollover_alias": "logs-v2"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message":    { "type": "text" }
      }
    }
  }
}

Two settings are deliberately tuned for the copy, not for steady state: number_of_replicas: 0 and refresh_interval: -1 maximize write velocity by removing replication and refresh overhead during ingestion. Both are restored to production values after the copy completes and before the alias swap, so search consumers never see an unreplicated or unsearchable index. index.lifecycle.rollover_alias and index.lifecycle.name must be present before ingestion — attaching a policy afterward leaves the already-written documents governed by nothing until the next rollover. The tier-preference routing keeps new shards on hot nodes and depends on the same allocation model that governs rollover conditions in the Hot phase.

When calibrating throughput, set source.size between 1000 and 2000 documents and requests_per_second to 30–50% of sustained write capacity. Oversized scroll windows deserialize into large payloads on coordinating nodes and trigger circuit_breaking_exception; undersized batches waste network round-trips and stretch the alias-transition window. The full calibration methodology lives in Optimizing Reindex Thresholds & Bulk Sizes.
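As a rough illustration of the 30–50% rule, the helper below derives a throttle from a measured baseline. The function names and the 40% default are assumptions; the wait-time arithmetic follows the way Elasticsearch documents reindex throttling (pad each batch so that batch size divided by requests_per_second is the target time per batch).

```python
def calibrate_throttle(sustained_docs_per_sec: float, fraction: float = 0.4) -> int:
    """Return a requests_per_second value as a fraction of measured capacity.

    `fraction` is restricted to the 0.3-0.5 band recommended above.
    """
    if sustained_docs_per_sec <= 0:
        raise ValueError("baseline throughput must be positive")
    if not 0.3 <= fraction <= 0.5:
        raise ValueError("fraction should stay between 0.3 and 0.5")
    return max(1, round(sustained_docs_per_sec * fraction))


def batch_wait_seconds(batch_size: int, rps: float, write_seconds: float) -> float:
    """Padding added after a batch so the effective rate matches `rps`."""
    target_time = batch_size / rps
    return max(0.0, target_time - write_seconds)
```

For example, a cluster that sustains 10,000 documents per second under peak query load would get a throttle of 4,000 at the default fraction, and a 1,000-document batch that takes 0.5 s to write at a throttle of 500 would be followed by a 1.5 s pause.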

Step-by-Step Implementation

1. Submit a throttled, idempotent reindex task

Run the copy asynchronously (wait_for_completion=False) so the client is not blocked for the duration of a large migration. op_type: create enforces idempotency by skipping documents that already exist in the target, and conflicts: proceed prevents a single version collision from aborting the whole batch.

import asyncio
import logging
from elasticsearch import AsyncElasticsearch
from elasticsearch import ApiError

logging.basicConfig(level=logging.INFO, format="%(asctime)s | %(levelname)s | %(message)s")


async def submit_batch_reindex(es: AsyncElasticsearch, source: str, target: str, rps: int = 1000) -> str:
    """Submit a throttled, idempotent reindex task and return the task id.

    op_type=create skips documents that already exist, so a re-run after a
    crash is safe. conflicts=proceed keeps a single collision from aborting.
    """
    try:
        response = await es.reindex(
            source={"index": source, "size": 1000, "query": {"match_all": {}}},
            dest={"index": target, "op_type": "create"},
            max_docs=5_000_000,
            requests_per_second=rps,
            conflicts="proceed",
            wait_for_completion=False,
        )
        task_id = response["task"]
        logging.info("Reindex task submitted: %s", task_id)
        return task_id
    except ApiError as exc:
        logging.error("Failed to submit reindex task: %s", exc.info)
        raise

2. Poll the task API to completion

Progress must be read from the _tasks API, never inferred from console output. The loop below surfaces per-batch counters, distinguishes a task-level error from per-document failures, and backs off on transient polling errors so a brief network partition does not abort monitoring.

async def poll_task_progress(es: AsyncElasticsearch, task_id: str, interval: int = 5) -> dict:
    """Poll _tasks until the reindex completes, raising on any failure."""
    while True:
        try:
            task = await es.tasks.get(task_id=task_id)
            status = task["task"]["status"]
            logging.info(
                "Progress: %s/%s docs | updated: %s | conflicts: %s",
                status["created"], status["total"],
                status["updated"], status["version_conflicts"],
            )

            if task["completed"]:
                # A task-level failure surfaces as a top-level `error`;
                # per-document failures appear under response.failures.
                if "error" in task:
                    logging.error("Task failed: %s", task["error"])
                    raise RuntimeError("Reindex task failed")
                failures = task.get("response", {}).get("failures", [])
                if failures:
                    logging.error("Reindex finished with %d document failure(s).", len(failures))
                    raise RuntimeError("Reindex task had document-level failures")
                logging.info("Reindex task completed successfully.")
                return status

            await asyncio.sleep(interval)
        except ApiError as exc:
            logging.warning("Task polling interrupted: %s. Retrying in %ss", exc, interval * 2)
            await asyncio.sleep(interval * 2)
            interval = min(interval * 2, 30)

Idempotency comes entirely from op_type: create. When you expect version collisions or duplicate primary keys — for example, migrating overlapping data streams — route them deliberately rather than dropping them; the strategies for that live in Resolving Document Conflicts During Reindex.

3. Restore production settings and cut over the alias

Once validation passes, restore replication and refresh, wait for the target to reach green, and then swap the write alias in a single atomic _aliases action so no window exists where the alias points at both indices or neither.

POST _aliases
{
  "actions": [
    { "remove": { "index": "logs-v1", "alias": "logs-write" } },
    { "add":    { "index": "logs-v2-000001", "alias": "logs-write", "is_write_index": true } }
  ]
}

The complete driver that wires these steps together — checkpointing, settings restore, and the atomic swap — is detailed in Python Script for Zero-Downtime Elasticsearch Reindexing.
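For orientation only, the restore-then-swap sequence might look like the sketch below. It assumes an AsyncElasticsearch-style client passed in as es; the function name, the production values (1 replica, a 1s refresh), and the 10m health timeout are placeholders, not the settings of the linked driver.

```python
async def restore_and_swap(es, old_index: str, new_index: str, alias: str,
                           replicas: int = 1, refresh: str = "1s") -> None:
    """Restore steady-state settings, wait for green, then swap the alias."""
    # 1. Undo the copy-time tuning (assumed production values).
    await es.indices.put_settings(
        index=new_index,
        settings={"index": {"number_of_replicas": replicas,
                            "refresh_interval": refresh}},
    )

    # 2. Block until replicas are allocated. Depending on the client, a
    #    timeout may surface as an exception instead of timed_out=true;
    #    either way the swap below is never reached.
    health = await es.cluster.health(
        index=new_index, wait_for_status="green", timeout="10m"
    )
    if health["timed_out"] or health["status"] != "green":
        raise RuntimeError(f"{new_index} did not reach green; alias left untouched")

    # 3. Both actions travel in one request, so the alias moves atomically.
    await es.indices.update_aliases(actions=[
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
    ])
```

Raising before step 3 keeps the failure path consistent with the rest of the workflow: a target that never reaches green leaves the alias pointing at the old index.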

Verification

Never promote an alias on trust. Confirm parity and lifecycle attachment with three cheap checks before and after the swap.

# 1. Document-count parity between source and target
GET /_cat/indices/logs-v1,logs-v2-000001?v&h=index,docs.count,store.size

# 2. Confirm the ILM policy actually bound to the target
GET /logs-v2-000001/_settings?filter_path=**.lifecycle

# 3. Confirm the target is being managed and is not stuck in an ERROR step
GET /logs-v2-000001/_ilm/explain

The count query must show matching docs.count (allowing for documents intentionally dropped by a conflict filter). The _ilm/explain response should report a live phase and a step other than ERROR; a missing lifecycle block means the destination inherited defaults and is silently bypassing retention. For an audit beyond counts, run a _search with track_total_hits: true scoped by routing to confirm shard-level consistency before the swap.
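The same checks can be folded into an automated gate. The sketch below is one possible shape, assuming the counts were already fetched (for example with the client's count API after a refresh, since documents written with refresh_interval at -1 are not visible to searches until refreshed) and that ilm_explain is the parsed body of the _ilm/explain call. The function name and the expected_skipped parameter are hypothetical.

```python
def verify_target(source_count: int, target_count: int, ilm_explain: dict,
                  target_index: str, expected_skipped: int = 0) -> list[str]:
    """Return a list of problems; an empty list means the swap may proceed."""
    problems = []

    # Parity, allowing for documents intentionally dropped during the copy.
    if target_count + expected_skipped != source_count:
        problems.append(
            f"count mismatch: source={source_count} target={target_count} "
            f"expected_skipped={expected_skipped}"
        )

    # _ilm/explain keys its response by index name under "indices".
    info = ilm_explain.get("indices", {}).get(target_index)
    if not info or not info.get("managed"):
        problems.append("target is not managed by ILM")
    elif info.get("step") == "ERROR":
        problems.append(f"ILM stuck in ERROR step (phase={info.get('phase')})")

    return problems
```

Returning a list rather than a boolean lets the caller log every failed check at once instead of stopping at the first.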

Threshold Tuning and Performance Guidance

Reindex throughput is bounded by three cluster limits: JVM heap, disk I/O, and the write thread pool queue. requests_per_second acts as a token-bucket throttle on the write side, but it does nothing to stop a single oversized scroll payload from tripping the parent circuit breaker on a coordinating node. Tune against the limits, not against a target rate:

Scroll size. Keep source.size at 1000–2000. Larger windows inflate deserialization heap on coordinating nodes and are the most common cause of circuit_breaking_exception during a copy.
Write queue depth. Watch GET _cat/thread_pool/write?v&h=node_name,active,queue,rejected,queue_size. If queue sits above 50% of the queue capacity (queue_size), lower requests_per_second or keep refresh_interval at -1 for the duration of the copy.
Heap headroom. Validate breaker limits with GET _nodes/stats/breaker before execution. Leave the parent breaker at its default (95% of heap when real-memory accounting is enabled, as it is by default; 70% otherwise) and the request breaker at its 60% default, so there is room for merges that run concurrently with ingestion.
Parallelism. For very large sources, the _reindex API parallelizes with slices (set to the source’s primary-shard count, capped at 8–12 per coordinating node). Sizing this against topology is covered in Optimizing Reindex Thresholds & Bulk Sizes.
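Two of these limits reduce to simple arithmetic, sketched below. The function names and the default cap of 8 are assumptions; rows is expected to be the parsed JSON from GET _cat/thread_pool/write?format=json&h=node_name,queue,queue_size, where the _cat API returns values as strings.

```python
def choose_slices(primary_shards: int, cap: int = 8) -> int:
    """Slices equal to the source's primary-shard count, up to a cap."""
    return max(1, min(primary_shards, cap))


def saturated_nodes(rows: list[dict], threshold: float = 0.5) -> list[str]:
    """Names of nodes whose write queue exceeds `threshold` of its capacity."""
    flagged = []
    for row in rows:
        queue = int(row["queue"])
        capacity = int(row["queue_size"])
        # Skip unbounded queues (reported as -1) to avoid a bogus ratio.
        if capacity > 0 and queue / capacity > threshold:
            flagged.append(row["node_name"])
    return flagged
```

A non-empty result from saturated_nodes is the signal to lower requests_per_second before rejections start.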

After the copy, a freshly promoted index serves cold caches and can spike search latency on first query; pre-loading it is covered in Cache Warming Strategies for New Indices.

Troubleshooting

Even with throttling and idempotency guards, production reindex runs hit edge cases. Each of the states below maps to an observable cluster metric.

Symptom	Root cause	Resolution
`es_rejected_execution_exception`	`write` thread pool queue saturated	Reduce `requests_per_second` ~20%; rethrottle a running task with `reindex_rethrottle` rather than cancelling it.
`circuit_breaking_exception`	Scroll payload exceeds the heap breaker	Lower `source.size` to 500; confirm `indices.breaker.total.limit` is at its default and not lowered in `elasticsearch.yml`.
`version_conflict_engine_exception`	Concurrent writes to the source during the copy	Keep `op_type: create` with `conflicts: proceed`, or block source writes for the batch window.
Task appears stuck, no progress	Long scroll windows stalling, or the client blocked	Poll `_tasks/<id>` directly instead of blocking; a submitted async task keeps running independently of the client.
Target bypasses retention after cutover	Destination inherited defaults; no `index.lifecycle.name`	Attach the policy on the template before the copy; verify with `GET <index>/_settings?filter_path=**.lifecycle`.

Read live task detail with GET _tasks?detailed=true&actions=*reindex to extract created, updated, deleted, and batches counters. Wire those counters into monitoring rather than scraping logs — the collection patterns are in Tracking Reindex Progress & Performance. When a target is managed but stalls after cutover, diagnose it the same way you would any lifecycle stall, using monitoring ILM execution and error states.

Frequently Asked Questions

Why submit the reindex with wait_for_completion=false instead of blocking?

A large copy can run for minutes or hours. Blocking ties the client to a single long-lived HTTP connection that a proxy or load balancer will eventually cut, and a dropped connection tells you nothing about whether the copy is still running server-side. Submitting asynchronously returns a task id immediately; the copy proceeds server-side and you observe it through GET _tasks/<id>, which is also what makes the workflow resumable after a client crash.

How does op_type: create actually make a re-run safe?

With op_type: create, Elasticsearch refuses to overwrite a document id that already exists in the target and reports it as a version conflict instead. Paired with conflicts: proceed, a second run of the same reindex simply skips every document copied by the first run and writes only what is missing. That turns a partial failure into a resumable operation — you re-submit the same task and it converges, rather than duplicating or clobbering data.

Should I set number_of_replicas to 0 during the copy?

For the copy window, yes. Dropping replicas removes replication overhead, because each document is indexed once rather than once per shard copy, and that can raise write throughput substantially. The critical rule is to restore replicas (and refresh_interval) and wait for the index to return to green before swapping the write alias. Promoting an unreplicated index means a single node loss during the transition loses freshly migrated data with no recovery copy.

What is the difference between a task error and per-document failures?

A top-level error on the task means the whole operation aborted: a bad query, a missing index, an auth failure. Per-document failures, which appear under response.failures only on a completed task, mean the copy ran but some documents were rejected individually, usually by a mapping conflict. Version collisions do not land in that list when conflicts: proceed is set; they are tallied in the version_conflicts counter instead. The polling loop treats both errors and failures as fatal so a "completed" task with silent document losses never gets promoted.

Can I speed up a run that is already in progress?

Yes. You do not need to cancel and re-submit. Call reindex_rethrottle(task_id=..., requests_per_second=...) (REST: POST _reindex/<task_id>/_rethrottle) to raise or lower the throttle on a live task. Set it to -1 to remove throttling entirely, or lower it the moment the write queue starts backing up — the change applies to the running task without interrupting it.
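A minimal sketch of that adjustment, assuming an AsyncElasticsearch-style client passed in as es. The function name and the queue_backing_up flag are hypothetical; the 20% step mirrors the reduction suggested in the troubleshooting table.

```python
async def adjust_throttle(es, task_id: str, current_rps: int,
                          queue_backing_up: bool, step: float = 0.2) -> int:
    """Lower the throttle on a live reindex task when the write queue backs up.

    Returns the throttle now in effect. No request is sent when nothing
    needs to change.
    """
    if not queue_backing_up:
        return current_rps

    new_rps = max(1, round(current_rps * (1 - step)))
    if new_rps != current_rps:
        # Applies to the running task; the copy is not cancelled.
        await es.reindex_rethrottle(task_id=task_id, requests_per_second=new_rps)
    return new_rps
```

Called from the polling loop, this keeps the throttle decision next to the metrics that justify it.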

Related

Optimizing Reindex Thresholds & Bulk Sizes: calibrating source.size, requests_per_second, and slices against cluster topology.
Resolving Document Conflicts During Reindex: routing version collisions and duplicate keys instead of dropping them.
Tracking Reindex Progress & Performance: turning _tasks counters into monitoring signals.
Cache Warming Strategies for New Indices: pre-loading a freshly promoted target to avoid a cold-cache latency spike.
Python Script for Zero-Downtime Elasticsearch Reindexing: the complete checkpoint-driven migration driver.
