Resolving Document Conflicts During Reindex

Document conflicts are the single most common way a large _reindex job goes wrong in a busy Elasticsearch cluster. The moment a copy targets an index that already holds documents — a resumed run, an overlapping data stream, a migration that overlaps live ingestion — Elasticsearch has to decide what “already exists” means, and by default it decides badly: it aborts. Every version_conflict_engine_exception traces back to the same root question of whether an incoming document should overwrite, skip, or reject the version already stored under the same _id. Answer that question deterministically before the copy starts and a conflict becomes a counted, reconciled event; answer it implicitly and a single collision halts the task, hot-spots a shard, or silently drops data. This page, part of the automated reindexing pipelines and workflows topic, covers how to pick a conflict strategy, enforce it in the reindex body, and recover the documents a conflict left behind.

Prerequisites

Confirm the following before you submit a reindex against a populated target:

Elasticsearch 8.x cluster with the _reindex and _tasks APIs reachable from the automation host.
elasticsearch-py 8.x installed with its async extra (elasticsearch[async]>=8.0,<9.0), because the async client (AsyncElasticsearch) that drives the polling loop below depends on aiohttp.
A decision on document identity: whether _id is application-assigned (so collisions are meaningful) or auto-generated (so collisions cannot occur).
A decision on versioning: internal _version versus an upstream version_type: external value sourced from Kafka offsets, CDC sequence numbers, or an application clock.
Target mappings that accept the payload: a document that fails the destination mapping is rejected as a failure regardless of the conflict strategy, so conflicts: proceed does not rescue it.
Write access scoped by RBAC with write and create_index on the target pattern and read on the source, so the pipeline token cannot mutate lifecycle definitions it does not own.

Architecture: Where a Conflict Is Decided

A reindex conflict is resolved by the interaction of three request fields, evaluated per document as each bulk write lands: dest.op_type, dest.version_type, and the top-level conflicts setting. op_type decides whether an existing _id is overwritten (index) or rejected as a conflict (create); version_type decides which of two competing versions wins; and conflicts decides whether a rejection aborts the whole task or is merely counted and stepped over. You commit to one branch at submission time; the parent pipeline’s larger zero-downtime cutover sequence treats this whole block as its single “copy” state.

The three settings are not interchangeable, and not every combination is valid. op_type: create and external versioning are mutually exclusive: create refuses to touch an existing _id at all, so there is no second version left to compare. Pick create when the target should be treated as append-only and any pre-existing document is authoritative (the common case for resumable batch copies). Pick external versioning when the newest upstream version must win even over a document already present. In both cases, layer conflicts: proceed on top so the task counts collisions in its metadata and finishes, rather than aborting on the first one and leaving a half-populated index behind.
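The per-document decision described above can be modelled in a few lines of plain Python. This is only a sketch of the semantics as this page states them, not Elasticsearch's implementation: the function names decide and task_outcome are hypothetical, and the abort branch is simplified to stop at the first conflict rather than at the end of the current batch.

```python
from typing import Optional

VALID_OP_TYPES = {"index", "create"}
VALID_VERSION_TYPES = {"internal", "external", "external_gte"}


def decide(op_type: str, version_type: str,
           stored_version: Optional[int], incoming_version: int) -> str:
    """Return "write" or "conflict" for one incoming document.

    stored_version is None when no document with this _id exists on the target.
    """
    if op_type not in VALID_OP_TYPES or version_type not in VALID_VERSION_TYPES:
        raise ValueError("unknown op_type or version_type")
    if op_type == "create" and version_type != "internal":
        # The rule from the text: create and external versioning do not combine.
        raise ValueError("op_type=create cannot be combined with external versioning")
    if stored_version is None:
        return "write"  # nothing to collide with
    if op_type == "create":
        return "conflict"  # an existing _id is never touched
    if version_type == "external":
        return "write" if incoming_version > stored_version else "conflict"
    if version_type == "external_gte":
        return "write" if incoming_version >= stored_version else "conflict"
    return "write"  # op_type=index with internal versioning overwrites


def task_outcome(decisions: list[str], conflicts: str) -> dict:
    """Fold per-document decisions into a task result under abort or proceed."""
    written = skipped = 0
    for decision in decisions:
        if decision == "write":
            written += 1
        elif conflicts == "abort":
            # Simplified: the model stops at the first conflict.
            return {"written": written, "version_conflicts": 1, "aborted": True}
        else:
            skipped += 1
    return {"written": written, "version_conflicts": skipped, "aborted": False}
```

Running the same list of decisions through task_outcome with "abort" and then "proceed" makes the difference concrete: the first stops with a partially written target, the second finishes and reports the skips as a count.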

Configuration Reference

The reindex body is where the strategy is declared. Elasticsearch defaults to conflicts: abort, which halts on the first mismatch, and to op_type: index, which overwrites freely. Neither default is safe for an automated migration. The two annotated bodies below cover the two composable strategies.

Strategy A — idempotent, skip-on-exist (resumable batch copy):

# slices and requests_per_second are URL parameters, not body fields
POST _reindex?wait_for_completion=false&slices=auto&requests_per_second=1000
{
  "source": {
    "index": "logs-v1",
    "size": 2000
  },
  "dest": {
    "index": "logs-v2-000001",
    "op_type": "create"
  },
  "conflicts": "proceed"
}

Strategy B — newest-version-wins (external versioning):

POST _reindex?wait_for_completion=false
{
  "source": { "index": "events-v1", "size": 2000 },
  "dest": {
    "index": "events-v2-000001",
    "version_type": "external"
  },
  "conflicts": "proceed"
}

With version_type: external, Elasticsearch writes the document only when the incoming _version is strictly greater than the stored one; external_gte relaxes that to greater-than-or-equal, which is what you want when replaying documents that may already have been copied at the same version. Monotonicity is still enforced — a strictly lower version is always rejected — so conflicts: proceed is what keeps those expected rejections from aborting the run. The full diagnostic and replay protocol for that path lives in handling version conflicts with external versioning.
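One way to keep the two strategies from being mixed by accident is to build the request arguments from a single strategy name. The sketch below is illustrative: build_reindex_kwargs and the strategy labels are hypothetical names, and the returned keys simply mirror the keyword arguments this page passes to the elasticsearch-py reindex call.

```python
def build_reindex_kwargs(source: str, dest: str, strategy: str,
                         batch_size: int = 2000,
                         requests_per_second: int = 1000) -> dict:
    """Return keyword arguments for a reindex call using exactly one strategy.

    strategy is "skip_on_exist" (op_type=create), "external", or
    "external_gte"; any other value is rejected so a typo cannot fall
    back to the overwrite default.
    """
    if strategy == "skip_on_exist":
        dest_body = {"index": dest, "op_type": "create"}
    elif strategy in ("external", "external_gte"):
        dest_body = {"index": dest, "version_type": strategy}
    else:
        raise ValueError(f"unknown conflict strategy: {strategy!r}")
    return {
        "source": {"index": source, "size": batch_size},
        "dest": dest_body,
        "conflicts": "proceed",  # count collisions instead of aborting
        "requests_per_second": requests_per_second,
        "wait_for_completion": False,
    }
```

With an AsyncElasticsearch client named es, the result would be passed as await es.reindex(**build_reindex_kwargs("events-v1", "events-v2-000001", "external_gte")). Because the dest body is built in exactly one branch, op_type: create and an external version_type can never appear together.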

Two index settings, applied outside the reindex body, reduce conflict pressure during the copy. On the destination, set index.refresh_interval: -1 to suppress per-batch segment generation. On the source, keep writes blocked with index.blocks.write: true so concurrent ingestion cannot introduce fresh collisions mid-scroll. Where source and target live on different tiers, the hot-warm-cold architecture governs where the new shards land, and misrouted shard allocation is a frequent hidden cause of what looks like a conflict stall but is really an UNASSIGNED target shard.

Step-by-Step Implementation

1. Lock the source to stop new collisions

A conflict you can reason about is one whose inputs stop changing. Freeze source writes before the copy so the scroll sees a stable snapshot and no fresh document can arrive carrying a colliding _id or version.

PUT /logs-v1/_settings
{ "index.blocks.write": true }
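In an automated pipeline the freeze is easy to apply and easy to forget to undo. A minimal sketch, assuming an AsyncElasticsearch-style client whose indices.put_settings accepts index and settings keyword arguments: the context manager below (copy_window is a hypothetical name) blocks source writes, suppresses refresh on the target, and restores both when the block exits, even on failure. Whether the source should be released afterwards depends on your cutover plan, so that step is a flag.

```python
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator


@asynccontextmanager
async def copy_window(es: Any, source: str, dest: str,
                      restore_refresh: str = "1s",
                      release_source: bool = True) -> AsyncIterator[None]:
    """Freeze the source and quiet the target for the duration of a copy."""
    # Block writes on the source so no new colliding documents arrive.
    await es.indices.put_settings(
        index=source, settings={"index.blocks.write": True})
    # Suppress refresh on the target while the bulk writes land.
    await es.indices.put_settings(
        index=dest, settings={"index.refresh_interval": "-1"})
    try:
        yield
    finally:
        # Always restore refresh so the target becomes searchable again.
        await es.indices.put_settings(
            index=dest, settings={"index.refresh_interval": restore_refresh})
        if release_source:
            await es.indices.put_settings(
                index=source, settings={"index.blocks.write": False})
```

Usage would look like async with copy_window(es, "logs-v1", "logs-v2-000001"): followed by the submit and poll steps inside the block. Pass release_source=False when the source must stay frozen through cutover.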

2. Submit the reindex with an explicit conflict strategy

Run the copy asynchronously (wait_for_completion=False) so a long migration never trips the HTTP timeout, and pick exactly one identity strategy. The example uses op_type: create for a resumable copy; swap to version_type: external for newest-wins semantics.

import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch import ApiError


async def submit_reindex(es: AsyncElasticsearch, source: str, dest: str) -> str:
    """Submit a throttled, idempotent reindex and return the server task id.

    op_type=create makes a re-run safe: documents already on the target are
    reported as conflicts and skipped, never overwritten. conflicts=proceed
    keeps a single collision from aborting the whole task. Do NOT combine
    op_type=create with version_type=external — they are mutually exclusive.
    """
    try:
        response = await es.reindex(
            source={"index": source, "size": 2000, "query": {"match_all": {}}},
            dest={"index": dest, "op_type": "create"},
            conflicts="proceed",
            slices="auto",
            requests_per_second=1000,
            wait_for_completion=False,
        )
        return response["task"]
    except ApiError as exc:
        # A submit-time ApiError is a bad request or auth failure, not a
        # per-document conflict — surface it rather than retrying blindly.
        raise RuntimeError(f"Reindex submit failed: {exc.info}") from exc

3. Poll the task and extract the conflict count

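Conflicts are not exceptions on an async task: they accumulate in status.version_conflicts. Poll _tasks to completion, then read that counter alongside per-document response.failures, which is where a hard rejection such as a mapping conflict surfaces separately from a benign skip. With conflicts: proceed, version conflicts (including rejected external versions) are only counted; they are itemized in failures only when the task runs with conflicts: abort.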

async def poll_conflicts(es: AsyncElasticsearch, task_id: str) -> dict:
    """Poll a reindex task and return its status once completed.

    version_conflicts is the count of documents skipped by the chosen
    strategy; response.failures holds documents rejected for other reasons.
    """
    backoff = 2.0
    while True:
        task = await es.tasks.get(task_id=task_id)
        status = task["task"]["status"]
        if task["completed"]:
            failures = task.get("response", {}).get("failures", [])
            return {
                "created": status["created"],
                "version_conflicts": status.get("version_conflicts", 0),
                "failures": failures,
            }
        await asyncio.sleep(backoff)
        backoff = min(backoff * 1.5, 30.0)  # capped exponential backoff

A completed task with a non-zero version_conflicts and an empty failures list is the healthy outcome for a resumable copy: every conflict was an already-present document that op_type: create correctly skipped. A non-empty failures list is different — those documents were rejected and were not written, so they must be reconciled before the target is promoted. For turning these counters into live dashboards and alerts, see tracking reindex progress and performance.
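The healthy-versus-reconcile distinction above can be turned into a small gate for the pipeline. This is a sketch under assumptions: summarize_outcome and its verdict labels are hypothetical, the input is the dictionary returned by poll_conflicts, and each failure entry is assumed to carry an id and a cause.type field.

```python
from collections import defaultdict


def summarize_outcome(result: dict) -> dict:
    """Classify a finished reindex and group failed ids by cause type."""
    by_cause = defaultdict(list)
    for failure in result.get("failures", []):
        # Assumed entry shape: {"id": ..., "cause": {"type": ..., "reason": ...}}
        cause_type = failure.get("cause", {}).get("type", "unknown")
        by_cause[cause_type].append(failure.get("id"))
    if by_cause:
        verdict = "reconcile"  # rejected documents were not written
    elif result.get("version_conflicts", 0) > 0:
        verdict = "healthy_with_skips"  # only benign skips
    else:
        verdict = "healthy"
    return {"verdict": verdict, "failed_ids_by_cause": dict(by_cause)}
```

A pipeline would block promotion on any verdict of "reconcile" and feed failed_ids_by_cause into the re-run step.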

4. Reconcile and re-run the unprocessed window

A reindex task cannot be resumed from a scroll cursor — the _tasks API exposes no resumable scroll_id for it. Instead, re-run _reindex with a bounding query over only the window that failed or was skipped, still under op_type: create, so documents already copied are skipped rather than duplicated. This makes the re-run idempotent and convergent.

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "logs-v1",
    "query": { "range": { "@timestamp": { "gte": "2026-06-01", "lt": "2026-06-02" } } }
  },
  "dest": { "index": "logs-v2-000001", "op_type": "create" },
  "conflicts": "proceed"
}
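When more than one window needs a re-run, the bounded bodies can be generated rather than hand-written. The sketch below assumes day-sized windows on @timestamp; daily_windows and rerun_body are hypothetical helper names, and the body mirrors the request shown above.

```python
from datetime import date, timedelta
from typing import Iterator


def daily_windows(start: date, end: date) -> Iterator[tuple[str, str]]:
    """Yield half-open [gte, lt) day windows covering start up to end."""
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        yield day.isoformat(), next_day.isoformat()
        day = next_day


def rerun_body(source: str, dest: str, gte: str, lt: str,
               field: str = "@timestamp") -> dict:
    """Build an idempotent re-run body bounded to one window."""
    return {
        "source": {
            "index": source,
            "query": {"range": {field: {"gte": gte, "lt": lt}}},
        },
        # create + proceed: already-copied documents are skipped, not rewritten.
        "dest": {"index": dest, "op_type": "create"},
        "conflicts": "proceed",
    }
```

Half-open windows (gte on the start, lt on the end) keep adjacent re-runs from covering the same document twice, although op_type: create would skip the overlap anyway.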

Verification

Never promote a target on the strength of a “completed” task alone. Confirm parity and identity health with three cheap checks.

# 1. Document-count parity — accounting for documents intentionally skipped by conflicts
GET /_cat/indices/logs-v1,logs-v2-000001?v&h=index,docs.count,docs.deleted

# 2. Read the final conflict tally straight from the completed task
GET /_tasks/<task_id>?filter_path=task.status.version_conflicts,task.status.created,response.failures

# 3. Confirm the target is lifecycle-managed and not stuck in an ERROR step
GET /logs-v2-000001/_ilm/explain

A skipped document is, by definition, already present on the target, so version_conflicts does not explain a shortfall. After refreshing the target (its docs.count may lag while refresh_interval is -1), docs.count should match between source and target when the target holds only documents from this source; any remaining gap should be accounted for by response.failures or an unprocessed window, and means real losses, not benign skips. The _ilm/explain response must report a live phase and a step other than ERROR; a missing lifecycle block means the target inherited defaults and is silently bypassing retention, which the parent pipeline’s ILM policy attachment step is meant to prevent. When a conflict stall coincides with an index stuck in an ILM step, diagnose it the same way you would any lifecycle stall, using monitoring ILM execution and error states.
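The lifecycle check can be automated against the parsed _ilm/explain response. A minimal sketch, assuming the response nests per-index entries under an indices key with managed, phase, and step fields; lifecycle_problems is a hypothetical helper name.

```python
def lifecycle_problems(explain: dict, index: str) -> list[str]:
    """Return human-readable lifecycle problems for one index (empty if none)."""
    info = explain.get("indices", {}).get(index)
    if info is None:
        return [f"{index}: missing from the _ilm/explain response"]
    problems = []
    if not info.get("managed", False):
        # No lifecycle block: the index is bypassing retention.
        problems.append(f"{index}: not managed by ILM")
    elif info.get("step") == "ERROR":
        failed = info.get("failed_step", "unknown")
        problems.append(f"{index}: stuck in ERROR (failed step: {failed})")
    elif not info.get("phase"):
        problems.append(f"{index}: no live phase reported")
    return problems
```

An empty list is the only result that should allow promotion; anything else is a reason to stop and investigate before cutover.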

Threshold Tuning and Performance Guidance

Conflict-resolution overhead does not scale linearly. Each rejected write still costs a version lookup, a bulk-item response, and — under conflicts: abort — a full task teardown, so a target with a high collision rate spends real CPU deciding not to write.

Scroll size. Keep source.size between 1000 and 2000. Larger windows inflate deserialization heap on the coordinating node and are the most common trigger of circuit_breaking_exception during a conflict-heavy copy, because rejected items still occupy the bulk response buffer.
Throttle against the write queue, not a target rate. Watch GET _cat/thread_pool/write?v&h=node_name,active,queue,queue_size,rejected. If queue climbs past ~50% of the queue capacity (queue_size), or version conflicts exceed roughly 5% of the documents processed, lower requests_per_second so the destination can flush segments and drain the queue before the next batch. Re-throttle a running task live with es.reindex_rethrottle(task_id=..., requests_per_second=...) rather than cancelling it.
Suppress refresh during the copy. index.refresh_interval: -1 on the target removes per-batch segment churn, which both raises write velocity and reduces the segment fragmentation that makes version lookups slower. Restore it to 1s before cutover.
Partition to isolate conflict domains. For very large sources, split the copy by time range or routing value and drive independent tasks from a queue. A collision in one partition then cannot stall the others, and each sub-task’s version_conflicts is independently reconcilable. The coordination patterns for that live in designing batch reindex workflows, and the calibration methodology for size, slices, and requests_per_second is in optimizing reindex thresholds and bulk sizes.
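The throttling rule in the list above reduces to a small pure function that a polling loop can call. This is a sketch, not a tuned controller: next_requests_per_second is a hypothetical name, the halving factor and the floor are illustrative assumptions, and the 50% threshold is read here against the write queue's capacity, which is an interpretation of the guidance.

```python
def next_requests_per_second(current_rps: float, queue: int, queue_size: int,
                             conflicts: int, processed: int,
                             queue_ratio_limit: float = 0.5,
                             conflict_ratio_limit: float = 0.05,
                             backoff: float = 0.5,
                             floor: float = 100.0) -> float:
    """Return the throttle to apply next, lowering it under pressure."""
    queue_ratio = queue / queue_size if queue_size else 0.0
    conflict_ratio = conflicts / processed if processed else 0.0
    if queue_ratio > queue_ratio_limit or conflict_ratio > conflict_ratio_limit:
        # Back off, but never below the floor so the copy keeps moving.
        return max(floor, current_rps * backoff)
    return float(current_rps)
```

The returned value would be passed to the rethrottle call on the running task. The function only lowers the rate; raising it again after the queue drains is left to the caller.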

Troubleshooting

When conflicts persist despite conflicts: proceed, work the failure modes systematically — each maps to an observable cluster signal.

Symptom	Root cause	Resolution
Task aborts on the first collision	`conflicts` left at its `abort` default	Set `conflicts: "proceed"` in the body; conflicts are then counted in `status.version_conflicts` and the task finishes.
`version_conflicts` keeps climbing during the copy	Source still accepting writes, or duplicate `_id`s from a prior run	Set `index.blocks.write: true` on the source for the copy window; rely on `op_type: create` to skip prior-run documents.
`version_conflict_engine_exception` rejections under external versioning	Incoming `_version` ≤ stored version (stale or out-of-order upstream offset)	Replay equal-version rejections with `version_type: external_gte`; where the incoming version is strictly lower, the stored document is newer, so fix the upstream version source instead of forcing the write. See the external-versioning guide.
Conflicts on documents that should be unique	Routing keys differ between source and target, splitting `_id`s across shards	Preserve routing with `"dest": { "routing": "keep" }` (the default) so identity resolves to the same shard.
Copy “stalls” with rising conflicts and `UNASSIGNED` target shards	Target allocation filter points at a tier with no capacity	Inspect `GET _cluster/allocation/explain`; free disk or relax the allocation filter — this is an allocation stall, not a true conflict.
Re-run overwrites documents instead of skipping	Re-run used `op_type: index` (overwrite) instead of `create`	Always re-run the unprocessed window with `op_type: create` so already-copied documents are skipped idempotently.

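To extract the exact colliding documents for analysis, read the completed task’s failures with GET _tasks/<task_id>?error_trace=true and pull _id plus the current and provided versions from each entry’s cause.reason. Version conflicts are itemized there only when the task ran with conflicts: abort; under proceed they are counted but not listed. A positive current − provided delta means the incoming document is older than the stored one (a stale or out-of-order arrival); a zero delta means a duplicate write at the same version, which external_gte would accept. A negative delta should not produce a conflict under external versioning, so treat one as a sign of a different failure.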
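Pulling the two versions out of a failure entry is mostly string parsing. The sketch below assumes the conflict reason contains text of the form "current version [N] ... provided [M]", which should be checked against the messages your cluster actually emits; version_delta is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed reason format: "... current version [7] ... provided [5]"
_VERSIONS = re.compile(r"current version \[(\d+)\].*?provided \[(\d+)\]")


def version_delta(failure: dict) -> Optional[dict]:
    """Extract current/provided versions from one failure entry, if present."""
    reason = failure.get("cause", {}).get("reason", "")
    match = _VERSIONS.search(reason)
    if match is None:
        return None  # not a version conflict, or an unexpected message format
    current, provided = int(match.group(1)), int(match.group(2))
    return {
        "id": failure.get("id"),
        "current": current,
        "provided": provided,
        "delta": current - provided,
    }
```

Entries that return None are worth logging separately, since they are either non-conflict failures or a message format this pattern does not cover.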

Frequently Asked Questions

What is the difference between op_type: create and conflicts: proceed?

They answer different questions. op_type: create decides per document what happens when the _id already exists on the target — it refuses to overwrite and reports a version conflict. conflicts: proceed decides for the whole task what happens when any conflict occurs — count it and keep going, instead of aborting. You almost always want both together: create makes re-runs idempotent, and proceed stops a single collision from killing the task.

Can I use op_type: create together with version_type: external?

No — they are mutually exclusive. op_type: create refuses to touch a document whose _id already exists, so there is never a second version to compare against; external versioning, by contrast, exists precisely to overwrite an existing document when the incoming version is newer. Choose one: create for an append-only, skip-on-exist copy, or version_type: external/external_gte for newest-version-wins semantics.

Where does a reindex report the number of conflicts?

In the task status, under status.version_conflicts. On an async task you read it with GET _tasks/<task_id> once completed is true; the elasticsearch-py client exposes it at task["task"]["status"]["version_conflicts"]. That counter is documents skipped by your strategy, including rejected external versions when conflicts is set to proceed. Documents rejected for other reasons, such as a mapping conflict, appear separately under response.failures and mean the document was not written.

A conflict-heavy reindex was interrupted. How do I resume it safely?

A reindex has no resumable scroll cursor, so you re-run it rather than resume it. Re-submit _reindex with a bounding range query (on @timestamp or another field that orders the documents) covering only the unprocessed window, still with op_type: create. Documents already copied are reported as conflicts and skipped, so the re-run converges on a complete target without duplicating anything.

Do conflicts corrupt or delete data?

Not on their own. Under op_type: create or external versioning, a conflict means the incoming write was declined and the existing document was preserved — no overwrite, no deletion. The real risk is the opposite: leaving op_type at its index default silently overwrites the target, and leaving conflicts at abort stops the task mid-copy and leaves a partially populated index that can be mistaken for a complete one.

Related

Handling version conflicts with external versioning — the diagnostic and replay protocol for version_type: external collisions.
Designing batch reindex workflows — partitioning a migration so a conflict in one window cannot stall the rest.
Optimizing reindex thresholds & bulk sizes — calibrating size, slices, and requests_per_second against cluster capacity.
Tracking reindex progress & performance — turning version_conflicts and task counters into monitoring signals.
Monitoring ILM execution & error states — diagnosing a target stuck in an ILM step after a conflict-heavy copy.
