Handling Version Conflicts with External Versioning

When a reindex runs with version_type: external and an incoming _version is not strictly greater than the version already stored under the same _id (or, with external_gte, is strictly lower than it), Elasticsearch rejects the write with a version_conflict_engine_exception. This page is the diagnostic and replay protocol for clearing those rejections without breaking monotonicity or dropping data.

External versioning is the strategy you reach for inside a reindex pipeline when the newest upstream version of a document must win even over a copy already present on the target — telemetry keyed by Kafka offsets, a change-data-capture (CDC) sequence number, or an application clock. It is one of the two composable conflict strategies described in the parent guide, resolving document conflicts during reindex; this page picks up at the exact point where an external write is rejected and you need to reconcile it. Unlike internal versioning, which auto-increments on every mutation, external treats the supplied integer as ground truth and enforces absolute monotonicity: any write where incoming_version <= stored_version is refused.
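
The comparison described above can be written as a small predicate. This is an illustrative sketch of the rule only, not Elasticsearch's implementation; the function name accepts_write is hypothetical.

```python
from typing import Optional


def accepts_write(version_type: str, stored: Optional[int], incoming: int) -> bool:
    """Return True if a versioned write would be accepted.

    stored is None when no document exists yet under the _id.
    """
    if stored is None:
        return True  # nothing to conflict with
    if version_type == "external":
        return incoming > stored   # strictly newer only
    if version_type == "external_gte":
        return incoming >= stored  # an equal version is also accepted
    raise ValueError(f"unsupported version_type: {version_type!r}")
```

With a stored version of 105, an incoming 104 is refused under both types, while an incoming 105 is refused under external and accepted under external_gte.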

Prerequisites

Elasticsearch 8.x with the _reindex, _tasks, and _ilm/explain APIs reachable from the automation host.
elasticsearch-py v8+ (elasticsearch>=8.0,<9.0) for the replay script below.
Documents carry a stable external version field (for example external_version) sourced from an upstream monotonic counter — Kafka offset, CDC LSN, or a logical clock.
Target mappings already accept the payload; an external write is rejected outright if the destination index is not prepared for it.
Write access scoped by RBAC so the recovery token can index into the target but cannot mutate the ILM policy attached to it.

When an External-Version Conflict Fires

version_conflict_engine_exception under external versioning is not a bug — it is the engine enforcing the contract. It surfaces under three operational conditions, and telling them apart is the whole job of the diagnostic phase:

Out-of-order arrival. Network partitions, write-queue saturation, or uneven shard routing cause non-sequential arrival — a document with version=104 lands after 105 has already been written. The 105 write already committed; the late 104 is correctly rejected as stale.
Dual-write overlap during migration. Live application traffic keeps writing to the source while the copy runs. Clock skew, offset resets, or partition rebalancing yield documents whose external version is lower than a record already reindexed, so the copy’s write loses to the live write.
Metadata inheritance during an ILM transition. When ILM moves an index through rollover or shrink, a pipeline that copies _version without explicitly declaring version_type silently falls back to internal versioning. A later external update then collides with the inherited internal _version and is rejected.

Diagnose: Read the Conflict Delta

Do not replay anything until you have classified the conflicts. The sign of current − provided tells you which of the three conditions above you are in.

# 1. Pull the failed reindex tasks and their conflict payloads
GET _tasks?actions=*reindex*&detailed=true&error_trace=true

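Filter for "type": "version_conflict_engine_exception" and read the current version [X] versus provided [Y] values on each failure. Under version_type: external a conflict always has current − provided ≥ 0. A zero delta means the same version is already stored: a duplicate that an external_gte replay accepts idempotently. A positive delta means the stored version is ahead of the incoming one: the write arrived late or is stale, the newer stored document correctly wins, and external_gte rejects it as well. If the stored version should not have been ahead, the fix is upstream, not a replay.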
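
A minimal sketch of that classification, assuming the failure reason contains the text "current version [X] ... provided [Y]". The exact wording can differ between Elasticsearch versions, so check the pattern against a real failure first. The names conflict_delta and classify are hypothetical helpers.

```python
import re

# Assumed message shape; confirm it against a failure from your own cluster.
_CONFLICT_RE = re.compile(r"current version \[(\d+)\].*?provided \[(\d+)\]")


def conflict_delta(reason: str):
    """Return current - provided parsed from a failure reason, or None."""
    match = _CONFLICT_RE.search(reason)
    if match is None:
        return None
    current, provided = (int(group) for group in match.groups())
    return current - provided


def classify(reason: str) -> str:
    """Bucket a conflict by the sign of its delta."""
    delta = conflict_delta(reason)
    if delta is None:
        return "unparsed"
    if delta == 0:
        return "duplicate"     # same version stored; external_gte would accept
    if delta > 0:
        return "stored_ahead"  # target is newer; external_gte would still reject
    return "unexpected"        # incoming is newer and should not have conflicted
```

Counting the buckets over all failures of a task gives a quick view of how much of the backlog a replay can actually clear.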

# 2. Confirm the collisions are not masking a stuck lifecycle step
GET logs-app-000001/_ilm/explain

{
  "indices": {
    "logs-app-000001": {
      "index": "logs-app-000001",
      "managed": true,
      "policy": "daily-rollover",
      "phase": "hot",
      "action": "rollover",
      "step": "check-rollover-ready"
    }
  }
}

If action is shrink or rollover while conflicts accumulate, metadata inheritance may be blocking external writes. Resolve the lifecycle state first, following monitoring ILM execution and error states, before you replay.
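
The same check can be applied to the parsed _ilm/explain response in a script. This is a sketch that reads the action field shown in the sample response above; the helper name indices_in_transition and the BLOCKING_ACTIONS set are assumptions for illustration.

```python
# Lifecycle actions this page treats as risky while conflicts accumulate.
BLOCKING_ACTIONS = {"rollover", "shrink"}


def indices_in_transition(explain_response: dict) -> list:
    """List managed indices whose current ILM action is rollover or shrink."""
    flagged = []
    for name, state in explain_response.get("indices", {}).items():
        if state.get("managed") and state.get("action") in BLOCKING_ACTIONS:
            flagged.append(name)
    return sorted(flagged)
```

An empty result means no managed index in the response is in one of those actions, so the replay does not need to wait on the lifecycle.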

# 3. Segment listing per shard. The version column is the Lucene version that wrote each segment, not the document _version
GET _cat/segments/logs-app-000001?v&h=index,shard,segment,version&s=index,shard

Replay Rejected Documents with external_gte

Once diagnostics show which rejections are equal-version duplicates (zero delta) and which carry a version lower than the stored one (positive delta), replay the rejected documents with version_type: external_gte. external_gte accepts an incoming version greater than or equal to the stored version while still rejecting strictly lower versions. Monotonicity is preserved, not relaxed, which is exactly what makes the replay idempotent when a document may already have been copied at the same version.

import logging
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

client = Elasticsearch(
    "https://cluster-node-01:9200",
    api_key="YOUR_API_KEY",
    request_timeout=30,
    max_retries=3,
    retry_on_timeout=True,
    verify_certs=True,
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("version_conflict_recovery")


def recover_external_version_conflicts(target_index: str, failed_docs_batch: list) -> None:
    """Replay rejected documents with external_gte so re-copies are idempotent.

    Each action carries the upstream version explicitly. In helpers.bulk the
    action metadata uses UNDERSCORE-prefixed keys (_version, _version_type);
    a plain "version"/"version_type" would be treated as document fields and
    silently disable versioning, reintroducing the very overwrite you are
    trying to avoid.
    """
    actions = []
    for doc in failed_docs_batch:
        body = doc["_source"]
        ext_version = body.get("external_version", doc.get("_version"))
        actions.append({
            "_index": target_index,
            "_id": doc["_id"],
            "_op_type": "index",
            "_version": ext_version,
            "_version_type": "external_gte",  # accept >= stored, reject strictly lower
            "_source": body,
        })

    # raise_on_error=False so an expected stale-version rejection is COUNTED,
    # not raised; stats_only=True returns (successful, error_count).
    success, errors = bulk(
        client, actions, raise_on_error=False, chunk_size=500, stats_only=True,
    )
    logger.info("Recovery complete: %d documents upserted.", success)
    if errors:
        logger.warning("%d documents still rejected — verify upstream offsets.", errors)

A document that still fails after an external_gte replay has a genuinely stale incoming version, strictly lower than the stored one (a positive current − provided delta). If the stored document should not be ahead, the upstream offset is wrong, and the fix belongs in the producer, not in Elasticsearch.
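
The replay function above passes stats_only=True, which returns only counts. To see which documents are still rejected, one option is to call bulk with stats_only=False and inspect the returned error list. The sketch below assumes each entry is keyed by its op type and carries _id, status, and error, which is the shape the helper normally reports; still_rejected_ids is a hypothetical name.

```python
def still_rejected_ids(bulk_errors: list) -> list:
    """Return the _id of every item rejected with a version conflict."""
    ids = []
    for item in bulk_errors:
        # Each entry is keyed by the op type, e.g. {"index": {...}}.
        for result in item.values():
            error = result.get("error")
            if (
                result.get("status") == 409
                and isinstance(error, dict)
                and error.get("type") == "version_conflict_engine_exception"
            ):
                ids.append(result["_id"])
    return ids
```

The resulting IDs are the ones to trace back to the producer's offsets; other error types are left out on purpose because they are not versioning problems.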

Verification

Confirm the replay converged and monotonicity holds before resuming the pipeline.

# 1. Version envelope on the target — min/max should bracket the upstream range
GET logs-app-000001/_search
{
  "size": 0,
  "aggs": {
    "max_version": { "max": { "field": "external_version" } },
    "min_version": { "min": { "field": "external_version" } }
  }
}

{
  "aggregations": {
    "max_version": { "value": 1048576.0 },
    "min_version": { "value": 1.0 }
  }
}

# 2. Re-run the copy over the reconciled window and confirm zero conflicts
GET _tasks/<task_id>?filter_path=task.status.version_conflicts,response.failures

A subsequent _reindex reporting version_conflicts: 0 and an empty failures list is the signal that the target is safe to promote and that ILM can resume.
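
As a sketch, the promotion gate can be expressed over the parsed task response. It assumes the response keeps the task.status.version_conflicts and response.failures paths requested by the filter_path above; replay_converged is a hypothetical helper name.

```python
def replay_converged(task_response: dict) -> bool:
    """True only when the task reports zero conflicts and no failures."""
    status = task_response.get("task", {}).get("status", {})
    conflicts = status.get("version_conflicts")
    failures = task_response.get("response", {}).get("failures") or []
    # A missing counter is treated as "unknown", not as zero.
    return conflicts == 0 and not failures
```

Treating a missing counter as not converged is a deliberately conservative choice: an incomplete response should never promote the target.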

Gotchas and Edge Cases

external requires a version on every document. A reindex under version_type: external with any document missing its version integer fails the whole item, not just the copy — carry the version explicitly in the destination action or the source _version.
conflicts: proceed counts, it does not repair. When the original copy ran with conflicts: proceed, rejected documents were skipped and counted under version_conflicts; the higher-versioned destination document was preserved. Nothing is retried automatically — the replay above is what re-applies them.
A positive delta is never fixed by external_gte. external_gte only rescues equal-version replays (a zero delta). A strictly lower incoming version is a stale write; forcing it would violate monotonicity, so the correction belongs upstream in offset generation.
Exclude immutable audit indices from external versioning. Compliance/audit indices that must never be overwritten should not accept external writes at all — routing recovery traffic through them creates recursive conflict loops. Keep operational telemetry and audit records on separate index patterns.
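
To guard against the first gotcha before a replay, documents without a usable version can be set aside instead of being sent. This is a minimal sketch that assumes the batch has the same shape as failed_docs_batch in the replay script and that the version lives in external_version; split_by_version is a hypothetical helper.

```python
def split_by_version(docs: list, field: str = "external_version"):
    """Separate docs with a usable external version from those without one."""
    replayable, missing = [], []
    for doc in docs:
        version = doc.get("_source", {}).get(field, doc.get("_version"))
        # External versions are non-negative integers; bool is an int
        # subclass in Python, so it is excluded explicitly.
        if isinstance(version, int) and not isinstance(version, bool) and version >= 0:
            replayable.append(doc)
        else:
            missing.append(doc)
    return replayable, missing
```

Only the first list goes to the replay; the second is a worklist for whoever owns the upstream counter.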

Escalation and Compliance

If the version_conflict_engine_exception rate exceeds roughly 5% of bulk throughput, apply producer backpressure (or reindex_rethrottle) and switch the copy to external_gte rather than letting the task thrash.
Persistent collisions after a clean replay indicate a flaw in upstream sequence generation — mandate centralized offset arbitration or exactly-once delivery at the transport layer rather than patching in Elasticsearch.
Log every version override with user, timestamp, original_version, and resolution_method so the reconciliation itself is auditable.
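
A sketch of the first and last points, assuming the roughly 5% threshold is measured as conflicts over documents attempted and that audit entries are emitted as JSON lines. Both function names and the doc_id field are illustrative additions, not part of any Elasticsearch API.

```python
import json
from datetime import datetime, timezone


def conflict_rate_exceeded(conflicts: int, total: int, threshold: float = 0.05) -> bool:
    """True when conflicts make up more than `threshold` of attempted docs."""
    return total > 0 and conflicts / total > threshold


def audit_record(user: str, doc_id: str, original_version: int, resolution_method: str) -> str:
    """Serialize one version-override event as a JSON line."""
    return json.dumps(
        {
            "user": user,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "doc_id": doc_id,  # illustrative extra field
            "original_version": original_version,
            "resolution_method": resolution_method,
        },
        sort_keys=True,
    )
```

Each record can be written through the same logger the replay script already configures, which keeps the reconciliation trail next to the replay output.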

Frequently Asked Questions

What is the difference between external and external_gte?

version_type: external writes only when the incoming _version is strictly greater than the stored version. external_gte also accepts an incoming version equal to the stored one, which is what makes a replay idempotent: a document already copied at the same version is accepted instead of rejected. Both still reject a strictly lower version, so monotonicity is preserved in either case.

How do I tell a stale write apart from an out-of-order one?

Read current version [X] and provided [Y] from the failure in GET _tasks/<task_id>?detailed=true&error_trace=true. A zero current − provided delta means the same version is already stored: a duplicate that an external_gte replay accepts. A positive delta means the stored version is ahead, so the incoming write arrived late or is genuinely stale; external_gte rejects it too, and if the stored version should not be ahead, the fix belongs at the upstream offset.

Why did my version/version_type keys get ignored in helpers.bulk?

In elasticsearch-py the per-action metadata keys are underscore-prefixed: _version and _version_type. Plain version/version_type are treated as ordinary document fields, silently disabling versioning so the write overwrites freely. Always use the underscore form in the action dict.

Related

Resolving document conflicts during reindex — choosing between op_type: create and external versioning, and enforcing it with conflicts: proceed.
Tracking reindex progress & performance — turning version_conflicts counters into live dashboards and alerts.
Optimizing reindex thresholds & bulk sizes — calibrating chunk_size, slices, and requests_per_second so conflict-heavy replays do not saturate the write queue.
Monitoring ILM execution & error states — clearing a lifecycle step stuck behind accumulating conflicts.
