Optimizing Reindex Thresholds & Bulk Sizes

Reindexing throughput is bounded by three cluster resources fighting over the same nodes: disk I/O, JVM heap, and the write thread pool. Push requests_per_second and bulk payload size too hard and the coordinating node trips a circuit_breaking_exception or the write queue overflows with es_rejected_execution_exception; run them too conservatively and you waste IOPS, stretch the alias-cutover window, and leave scroll contexts open long enough to be evicted. This page is about finding the safe operating band — roughly 70–80% of sustained write capacity — and holding it under variable load, whether the copy runs through the _reindex API or a client-side helpers.scan / helpers.bulk loop.

Threshold tuning is the throughput layer of a larger migration. It assumes the surrounding pipeline is already correct: the target is provisioned with write-optimized settings, the lifecycle policy is attached, and the cutover is atomic. Those mechanics belong to Automated Reindexing Pipelines & Workflows, the parent workflow that governs how a reindex stays in step with Index Lifecycle Management (ILM). What follows is how to make that copy fast without destabilising the deployment it runs on.

Prerequisites

Elasticsearch 8.x cluster with dedicated data nodes (co-locating reindex load on master-eligible nodes amplifies GC pauses).
elasticsearch-py v8.0+ installed — this page uses the v8 client surface (helpers.scan, helpers.bulk, keyword-argument request bodies), not the legacy body= pattern.
Monitoring access to GET _cat/thread_pool/write, GET _nodes/stats/breaker, and GET _nodes/stats/indices so you can read live pressure while tuning.
The target index provisioned with number_of_replicas: 0 and refresh_interval: -1 for the duration of the copy (reverted at cutover), with its ILM policy and rollover alias already attached.
A staging index representative of production document size — payload tuning against 2 KB docs is meaningless if production averages 40 KB.
A write-throughput baseline (docs/sec at peak, measured cluster-wide) so requests_per_second targets have a real number to scale from.

Architecture: The Throttle Feedback Loop

Every reindex is a control loop with two knobs and two limits. The knobs are requests_per_second (how fast batches are dispatched) and payload size (source.size or the helpers.scan size on reads, chunk_size and max_chunk_bytes on writes). The limits are the write thread pool queue and the JVM heap circuit breakers. The tuning job is to keep both limits below saturation while pushing the knobs as high as they will go. This loop nests inside the parent workflow’s state machine: it runs entirely within the “Reindex (throttled)” transition, before the atomic alias swap.

Two knobs, two limits, one decision: read queue, rejected, and breaker.tripped, then move throughput up or down to hold 70–80% of write capacity.

Three metrics define the safe boundary; watch all three, not just throughput:

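Heap pressure. Scroll contexts and segment merges compete for heap. Read overall heap use from jvm.mem.heap_used_percent via GET _nodes/stats/jvm, and track indices.fielddata.memory_size_in_bytes and indices.segments.memory_in_bytes via GET _nodes/stats/indices (on 8.x the segment memory figure may report 0, so do not rely on it alone). Sustained heap above 75% means GC is stealing the CPU you think is doing reindex work.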
Thread pool saturation. The write pool on data and coordinating nodes rejects requests once its bounded queue fills. Watch GET _cat/thread_pool/write?v&h=node_name,active,queue,rejected; rejected is cumulative since node start, so any increase during the copy is dropped work you must retry.
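Circuit breaker limits. The parent and request breakers default to 95% and 60% of heap respectively (the 95% figure applies while indices.breaker.total.use_real_memory is left at its default of true; with it off, the parent limit is 70%). A bulk payload over ~20 MB frequently trips a breaker on the coordinating node before serialization even finishes; because in-flight HTTP request bytes also count toward parent, the exception often names [parent] rather than [request] when heap is already busy. Validate headroom with GET _nodes/stats/breaker.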
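
The three checks above can be folded into one comparison between two polls. The sketch below is illustrative only: the flat snapshot dictionary and its key names are hypothetical, something you would assemble yourself from the stats responses, and the 75% heap limit is simply the figure quoted above.

```python
from typing import Dict


def pressure_report(prev: Dict[str, float], curr: Dict[str, float],
                    heap_limit_pct: float = 75.0) -> Dict[str, bool]:
    """Compare two metric snapshots taken one polling interval apart.

    Each snapshot is a hypothetical flat dict built by the caller:
    {"heap_used_percent", "write_rejected", "breaker_tripped"}.
    The rejected and tripped values are cumulative counters, so only
    their growth between snapshots is meaningful.
    """
    return {
        "heap_hot": curr["heap_used_percent"] > heap_limit_pct,
        "new_rejections": curr["write_rejected"] > prev["write_rejected"],
        "breaker_tripped": curr["breaker_tripped"] > prev["breaker_tripped"],
    }


def is_safe(report: Dict[str, bool]) -> bool:
    # Safe only when none of the three limits is being hit.
    return not any(report.values())
```

Comparing deltas rather than absolute values matters on long-lived nodes, where a rejection from last week would otherwise look like a problem with today's copy.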

Configuration Reference

The parameters below are the levers you actually tune. Baselines assume 2–5 KB documents on an 8-data-node cluster with 30 GB heap per node; scale payload counts inversely with document size.

Parameter	Where it lives	Production baseline	Rationale
`chunk_size`	`helpers.bulk`	1,000–5,000 docs	Sized so the serialized batch lands in the 10–20 MB payload sweet spot
`max_chunk_bytes`	`helpers.bulk`	`15_000_000` (15 MB)	Hard cap that stops an oversized batch from tripping the `request` breaker on the coordinating node
`size`	`helpers.scan` (`source.size` on `_reindex`)	1,000–2,000 docs	Per-fetch heap cost of the scroll; larger values increase deserialization pressure, not parallelism
`scroll`	`helpers.scan`	`"5m"`	Balances context retention against heap eviction; shorten if batch processing is fast
`requests_per_second`	`_reindex` API	10–20% below peak write throughput	Token-bucket throttle that keeps the write queue from saturating during concurrent ingest
`slices`	`_reindex` API	`auto` or `8+`	Parallelises the server-side copy across primary shards; not available on `helpers.scan`
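
To make "scale payload counts inversely with document size" concrete, here is a minimal sketch that derives a chunk_size from an average document size and a byte budget. The function name and the 5,000-document ceiling are assumptions taken from the baselines in the table, and the estimate ignores the small per-action metadata line that each bulk item also carries.

```python
def docs_per_chunk(avg_doc_bytes: int,
                   target_chunk_bytes: int = 15_000_000,
                   max_docs: int = 5_000) -> int:
    """Estimate a chunk_size that keeps a batch near the byte budget."""
    if avg_doc_bytes <= 0:
        raise ValueError("avg_doc_bytes must be positive")
    # Integer division: how many average-sized docs fit in the budget,
    # clamped to at least 1 and at most max_docs.
    return max(1, min(max_docs, target_chunk_bytes // avg_doc_bytes))
```

With 40 KB documents this yields 375 per chunk, while 3 KB documents hit the 5,000-document ceiling. Either way max_chunk_bytes stays in place as the hard cap, since real documents vary around the average.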

A minimal write-optimised target that maximises the throughput ceiling before the copy begins:

PUT /events-2026-000002
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.refresh_interval": "-1",
    "index.translog.durability": "async",
    "index.lifecycle.name": "events-ilm",
    "index.lifecycle.rollover_alias": "events-write"
  }
}

number_of_replicas: 0 and refresh_interval: -1 are the two settings that most raise the write ceiling during a reindex; both are reverted at cutover. The lifecycle settings must be present at creation time — an index created without them silently bypasses your retention rules, a form of drift covered under the parent workflow.
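
Reverting at cutover is a single settings update. The sketch below only builds the body; the default values (1 replica, a 1s refresh) are placeholders for whatever your production index uses, and restoring translog durability to "request" is an assumption on my part, since the target above was created with "async".

```python
from typing import Any, Dict


def cutover_settings(replicas: int = 1,
                     refresh_interval: str = "1s") -> Dict[str, Any]:
    """Build the settings body that undoes the write-optimised overrides."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
            # Assumption: return to the default per-request fsync.
            "translog": {"durability": "request"},
        }
    }


# Hypothetical usage with the v8 client:
# client.indices.put_settings(index="events-2026-000002",
#                             settings=cutover_settings(replicas=1))
```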

Step-by-Step Implementation

The _reindex API is the right tool when the copy is a straight document move with no application-side transformation, because its slices parameter parallelises across shards inside the deployment. Submit it throttled and asynchronously:

POST /_reindex?wait_for_completion=false&requests_per_second=2000&slices=auto
{
  "source": { "index": "events-2026-000001", "size": 2000 },
  "dest":   { "index": "events-2026-000002", "op_type": "create" },
  "conflicts": "proceed"
}

size: 2000 sets the scroll batch, slices: auto fans the copy across primary shards, op_type: "create" makes re-runs idempotent, and conflicts: "proceed" stops a single version clash from aborting the whole task. The exact per-request sizing tradeoffs are explored in depth in tuning bulk request size for high-throughput reindexing.
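
If the job is submitted from Python rather than from the console, the same request maps onto keyword arguments of the v8 client's reindex method. This is a sketch under that assumption; the helper name is hypothetical and it only assembles the arguments, so it can be inspected or logged before anything is sent.

```python
from typing import Any, Dict


def reindex_kwargs(source_index: str, dest_index: str,
                   requests_per_second: float,
                   batch_size: int = 2000) -> Dict[str, Any]:
    """Assemble keyword arguments mirroring the console request above."""
    return {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index, "op_type": "create"},
        "conflicts": "proceed",
        "wait_for_completion": False,   # return a task id immediately
        "requests_per_second": requests_per_second,
        "slices": "auto",
    }


# Hypothetical usage:
# resp = client.reindex(**reindex_kwargs("events-2026-000001",
#                                        "events-2026-000002", 2000))
# task_id = resp["task"]   # "<node_id>:<task_id>", used for polling
```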

When you need document-level transformation, deduplication, or streaming into an external system, drive the copy from the client instead. The official v8 client provides helpers.scan for cursor-based reads and helpers.bulk for batched writes. The script below implements bounded chunking, partial-failure capture, and exponential backoff:

import logging
from elasticsearch import Elasticsearch, helpers

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger(__name__)

def run_optimized_reindex(
    source_index: str,
    dest_index: str,
    client: Elasticsearch,
    chunk_size: int = 2500,
    max_chunk_bytes: int = 15_000_000,
):
    # helpers.scan opens ONE scroll cursor and does NOT accept `slices`.
    # For parallel reads, run several scans, each with a `slice` clause in the
    # query, or use the _reindex API's `slices` parameter instead.
    scroll_kwargs = {
        "index": source_index,
        "query": {"query": {"match_all": {}}},  # helpers.scan takes the full search body here
        "scroll": "5m",
        "size": chunk_size,        # per-fetch heap cost — keep it modest
        "request_timeout": 60,
    }

    def action_generator():
        for hit in helpers.scan(client, **scroll_kwargs):
            yield {
                "_op_type": "index",       # or "create" for strict dedupe
                "_index": dest_index,
                "_id": hit["_id"],
                "_source": hit["_source"],
            }

    try:
        success, errors = helpers.bulk(
            client,
            action_generator(),
            chunk_size=chunk_size,
            max_chunk_bytes=max_chunk_bytes,
            max_retries=3,
            initial_backoff=2,
            max_backoff=600,
            raise_on_error=False,   # capture per-doc failures instead of aborting
            raise_on_exception=True,
            stats_only=False,       # return the failed items themselves, not just a count
        )
        logger.info("Reindex complete. Success: %s, Errors: %s", success, len(errors))
        return success, errors
    except Exception:
        logger.exception("Bulk ingestion failed")
        raise

raise_on_error=False is the load-bearing choice: it lets version conflicts and mapping-drift failures accumulate in the return value instead of killing a multi-hour job on document one. Conflict handling itself is a separate discipline — see resolving document conflicts during reindex. When the copy is one stage of a larger partitioned migration, the orchestration patterns in designing batch reindex workflows wrap this loop in a resumable state machine.
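
For the parallel client-side read mentioned in the code comments, each worker needs its own search body carrying a slice clause. The following is a minimal sketch; the function name is hypothetical, and it assumes each returned body is passed as the query argument of a separate helpers.scan call, one per worker thread or process.

```python
from typing import Any, Dict, List, Optional


def sliced_scan_bodies(num_slices: int,
                       query: Optional[Dict[str, Any]] = None
                       ) -> List[Dict[str, Any]]:
    """Return one search body per slice for parallel scroll reads."""
    if num_slices < 1:
        raise ValueError("num_slices must be >= 1")
    base = query if query is not None else {"match_all": {}}
    if num_slices == 1:
        # A slice clause needs max > 1, so a single reader omits it.
        return [{"query": base}]
    return [
        {"query": base, "slice": {"id": slice_id, "max": num_slices}}
        for slice_id in range(num_slices)
    ]
```

Every slice opens its own scroll context, so the slice count also counts against the open-context limit on the cluster.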

Verification

Confirm the copy is progressing and staying inside its limits, not just that it “started”. For an async _reindex, poll the task and read the live counters:

GET /_tasks/<node_id>:<task_id>

{
  "completed": false,
  "task": {
    "status": {
      "total": 48000000,
      "created": 12500000,
      "version_conflicts": 0,
      "requests_per_second": 2000.0,
      "throttled_millis": 41000
    }
  }
}

A rising throttled_millis confirms requests_per_second is actively pacing the job; a climbing version_conflicts with conflicts: "proceed" is tolerable, but a sudden spike signals concurrent writes to the source. In parallel, keep two health checks open:

GET _cat/thread_pool/write?v&h=node_name,active,queue,rejected
GET _nodes/stats/breaker

rejected must not rise above its value at the start of the copy (the counter is cumulative since node start), and the request breaker’s tripped count must not increment. After cutover, confirm the target is genuinely under lifecycle management before decommissioning the source:

GET /events-2026-000002/_ilm/explain

An index reporting "managed": false copied documents but never adopted your policy — it will ignore rollover conditions and retention. Live progress instrumentation and alerting are covered under tracking reindex progress & performance.
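
The task counters shown earlier are enough for a rough progress and ETA estimate. This sketch takes the status object from the task response plus the elapsed wall-clock time you tracked yourself; treating version_conflicts and noops as already-handled documents is an assumption of the sketch, not something the API states.

```python
from typing import Dict, Optional

# Counters treated as "documents already handled" (an assumption).
_HANDLED_KEYS = ("created", "updated", "deleted", "noops", "version_conflicts")


def reindex_progress(status: Dict[str, float],
                     elapsed_seconds: float) -> Dict[str, Optional[float]]:
    """Derive completion fraction, observed rate, and a naive ETA."""
    total = status.get("total", 0)
    handled = sum(status.get(key, 0) for key in _HANDLED_KEYS)
    fraction = handled / total if total else 0.0
    rate = handled / elapsed_seconds if elapsed_seconds > 0 else 0.0
    # No ETA until at least some documents have been processed.
    eta = (total - handled) / rate if rate > 0 else None
    return {"fraction": fraction, "docs_per_second": rate, "eta_seconds": eta}
```

The ETA is a straight-line extrapolation, so it will drift whenever the job is rethrottled or the cluster load changes.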

Threshold Tuning & Performance Guidance

Static thresholds fail under variable cluster load. Tune iteratively against the real cluster rather than guessing from a spreadsheet:

Baseline. Run a 5-minute test with slices: 1 (or a single scan) and size: 1000. Record queue depth and rejected from _cat/thread_pool/write, peak heap from _nodes/stats/jvm, and breaker estimates from _nodes/stats/breaker.
Scale parallelism. On the _reindex API, raise slices toward the source’s primary-shard count, capped near 8–12 per coordinating node to avoid context-switch overhead. helpers.scan is single-cursor, so parallelise it by running multiple sliced scans instead.
Calibrate the throttle. Set requests_per_second to observed_peak_throughput * 0.85. If queue consistently exceeds 500, cut it 15%. If rejected holds at 0 for ten minutes, raise it 10%.
Adapt the payload. Watch max_chunk_bytes against the parent breaker limit. If heap spikes above ~65%, drop max_chunk_bytes to 10 MB and raise chunk_size slightly to offset the per-batch overhead.
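
The calibration rule can be written as a small pure function that a polling loop calls once per interval. This is a sketch, not a prescribed controller: the function and parameter names are hypothetical, the 15% and 10% steps come from the steps above, and the 20% cut on fresh rejections borrows the figure from the troubleshooting section.

```python
def next_requests_per_second(current_rps: float,
                             queue_depth: int,
                             new_rejections: int,
                             clean_minutes: float,
                             queue_limit: int = 500) -> float:
    """Return the throttle to apply for the next polling interval."""
    if new_rejections > 0:
        return current_rps * 0.80   # rejections: back off hardest
    if queue_depth > queue_limit:
        return current_rps * 0.85   # queue building: cut 15%
    if clean_minutes >= 10:
        return current_rps * 1.10   # ten clean minutes: raise 10%
    return current_rps              # otherwise hold steady
```

On a server-side job the new value would be applied through the rethrottle endpoint (reindex_rethrottle in the v8 client, as far as I know); a client-side loop would instead adjust its own pacing between bulk calls.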

Throughput is a plateau, not a slope: batches below the band pay fixed per-request overhead, batches above it trip the request breaker. Cap the bytes with max_chunk_bytes, not the document count.

On shard sizing: aim for 30–50 GB per primary shard on the target — the Lucene sweet spot that keeps merges and recovery cheap. Under-sharding forces oversized segments and long merges that steal heap from the copy; over-sharding multiplies the fixed per-shard overhead the write pool must service. Because the target’s shard count is fixed at creation, this is the one decision you cannot retune mid-flight, and it often is the entire reason the reindex exists. Tier placement follows the same routing rules as the hot-warm-cold architecture; reindex into the hot tier, then let ILM migrate the result.
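
As a worked example of the sizing rule, the sketch below turns an expected index size into a primary shard count. The 40 GB default is just the midpoint of the 30–50 GB range above, and the function name is hypothetical.

```python
import math


def primary_shard_count(index_size_gb: float,
                        target_shard_gb: float = 40.0) -> int:
    """Smallest shard count that keeps each primary at or under the target."""
    if index_size_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))
```

A 600 GB index comes out at 15 primaries with the default, or anywhere from 12 to 20 across the 50 GB and 30 GB ends of the range. Size the estimate on the expected size at rollover, not the size on the day of the copy.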

Troubleshooting

`circuit_breaking_exception` on the coordinating node

Symptom: 429 or 503 with reason: "Data too large, data for [<http_request>] would be larger than limit". Resolution:

Reduce max_chunk_bytes to 10_000_000.
Verify indices.breaker.request.limit has not been lowered in elasticsearch.yml or through the cluster settings API (it is a dynamic setting).
Hunt for oversized documents. Sorting by _size (GET source_index/_search?size=1&sort=_size:desc) requires the mapper-size plugin with _size enabled in the mapping; without it, sample document sizes application-side and exclude outliers with a query filter or pre-process them.
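
Where the mapper-size plugin is not available, the application-side sampling mentioned above might look like the following sketch. It measures the compact JSON encoding of each sampled hit's _source, which only approximates what Elasticsearch stores; the function name and the 1 MB default limit are assumptions.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def oversized_ids(hits: Iterable[Dict[str, Any]],
                  limit_bytes: int = 1_000_000) -> List[Tuple[str, int]]:
    """Return (_id, size) pairs for sampled hits above the limit, largest first."""
    flagged = []
    for hit in hits:
        encoded = json.dumps(hit["_source"], separators=(",", ":"),
                             ensure_ascii=False).encode("utf-8")
        if len(encoded) > limit_bytes:
            flagged.append((hit["_id"], len(encoded)))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

Feeding it a bounded sample from helpers.scan is usually enough to spot the outliers; the returned ids can then go into an ids filter that excludes them from the main copy.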

Scroll context OOM / `search_phase_execution_exception`

Symptom: OutOfMemoryError in elasticsearch.log, or open scroll_contexts exceeding max_open_scroll_context (default 500). Resolution:

Shorten scroll to "2m" so abandoned contexts expire sooner.
Reduce size in helpers.scan to lower per-fetch heap — a larger size raises memory pressure and does not reduce context count, since each scan uses exactly one.
Cut the number of concurrent scans to stay under max_open_scroll_context, and clear stale contexts with DELETE _search/scroll/_all (use cautiously in production).

Partial failures & document conflicts

Symptom: helpers.bulk reports failed documents (a count with stats_only=True, a list of error items otherwise) carrying version_conflict_engine_exception or mapper_parsing_exception. Resolution:

Keep raise_on_error=False so failures are captured, not fatal.
Inspect the errors list for _id collisions. Use "op_type": "create" for strict deduplication or "index" for overwrites, depending on migration intent.
Validate destination mappings before execution; schema drift is the usual root cause.
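
To see at a glance whether failures are conflicts or mapping problems, the error items can be tallied by exception type. This sketch assumes the usual item shape, where each entry is keyed by its operation type and holds an error object with a type field; the function name is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_bulk_errors(errors: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count failed bulk items by Elasticsearch exception type."""
    counts: Counter = Counter()
    for item in errors:
        # Each item looks like {"index": {...}} or {"create": {...}}.
        for detail in item.values():
            error = detail.get("error", {})
            if isinstance(error, dict):
                counts[error.get("type", "unknown")] += 1
            else:
                counts[str(error)] += 1
    return dict(counts)
```

A summary dominated by version_conflict_engine_exception points at op_type and concurrent writers; one dominated by mapper_parsing_exception points at the destination mapping.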

Thread pool rejections under load

Symptom: _cat/thread_pool/write shows rejected > 0 with queue at capacity. Resolution:

Lower requests_per_second by 20%.
Raise initial_backoff to 5 and max_backoff to 1200 in helpers.bulk to give the queue time to drain.
If rejections persist, scale coordinating nodes horizontally or route reindex traffic to a dedicated ingest node pool.
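
It helps to see what the backoff settings actually buy. The sketch below reproduces the doubling rule as I understand it from the client documentation (each retry waits initial_backoff doubled per attempt, capped at max_backoff); it is an illustration of the arithmetic, not the client's own code.

```python
from typing import List


def backoff_schedule(max_retries: int, initial_backoff: float,
                     max_backoff: float) -> List[float]:
    """Seconds waited before each retry under a capped doubling rule."""
    return [
        min(max_backoff, initial_backoff * 2 ** (attempt - 1))
        for attempt in range(1, max_retries + 1)
    ]
```

With max_retries=3 the waits are 2, 4, and 8 seconds at the script's defaults, and 5, 10, and 20 seconds after raising initial_backoff to 5. The max_backoff cap is never reached in either case, so if the queue needs minutes rather than seconds to drain, max_retries has to go up as well.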

FAQ

What is the ideal bulk payload size for reindexing?

Target a serialized batch of 10–20 MB, not a fixed document count. With 2–5 KB documents that lands around 2,000–5,000 docs per chunk_size; with 40 KB documents the same byte budget means only a few hundred. Above ~20 MB you risk tripping the request circuit breaker on the coordinating node before serialization completes, so cap the byte size explicitly with max_chunk_bytes and let it, rather than document count, be the hard limit.

Does raising `size` on `helpers.scan` make the reindex parallel?

No. helpers.scan opens a single scroll cursor regardless of size; a larger size only increases the per-fetch heap cost. Parallelism comes from slices on the _reindex API, or from running several helpers.scan calls that each carry a slice clause in the query. Conflating the two is the most common reason a “tuned” client-side reindex stays slow while heap pressure climbs.

How do I set `requests_per_second` without guessing?

Baseline sustained cluster write throughput (docs/sec) under peak query load, then start at 85% of that figure. Watch _cat/thread_pool/write: if queue consistently exceeds 500, cut 15%; if rejected stays at 0 for ten minutes, add 10%. The goal is steady-state occupancy around 70–80% of write capacity, leaving headroom for concurrent search. On a running _reindex you can adjust live with POST /_reindex/<task_id>/_rethrottle?requests_per_second=<new_rate>; no cancel-and-resubmit is needed.

Why does throughput drop even though CPU is not maxed out?

The usual culprit is heap: scroll contexts and segment merges are consuming heap faster than GC can reclaim it, so the JVM burns cycles collecting rather than copying. Check _nodes/stats/indices for rising segments.memory_in_bytes and _nodes/stats/breaker for a climbing parent breaker. Lower max_chunk_bytes, shorten scroll, and confirm the target has refresh_interval: -1 set so it is not generating a segment per batch.

Should I disable replicas during a reindex?

Yes, for the duration of the copy. Setting number_of_replicas: 0 on the target removes the per-write replication cost and is one of the two largest throughput wins (the other being refresh_interval: -1). Restore the production replica count and refresh interval as part of the cutover step, and wait for the replicas to finish allocating (index health green) before promoting the write alias, so the index is fully redundant the moment it starts serving traffic.

Related

Designing batch reindex workflows — wrapping the throttled copy in a resumable, partitioned state machine.
Resolving document conflicts during reindex — version-aware and script-based handling of the conflicts that partial failures surface.
Tracking reindex progress & performance — reading the _tasks API and wiring alerts around the metrics tuned here.
Cache warming strategies for new indices — eliminating cold-start latency once the tuned copy cuts over.
Tuning bulk request size for high-throughput reindexing — the per-request sizing deep dive under this topic.
