How to Configure Rollover Based on Max Primary Shard Size

In high-throughput log analytics and search pipelines, relying exclusively on document count or fixed time windows for index rotation produces pathological shard distributions. When primary shards exceed optimal capacity (typically 30–50 GB), cluster routing overhead, segment merge latency, and recovery times degrade non-linearly. Configuring Index Lifecycle Management (ILM) to trigger rollover based on max_primary_shard_size provides deterministic, storage-aware index rotation. This methodology integrates directly into the broader Elasticsearch ILM Architecture & Fundamentals framework, ensuring predictable data lifecycle transitions without manual intervention or capacity guesswork.

flowchart TD
  A["GET index/_ilm/explain"] --> B{"step value"}
  B -->|"check-rollover-ready"| C["Healthy: waiting for size or age"]
  B -->|"ERROR"| D["Inspect step_info, fix root cause, retry"]

1. Policy Definition & Storage-Aware Rotation

The max_primary_shard_size condition evaluates the largest primary shard in the current write index. Unlike max_size, which measures the combined size of all primary shards in the index (replicas are not counted by either condition), this metric reacts to a single oversized primary before it becomes a routing bottleneck or merge hotspot, even when the other shards are still small. The following policy definition rolls over on shard size with a seven-day age fallback and aligns with standard tiered storage strategies:

PUT _ilm/policy/log_pipeline_rollover
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "45gb",
            "max_age": "7d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      }
    }
  }
}

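This configuration rolls the index over when the largest primary shard reaches 45 GB or the index is 7 days old, whichever comes first, regardless of document count. ILM evaluates these conditions on a schedule set by indices.lifecycle.poll_interval (10 minutes by default), so a fast-growing shard can overshoot 45 GB slightly before the rollover happens. Note that the warm-phase shrink to one shard merges all primaries into a single shard, which can reach roughly 135 GB if all three primaries are near the threshold; omit the shrink action if the 30–50 GB target must also hold after the hot phase. For teams operating across heterogeneous node classes, this approach directly supports the resource isolation principles documented in Understanding Hot-Warm-Cold Architecture.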
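To make the difference between the size conditions concrete, the following is a minimal sketch in plain Python that needs no cluster. The class and function names and the sample shard sizes are hypothetical; the code only mirrors the 45 GB / 7 d thresholds from the policy above and is not ILM's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class IndexStats:
    """Hypothetical snapshot of a write index: primary shard sizes (GB) and age (days)."""
    primary_sizes_gb: list[float]
    age_days: float


def met_conditions(stats: IndexStats,
                   max_primary_shard_gb: float = 45.0,
                   max_age_days: float = 7.0) -> list[str]:
    """Return the names of the rollover conditions that are currently satisfied.

    Rollover fires when ANY condition is met, so a non-empty list means "roll over".
    """
    met = []
    if max(stats.primary_sizes_gb) >= max_primary_shard_gb:
        met.append("max_primary_shard_size")
    if stats.age_days >= max_age_days:
        met.append("max_age")
    return met


# Skewed routing: one hot shard at 46 GB while the other two lag behind.
skewed = IndexStats(primary_sizes_gb=[46.0, 12.0, 9.0], age_days=2.0)
print(met_conditions(skewed))        # ['max_primary_shard_size']
print(sum(skewed.primary_sizes_gb))  # 67.0 (total primary storage in GB)
```

In this sample the primaries total only 67 GB, so a max_size condition sized for three full shards (135gb) would not fire, while max_primary_shard_size already does.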

2. Bootstrap Index Initialization & Alias Enforcement

ILM requires explicit lifecycle metadata and a designated write alias to evaluate rollover conditions. If index.lifecycle.rollover_alias is missing or does not point to the active write index, the rollover action fails and the index moves to the ERROR step instead of rotating. Initialize the bootstrap index with strict alias mapping:

PUT logs-000001
{
  "aliases": {
    "logs-write": {
      "is_write_index": true
    }
  },
  "settings": {
    "index.lifecycle.name": "log_pipeline_rollover",
    "index.lifecycle.rollover_alias": "logs-write",
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

Verify alias resolution immediately after creation:

GET _alias/logs-write

Expected output must return logs-000001 with "is_write_index": true. Any deviation indicates misconfiguration and will block automated rotation.
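The alias check can also be scripted. The sketch below works on the parsed JSON body of GET _alias/logs-write; the function name is hypothetical, and the check is deliberately strict in that it requires an explicit "is_write_index": true, as described above. With the 8.x Python client the body could presumably be obtained via es.indices.get_alias(name="logs-write"), which is left out here so the example runs without a cluster.

```python
def resolve_write_index(alias_response: dict, alias: str) -> str:
    """Return the single index explicitly flagged as the write index for `alias`.

    `alias_response` is the parsed JSON body of GET _alias/<alias>.
    Raises ValueError when no index (or more than one) carries is_write_index: true.
    """
    writers = [
        index
        for index, body in alias_response.items()
        if body.get("aliases", {}).get(alias, {}).get("is_write_index") is True
    ]
    if len(writers) != 1:
        raise ValueError(f"expected exactly one write index for {alias!r}, found {writers}")
    return writers[0]


# Sample body in the shape returned for the bootstrap index above.
sample = {"logs-000001": {"aliases": {"logs-write": {"is_write_index": True}}}}
print(resolve_write_index(sample, "logs-write"))  # logs-000001
```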

3. Cluster State Diagnostics & Stuck-State Resolution

When max_primary_shard_size fails to trigger, or the policy stalls in the hot phase, systematic diagnosis is mandatory. Execute the lifecycle explain API to isolate the failure vector:

GET logs-000001/_ilm/explain

Healthy State Output:

{
  "indices": {
    "logs-000001": {
      "index": "logs-000001",
      "managed": true,
      "policy": "log_pipeline_rollover",
      "phase": "hot",
      "action": "rollover",
      "step": "check-rollover-ready",
      "step_info": {
        "message": "Waiting for index to meet rollover conditions"
      }
    }
  }
}

Stuck State Output (Alias Misalignment):

{
  "indices": {
    "logs-000001": {
      "managed": true,
      "phase": "hot",
      "action": "rollover",
      "step": "ERROR",
      "step_info": {
        "type": "illegal_argument_exception",
        "reason": "index.lifecycle.rollover_alias [logs-write] does not point to index [logs-000001]"
      }
    }
  }
}
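The two outputs above can be told apart mechanically. The following sketch, with a hypothetical function name, takes a parsed _ilm/explain body and produces a one-line verdict per index, surfacing the step_info reason for indices in the ERROR step. It assumes only the fields shown in the sample outputs.

```python
def triage(explain_body: dict) -> dict[str, str]:
    """Map each index in an _ilm/explain response to a short verdict string."""
    verdicts = {}
    for name, meta in explain_body.get("indices", {}).items():
        if not meta.get("managed", False):
            verdicts[name] = "unmanaged"
        elif meta.get("step") == "ERROR":
            reason = meta.get("step_info", {}).get("reason", "no reason reported")
            verdicts[name] = f"error: {reason}"
        else:
            verdicts[name] = f"ok: {meta.get('phase')}/{meta.get('action')}/{meta.get('step')}"
    return verdicts


# Bodies reduced to the fields used above.
healthy = {"indices": {"logs-000001": {
    "managed": True, "phase": "hot", "action": "rollover", "step": "check-rollover-ready"}}}
stuck = {"indices": {"logs-000001": {
    "managed": True, "phase": "hot", "action": "rollover", "step": "ERROR",
    "step_info": {
        "type": "illegal_argument_exception",
        "reason": "index.lifecycle.rollover_alias [logs-write] does not point to index [logs-000001]",
    }}}}

print(triage(healthy))
print(triage(stuck))
```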

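If step reports ERROR, resolve the cause shown in step_info and re-run the failed step with POST logs-000001/_ilm/retry. If the index remains in check-rollover-ready long after the thresholds are met, or the error points to allocation problems, cross-reference cluster allocation status: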

GET _cluster/health?level=indices&pretty

If status returns yellow or red alongside unassigned_shards > 0, a breached disk watermark is one common cause; GET _cluster/allocation/explain reports the exact reason a shard is unassigned. Verify watermark thresholds:

GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk
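As a rough illustration of how the watermark thresholds relate to a node's disk usage, the sketch below classifies a usage percentage against the default watermarks (85% low, 90% high, 95% flood stage). The function names are hypothetical, and the code handles only percentage-style settings, although watermarks can also be configured as ratios or absolute byte values.

```python
# Default disk watermarks shipped with Elasticsearch, as percent of disk used.
DEFAULT_WATERMARKS = {"low": "85%", "high": "90%", "flood_stage": "95%"}


def parse_percent(value: str) -> float:
    """Convert a percentage setting such as '85%' to 85.0."""
    if not value.endswith("%"):
        raise ValueError(f"this sketch only handles percentage watermarks, got {value!r}")
    return float(value[:-1])


def watermark_state(disk_used_percent: float,
                    watermarks: dict[str, str] = DEFAULT_WATERMARKS) -> str:
    """Return the highest watermark breached at the given disk usage, or 'ok'."""
    for level in ("flood_stage", "high", "low"):
        if disk_used_percent >= parse_percent(watermarks[level]):
            return level
    return "ok"


for used in (80.0, 87.5, 92.0, 96.3):
    print(used, watermark_state(used))
```

A node above the low watermark stops receiving new shard allocations, above the high watermark Elasticsearch tries to relocate shards away from it, and at flood stage writes to indices with shards on that node are blocked.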

Safe Manual Reroute Protocol: Do not force-allocate primaries without confirming data integrity. If replicas are stuck due to transient disk pressure, execute a controlled replica allocation:

POST _cluster/reroute
{
  "commands": [
    {
      "allocate_replica": {
        "index": "logs-000001",
        "shard": 0,
        "node": "data-node-02"
      }
    }
  ]
}

For ILM stuck on check-rollover-ready despite meeting size thresholds, bypass the automation safely via the rollover API:

POST logs-write/_rollover
{
  "conditions": {
    "max_primary_shard_size": "45gb"
  }
}
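A conditional rollover request returns HTTP 200 even when no rollover takes place, so the response body has to be inspected. The sketch below, with a hypothetical function name, summarizes a parsed _rollover response using its rolled_over, dry_run, old_index, new_index, and conditions fields. The exact key format inside conditions is treated as opaque, and min_* conditions are not considered.

```python
def summarize_rollover(resp: dict) -> str:
    """Turn a parsed _rollover response body into a short status line."""
    if resp.get("rolled_over"):
        return f"rolled over: {resp['old_index']} -> {resp['new_index']}"
    conditions = resp.get("conditions", {})
    # One satisfied condition is enough; an empty map means an unconditional rollover.
    satisfied = not conditions or any(conditions.values())
    if resp.get("dry_run") and satisfied:
        return f"dry run: would roll over {resp['old_index']} -> {resp['new_index']}"
    unmet = sorted(cond for cond, met in conditions.items() if not met)
    return "not rolled over; unmet: " + ", ".join(unmet)


# Sample body for a request whose size condition was not yet met.
sample = {
    "acknowledged": False,
    "shards_acknowledged": False,
    "old_index": "logs-000001",
    "new_index": "logs-000002",
    "rolled_over": False,
    "dry_run": False,
    "conditions": {"[max_primary_shard_size: 45gb]": False},
}
print(summarize_rollover(sample))
```

Sending the same request with the dry_run query parameter first shows whether the condition would be met without creating the new index.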

Reference the official cluster routing allocation documentation for watermark tuning parameters: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#disk-based-shard-allocation

4. Automated Recovery with the Python Client (v8+)

Manual intervention introduces latency and audit gaps. The following script uses the official Python client to automate stuck-state detection, forced rollover, and logging. It requires elasticsearch>=8.0.0 (the elasticsearch-py package) and Python 3.9 or later for the list[str] annotations.

import logging
from elasticsearch import Elasticsearch, exceptions

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")

class ILMRecoveryEngine:
    def __init__(self, hosts: list[str], api_key: str):
        self.es = Elasticsearch(hosts=hosts, api_key=api_key, verify_certs=True)
        self.logger = logging.getLogger(__name__)

    def diagnose_stuck_indices(self, index_pattern: str = "logs-*") -> list[str]:
        # explain_lifecycle is scoped by index (there is no `policy` parameter).
        response = self.es.ilm.explain_lifecycle(index=index_pattern, human=True)
        stuck = []
        for idx, meta in response["indices"].items():
            # Only indices halted in the ERROR step are genuinely stuck; an index
            # sitting in hot/rollover/check-rollover-ready is the healthy steady state.
            if meta.get("step") == "ERROR":
                stuck.append(idx)
        return stuck

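    def force_safe_rollover(self, alias: str, max_size: str = "45gb") -> bool:
        try:
            self.logger.info(f"Executing manual rollover for alias: {alias}")
            # The 8.x client takes request-body fields as keyword arguments; `body=` is deprecated.
            resp = self.es.indices.rollover(alias=alias, conditions={"max_primary_shard_size": max_size})
            # A 200 response does not imply a rollover: rolled_over is false when no condition is met.
            if not resp.get("rolled_over", False):
                self.logger.warning(f"Rollover skipped; conditions not met: {resp.get('conditions')}")
                return False
            self.logger.info(f"Rollover successful: {resp.get('old_index')} -> {resp.get('new_index')}")
            return True
        except exceptions.ApiError as e:
            # ApiError covers every HTTP error status returned by Elasticsearch, not only 400.
            self.logger.error(f"Rollover failed: {e.error}")
            return False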

    def verify_cluster_health(self) -> bool:
        health = self.es.cluster.health(level="indices")
        status = health.get("status", "unknown")
        self.logger.info(f"Cluster status: {status}. Unassigned shards: {health.get('unassigned_shards', 0)}")
        return status in ("green", "yellow")

if __name__ == "__main__":
    engine = ILMRecoveryEngine(
        hosts=["https://es-cluster-01:9200"],
        api_key="YOUR_BASE64_ENCODED_API_KEY"
    )
    
    stuck_indices = engine.diagnose_stuck_indices("logs-*")
    if stuck_indices:
        engine.force_safe_rollover("logs-write", "45gb")
        engine.verify_cluster_health()
    else:
        logging.info("No stuck indices detected. ILM operating within parameters.")

For API reference and client configuration standards, consult the official Python Elasticsearch client documentation: https://elasticsearch-py.readthedocs.io/en/v8.13.0/

5. Compliance & Escalation Protocols

Automated ILM execution does not replace operational oversight. Implement the following compliance controls:

  1. Audit Trail Enforcement: All _ilm/explain and _cluster/reroute executions must be logged to a centralized SIEM. Retain logs for a minimum of 90 days for forensic reconstruction.
  2. Threshold Alerting: Alert when any node's disk usage crosses the cluster.routing.allocation.disk.watermark.flood_stage threshold and when any index reports step: ERROR in _ilm/explain. Page on-call engineers immediately if the flood-stage condition persists beyond 15 minutes.
  3. Escalation Path: If forced rollover fails and shard allocation remains blocked, exclude the affected node from allocation via cluster.routing.allocation.exclude._ip, drain traffic, and initiate a controlled reindex into a fresh index created from the current index template. Do not attempt manual shard relocation without first verifying the state of the data nodes involved.
  4. Policy Versioning: Store all ILM policy definitions in version-controlled infrastructure-as-code repositories. Rollbacks require explicit approval from the platform architecture team.
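The 15-minute rule from the alerting control can be expressed as a small piece of logic. The sketch below is one hypothetical way to do it: it takes chronologically ordered samples of whether a node was above flood stage and reports whether the current unbroken breach has lasted longer than the limit. The function name and the sample format are assumptions, and collecting the samples is left to the monitoring system.

```python
from datetime import datetime, timedelta


def should_page(samples: list[tuple[datetime, bool]],
                limit: timedelta = timedelta(minutes=15)) -> bool:
    """Return True when the latest unbroken run of breached samples spans more than `limit`.

    `samples` holds (timestamp, flood_stage_breached) pairs in chronological order.
    """
    run_start = None
    for ts, breached in samples:
        if breached:
            if run_start is None:
                run_start = ts  # first sample of a new breach run
        else:
            run_start = None    # breach cleared, reset the run
    if run_start is None:
        return False
    return samples[-1][0] - run_start > limit


t0 = datetime(2024, 1, 1, 12, 0)
persistent = [(t0 + timedelta(minutes=m), True) for m in (0, 5, 10, 16)]
recovered = [(t0, True), (t0 + timedelta(minutes=10), False), (t0 + timedelta(minutes=20), True)]
print(should_page(persistent))  # True
print(should_page(recovered))   # False
```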

Applied consistently, these diagnostic and recovery patterns keep index rotation predictable, reduce the risk of storage exhaustion, and help maintain cluster routing stability under sustained ingestion loads.