Fallback Routing for Data Retention

An Index Lifecycle Management (ILM) policy assumes the tier it wants to move data onto will actually be there. Production rarely cooperates: a warm node is evicted mid-upgrade, a disk crosses its high watermark, a data-tier role is fat-fingered out of elasticsearch.yml, and the allocate action stalls with nowhere to put the shard. When that happens without a fallback path, the index freezes in an intermediate state — retention windows drift, primary shards keep growing on the hot tier, and a compliance clock keeps ticking on data that was supposed to have migrated. Fallback routing is the pattern that turns that hard stop into a graceful degradation: when the preferred tier is unreachable, the allocator places the shard on the next tier in an ordered preference list rather than leaving it unassigned.

The mechanism is not another allocation filter — it is the ordered index.routing.allocation.include._tier_preference setting, a comma-separated list of data tiers the allocator evaluates left to right. This page covers how that list interacts with node data-tier roles, how to wire it into both an ILM policy and an index template, how to force a fallback during an incident, and how to verify the shard actually landed where you intended. It builds directly on the tier topology defined by the hot-warm-cold architecture and on the rollover conditions that hand an index into the phases where this routing matters.

Prerequisites

Elasticsearch 8.x cluster with data-tier node roles assigned (data_hot, data_warm, data_cold) — fallback routing has nothing to fall back to unless at least two tiers carry live nodes.
elasticsearch-py v8.0+ installed — this page uses the v8 client surface (ilm.put_lifecycle, indices.put_settings, cluster.allocation_explain, keyword-argument request bodies, typed ApiError), not the legacy body= pattern.
An index template that sets index.lifecycle.name and index.routing.allocation.include._tier_preference, so every backing index inherits the fallback list at creation time rather than being patched after the fact.
Disk watermarks reviewed (cluster.routing.allocation.disk.watermark.*): a fallback target that is already past its low watermark will refuse new shards, and the fallback still fails.
Monitoring access to GET _ilm/explain, POST _cluster/allocation/explain, and GET _cat/nodes to confirm the allocator’s decision, not just the policy text.
manage_ilm, manage_index_templates, and cluster-level manage privileges on the automation account, scoped per Securing ILM Policies with RBAC.

Architecture: How Fallback Fits the Lifecycle State Machine

ILM is best understood not as a rigid pipeline but as a state machine with explicit failure branches. Each phase transition asks the allocator a question — “place this shard on a data_warm node” — and the allocator answers through a chain of deciders. The three allocation filters ILM exposes on the allocate action — require, include, and exclude — are all hard constraints: require means only nodes matching this attribute, and if none exist the shard stays unassigned indefinitely. None of them expresses “prefer warm, but accept hot if warm is gone.”

That ordered preference is what index.routing.allocation.include._tier_preference provides. Given data_warm,data_hot, the allocator assigns the index to the first tier in the list that has an available node: data_warm while any warm node is in the cluster, data_hot only once none is. The fall-through is driven by node presence, not capacity. A warm tier whose nodes are all online but past their disk watermarks, or excluded by another filter, still counts as available, and the shard waits there until the list is changed. The preference list is the single point where ideal topology and runtime reality are reconciled.

The list must mirror physical topology. Each node declares its tier through node.roles (for example [data_warm]), and the allocator matches those roles against the preference list during placement. Fallback routing bridges the gap between the topology you designed and the one you have at 3 a.m. during a rolling restart — letting the lifecycle degrade one tier at a time instead of halting outright.
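The resolution rule can be modelled in a few lines. This is a simplified sketch, not Elasticsearch's implementation: resolve_tier and the node role sets are hypothetical names chosen for illustration, and the model only looks at which tier roles are present in the cluster. It ignores the generic data role, disk watermarks, and every other decider.

```python
from typing import Iterable, Optional


def resolve_tier(preference: str, node_roles: Iterable[set[str]]) -> Optional[str]:
    """Return the first tier in the comma-separated preference list that has
    at least one node in the cluster, or None if no listed tier has any."""
    present: set[str] = set()
    for roles in node_roles:
        present |= set(roles)
    for tier in (t.strip() for t in preference.split(",")):
        if tier and tier in present:
            return tier
    return None


# Hypothetical cluster during a rolling restart: no warm nodes are up.
nodes = [{"data_hot", "ingest"}, {"data_hot"}]
print(resolve_tier("data_warm,data_hot", nodes))  # prints: data_hot
```

A result of None corresponds to the case where no listed tier has a live node and the shard stays unassigned.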

Configuration Reference

Two artifacts carry the fallback contract: the ILM policy, which sets a preference on the allocate action per phase, and the index template, which sets a baseline preference every managed index inherits at creation. Keep both in sync — a template that says data_warm,data_hot while the warm phase’s allocate action says only require: data_warm will still deadlock the moment warm nodes vanish.

ILM Policy with Per-Phase Fallback

PUT _ilm/policy/logs-retention-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "include": { "_tier_preference": "data_warm,data_hot" },
            "number_of_replicas": 1
          },
          "shrink": { "number_of_shards": 1 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": { "include": { "_tier_preference": "data_cold,data_warm" } }
        }
      }
    }
  }
}

Each phase’s allocate action names its own ordered list: warm prefers data_warm then degrades to data_hot; cold prefers data_cold then degrades to data_warm. The order encodes an operational policy: always fall back toward hotter, more available tiers, never toward a colder tier that may not have the capacity or query profile the data still needs. The shrink action in the warm phase reduces the shard count, but it runs only after the allocate action completes, so a stalled allocate step blocks shrink too.
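The ordering rule lends itself to a pre-deployment lint. The sketch below is one possible check, not part of any Elasticsearch tooling: check_preference and TIER_ORDER are hypothetical names, and the tier ranking covers only the three tiers used on this page.

```python
# Coldest to hottest, limited to the tiers this page uses (an assumption;
# a cluster may also run data_content or data_frozen nodes).
TIER_ORDER = ["data_cold", "data_warm", "data_hot"]


def check_preference(preference: str) -> list[str]:
    """Return human-readable problems; an empty list means the list looks sane."""
    tiers = [t.strip() for t in preference.split(",") if t.strip()]
    problems = []
    unknown = [t for t in tiers if t not in TIER_ORDER]
    if unknown:
        problems.append(f"unknown tiers: {unknown}")
    if len(tiers) < 2:
        problems.append("no fallback tier listed")
    # Each later entry must be strictly hotter than the one before it.
    ranks = [TIER_ORDER.index(t) for t in tiers if t in TIER_ORDER]
    if any(later <= earlier for earlier, later in zip(ranks, ranks[1:])):
        problems.append("list does not degrade toward hotter tiers")
    return problems


print(check_preference("data_warm,data_hot"))   # prints: []
print(check_preference("data_warm,data_cold"))  # prints: ['list does not degrade toward hotter tiers']
```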

Index Template with a Baseline Preference

PUT _index_template/logs-app-fallback
{
  "index_patterns": ["logs-app-*"],
  "template": {
    "settings": {
      "index.routing.allocation.include._tier_preference": "data_warm,data_hot",
      "index.lifecycle.name": "logs-retention-policy",
      "index.lifecycle.rollover_alias": "logs-app"
    }
  }
}

Setting _tier_preference at the template level means every logs-app-* index is born with a fallback path — the allocator never has to guess, and there is no window where a freshly rolled-over index has no preference at all. The index.lifecycle.name binds the policy above; index.lifecycle.rollover_alias ties the template into the write alias the rollover configuration advances. Authoring the policy-and-template pair as one unit is covered in depth in building custom ILM policies via the API.
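Because the policy and the template have to agree, it can help to audit the policy body before it is deployed. The following is a minimal sketch under the assumption that the policy dict has the same shape as the JSON shown earlier in this section; phases_without_fallback is a hypothetical helper, not a client method.

```python
def phases_without_fallback(policy: dict) -> list[str]:
    """List phases whose allocate action has no multi-tier _tier_preference,
    i.e. phases that would deadlock if their only target tier disappeared."""
    missing = []
    for phase, body in policy.get("phases", {}).items():
        allocate = body.get("actions", {}).get("allocate")
        if allocate is None:
            continue  # phase does not use the allocate action at all
        preference = allocate.get("include", {}).get("_tier_preference", "")
        if "," not in preference:
            missing.append(phase)
    return missing


policy = {
    "phases": {
        "warm": {"actions": {"allocate": {
            "include": {"_tier_preference": "data_warm,data_hot"}}}},
        # Hypothetical misconfiguration: a hard constraint with no fallback.
        "cold": {"actions": {"allocate": {"require": {"data": "cold"}}}},
    }
}
print(phases_without_fallback(policy))  # prints: ['cold']
```

Running a check like this in CI, before put_lifecycle is called, keeps a hard-constraint phase from slipping in next to a template that promises a fallback.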

Step-by-Step Implementation

Applying and validating fallback routing by hand invites configuration drift between environments. The elasticsearch Python client v8.x wraps the ILM and allocation APIs with typed methods, so the same script deploys the policy, checks the live allocation decision, and — when an incident demands it — widens the tier preference non-destructively.

import logging
from elasticsearch import Elasticsearch, ApiError

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class ILMFallbackManager:
    def __init__(self, hosts: list[str], api_key: str):
        # v8 client: api_key auth, TLS verification on, bounded request timeout.
        self.es = Elasticsearch(
            hosts=hosts,
            api_key=api_key,
            verify_certs=True,
            request_timeout=30,
        )

    def deploy_policy(self, name: str, policy: dict) -> bool:
        """Apply (or update) the retention policy carrying the fallback lists."""
        try:
            self.es.ilm.put_lifecycle(name=name, policy=policy)
            logger.info("ILM policy '%s' deployed.", name)
            return True
        except ApiError as exc:
            logger.error("Failed to deploy policy: %s", exc.info)
            return False

    def verify_allocation_routing(self, index_pattern: str) -> dict:
        """Ask the allocator, per index, whether the shard can be placed and why."""
        try:
            indices = self.es.cat.indices(
                index=index_pattern, format="json",
                h="index,health,status,pri,rep,store.size",
            )
            results = {}
            for idx in indices:
                explain = self.es.cluster.allocation_explain(
                    index=idx["index"], shard=0, primary=True,
                    include_disk_info=True,
                )
                results[idx["index"]] = {
                    "current_node": explain.get("current_node"),
                    "can_allocate": explain.get("can_allocate"),
                    "deciders": explain.get("node_allocation_decisions", []),
                }
            return results
        except ApiError as exc:
            logger.error("Allocation check failed: %s", exc.info)
            return {}

    def force_fallback_reroute(self, index: str,
                               tier_preference: str = "data_hot,data_warm") -> None:
        """Widen an index's tier preference so a stalled shard can land on
        another tier. Non-destructive: unlike allocate_stale_primary, it never
        risks data loss — it only relaxes where the allocator may place a copy."""
        try:
            self.es.indices.put_settings(
                index=index,
                settings={
                    "index.routing.allocation.include._tier_preference": tier_preference
                },
            )
            # Retry shards that previously exhausted their allocation attempts.
            self.es.cluster.reroute(retry_failed=True)
            logger.info("Applied fallback preference '%s' to %s", tier_preference, index)
        except ApiError as exc:
            logger.error("Fallback reroute failed: %s", exc.info)


if __name__ == "__main__":
    manager = ILMFallbackManager(["https://es-cluster:9200"], "YOUR_API_KEY")
    manager.deploy_policy("logs-retention-policy", {
        "phases": {
            "hot": {"min_age": "0ms",
                    "actions": {"rollover": {"max_primary_shard_size": "50gb"}}},
            "warm": {"min_age": "7d",
                     "actions": {"allocate": {"include": {"_tier_preference": "data_warm,data_hot"}}}},
        }
    })
    logger.info(manager.verify_allocation_routing("logs-app-*"))

The three methods map to the operational lifecycle: deploy_policy puts the fallback contract in place, verify_allocation_routing reads the allocator’s live decision per index, and force_fallback_reroute is the break-glass override for an active incident. cluster.reroute(retry_failed=True) matters because Elasticsearch stops retrying a shard after index.allocation.max_retries (default 5) consecutive failures — without the explicit retry, widening the preference alone will not move a shard that has already exhausted its attempts. Programmatic recovery of stuck steps is treated more fully in handling ILM step execution failures programmatically.

Verification

Confirm the allocator’s behaviour, not just that the policy text applied. Start by finding out where ILM believes each index sits and whether a step has errored:

GET logs-app-2024.01.01/_ilm/explain

Inspect step_info. If it reports allocation_failed or no_valid_shard_copy, the allocate action is the blocker and the next call tells you why:

POST _cluster/allocation/explain
{
  "index": "logs-app-2024.01.01",
  "shard": 0,
  "primary": true
}

Read the node_allocation_decisions and the deciders array. The blockers that fallback routing is meant to survive show up here by name:

data_tier: this node does not carry the role for the tier that the index’s _tier_preference list currently resolves to. If every node reports it, no tier in the list has a live node.
disk_threshold: the candidate node is past its low watermark, or the shard would push it over the high watermark, so even a valid tier is refused.
same_shard: a replica cannot sit on the same node as its primary, which can look like a tier problem on a small cluster.
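Reading the deciders by eye gets tedious on a cluster with many nodes. A small helper can reduce the explain response to the blocking deciders per node. This sketch assumes the response shape of the allocation explain API (node_allocation_decisions, each with a deciders list carrying decider and decision fields); blocking_deciders and the sample payload are illustrative only.

```python
def blocking_deciders(explain: dict) -> dict[str, list[str]]:
    """Map each candidate node to the names of deciders that answered NO."""
    blockers: dict[str, list[str]] = {}
    for node in explain.get("node_allocation_decisions", []):
        names = [
            d["decider"]
            for d in node.get("deciders", [])
            if d.get("decision") == "NO"
        ]
        if names:
            blockers[node.get("node_name", "unknown")] = names
    return blockers


# Hypothetical, trimmed-down explain response.
sample = {
    "node_allocation_decisions": [
        {"node_name": "es-hot-1", "node_decision": "no",
         "deciders": [{"decider": "disk_threshold", "decision": "NO"}]},
        {"node_name": "es-cold-1", "node_decision": "no",
         "deciders": [{"decider": "data_tier", "decision": "NO"}]},
    ]
}
print(blocking_deciders(sample))
```

With the Python client, the dict returned by cluster.allocation_explain(...) could be passed in directly.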

Then confirm the tiers you expect to fall back to are actually live:

GET _cat/nodes?v&h=name,node.role

A preference list that names data_warm,data_hot is worthless if no node reports a w or h role — a tier with zero live nodes causes a silent allocation failure that no amount of retrying fixes. Finally, watch shard placement settle after a fallback fires:

GET _cat/shards/logs-app-*?v&h=index,shard,prirep,state,node&s=state

Any shard in UNASSIGNED after the preference was widened and reroute(retry_failed=True) was called means the fallback target itself is refusing the shard — loop back to the allocation explain and read the decider for that specific node.
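The node-role check from earlier in this section can also be scripted. The sketch below assumes the rows come from the cat nodes API requested as JSON with the name and node.role columns, where h, w, and c are the abbreviations for the hot, warm, and cold roles. nodes_per_tier and dead_tiers are hypothetical helpers, and nodes carrying only the generic data role are deliberately not counted.

```python
ROLE_LETTERS = {"h": "data_hot", "w": "data_warm", "c": "data_cold"}


def nodes_per_tier(rows: list[dict]) -> dict[str, int]:
    """Count live nodes per data tier from cat-nodes rows."""
    counts = {tier: 0 for tier in ROLE_LETTERS.values()}
    for row in rows:
        for letter in row.get("node.role", ""):
            tier = ROLE_LETTERS.get(letter)
            if tier:
                counts[tier] += 1
    return counts


def dead_tiers(preference: str, rows: list[dict]) -> list[str]:
    """Return the tiers named in the preference list that have no live node."""
    counts = nodes_per_tier(rows)
    tiers = [t.strip() for t in preference.split(",") if t.strip()]
    return [t for t in tiers if counts.get(t, 0) == 0]


# Hypothetical cluster: one hot+ingest node and one dedicated master.
rows = [
    {"name": "es-hot-1", "node.role": "hi"},
    {"name": "es-master-1", "node.role": "m"},
]
print(dead_tiers("data_warm,data_hot", rows))  # prints: ['data_warm']
```

If dead_tiers returns every entry in the list, retrying will not help; a node for at least one listed tier has to come back first.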

Threshold Tuning & Performance Guidance

Fallback routing only functions if disk watermarks do not veto the fallback target. Elasticsearch evaluates three thresholds in order: low (stop allocating new shards to this node), high (start relocating shards off this node), and flood_stage (force every index with a shard on this node to read-only). A fallback that routes warm data onto hot nodes can push those hot nodes toward flood_stage precisely when the deployment is already degraded — the failure you were avoiding reappears one tier over.

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.routing.allocation.disk.threshold_enabled": true
  }
}

Size the fallback headroom deliberately. If warm data is allowed to fall back onto the hot tier, the hot tier must carry enough free space to absorb that warm working set plus its normal ingest burst — otherwise the fallback trades an unassigned shard for a read-only index. Track fs disk usage through GET _nodes/stats/fs and alert at 80%, below the low watermark, so capacity is added or data purged before the allocator ever refuses a shard.
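The headroom question reduces to simple arithmetic. The sketch below is a tier-level approximation with made-up numbers: the real disk decider works per node and per shard, so treat fallback_fits as a planning aid rather than a prediction of the allocator's decision.

```python
def fallback_fits(total_bytes: int, used_bytes: int, incoming_bytes: int,
                  low_watermark: float = 0.85) -> bool:
    """True if the receiving tier stays below the low watermark after it
    absorbs the incoming shards (aggregate view, not per node)."""
    projected = (used_bytes + incoming_bytes) / total_bytes
    return projected < low_watermark


GB = 1024 ** 3
# Hypothetical hot tier: 1000 GB total, 600 GB used.
print(fallback_fits(1000 * GB, 600 * GB, 200 * GB))  # prints: True  (80% projected)
print(fallback_fits(1000 * GB, 600 * GB, 300 * GB))  # prints: False (90% projected)
```

The incoming figure should include the normal ingest burst as well as the warm working set, since both land on the hot tier during a fallback.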

Two further constraints shape how much a tier can accept:

Shard count against heap. Every open shard on a fallback node carries a fixed heap cost for segment metadata and field-data structures. Absorbing another tier’s shards inflates the receiving node’s shard count; keep total shards per node under roughly 20 per GB of configured JVM heap, and remember that a fallback event raises that ratio precisely when heap is already under pressure.
shrink ordering. In the warm phase, shrink runs after allocate. If the shard falls back to data_hot, shrink still executes there — reducing shard count on the hot tier, which is usually fine, but it means the eventual relocation back to warm moves fewer, larger shards. Plan the size cap set by rollover conditions with that post-shrink shard size in mind.
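The shard-count constraint can be checked the same way. This sketch uses the rule of thumb quoted above (roughly 20 shards per GB of heap) as its default; shard_headroom is a hypothetical helper and the figures are illustrative.

```python
def shard_headroom(heap_gb: float, current_shards: int, incoming_shards: int,
                   shards_per_gb: int = 20) -> int:
    """Shards a node can still accept after a fallback, under the
    shards-per-GB-of-heap rule of thumb. Negative means over budget."""
    budget = int(heap_gb * shards_per_gb)
    return budget - (current_shards + incoming_shards)


# Hypothetical hot node with 30 GB heap (budget: 600 shards).
print(shard_headroom(30, 500, 80))  # prints: 20
print(shard_headroom(30, 580, 40))  # prints: -20
```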

Troubleshooting

When an index stalls in the allocate action, work the following sequence — every step maps to a first-party API, no third-party tooling required.

ILM stuck in `allocate` with `allocation_failed`

Symptom: GET <index>/_ilm/explain shows action: allocate and a step_info.error of allocation_failed, and the index never advances to shrink or the next phase. Resolution:

Run POST _cluster/allocation/explain for the primary and read the deciding node’s decider.
If the decider is data_tier, confirm with GET _cat/nodes?v&h=name,node.role that a node exists for some entry in the preference list.

Widen the preference toward an available tier and retry failed shards:

PUT logs-app-2024.01.01/_settings
{
  "index.routing.allocation.include._tier_preference": "data_hot,data_warm"
}

POST _cluster/reroute?retry_failed=true

The shard exhausted its retry budget

Symptom: the preference is correct and a valid tier exists, yet the shard stays UNASSIGNED. allocation/explain reports the shard has reached max_retries. Resolution: Elasticsearch stops retrying after index.allocation.max_retries failed attempts (five by default). A settings change alone will not restart it. You must issue POST _cluster/reroute?retry_failed=true (or the client’s cluster.reroute(retry_failed=True)) to reset the counter and let the allocator try the newly valid target.
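Whether a shard has used up its budget can be read from the explain response before deciding to reroute. The sketch assumes the unassigned_info object in the allocation explain output carries a failed_allocation_attempts counter; retry_budget_exhausted is a hypothetical helper, and the default of 5 mirrors index.allocation.max_retries.

```python
def retry_budget_exhausted(explain: dict, max_retries: int = 5) -> bool:
    """True if the shard has failed allocation at least max_retries times,
    in which case only an explicit reroute with retry_failed restarts it."""
    info = explain.get("unassigned_info", {})
    attempts = info.get("failed_allocation_attempts", 0)
    return attempts >= max_retries


# Hypothetical, trimmed-down explain response for a stuck primary.
stuck = {
    "index": "logs-app-2024.01.01",
    "shard": 0,
    "primary": True,
    "unassigned_info": {
        "reason": "ALLOCATION_FAILED",
        "failed_allocation_attempts": 5,
    },
}
print(retry_budget_exhausted(stuck))  # prints: True
```

If the index overrides index.allocation.max_retries, pass that value instead of relying on the default.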

Emergency override, then hand back to ILM

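Symptom: the index has been stuck in allocate for hours and retention SLAs are at risk. Resolution: widen the preference as above to force relocation, wait for the shards to relocate and the index health to return to green, then hand control back to the policy so it resumes from where it stalled. The retry call applies only when _ilm/explain shows the index in the ERROR step; an index that was merely waiting on allocation advances on its own at the next ILM poll: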

POST logs-app-2024.01.01/_ilm/retry

Once resumed, set the index back to the template’s original preference so the manual override does not persist as permanent drift. Recurring stalls that are not tier-availability problems, for example a phase waiting on min_age because of clock skew, are diagnosed in troubleshooting ILM phase transition delays.
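Drift left behind by an emergency override is easy to miss. One way to surface it is to compare each index's current preference with the template baseline. The sketch assumes the nested shape returned by the get-settings API (not flat_settings); drifted_indices and the sample payload are hypothetical.

```python
from typing import Optional


def drifted_indices(settings_response: dict, baseline: str) -> dict[str, Optional[str]]:
    """Return indices whose _tier_preference differs from the baseline,
    mapped to the value they currently carry (None if unset)."""
    drifted: dict[str, Optional[str]] = {}
    for index, body in settings_response.items():
        current = (
            body.get("settings", {})
            .get("index", {})
            .get("routing", {})
            .get("allocation", {})
            .get("include", {})
            .get("_tier_preference")
        )
        if current != baseline:
            drifted[index] = current
    return drifted


def _settings(pref: str) -> dict:
    # Helper that builds a sample settings payload for the demo below.
    return {"settings": {"index": {"routing": {"allocation": {
        "include": {"_tier_preference": pref}}}}}}


sample = {
    "logs-app-2024.01.01": _settings("data_hot,data_warm"),  # overridden
    "logs-app-2024.01.02": _settings("data_warm,data_hot"),  # baseline
}
print(drifted_indices(sample, "data_warm,data_hot"))
# prints: {'logs-app-2024.01.01': 'data_hot,data_warm'}
```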

Confirm resumption

GET _ilm/status
GET logs-app-2024.01.01/_ilm/explain

GET _ilm/status must report RUNNING — if a prior incident called POST _ilm/stop, ILM is paused globally and no policy advances until POST _ilm/start. The per-index explain should then show the phase moving to warm and the action clearing the failed allocate step.
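The resumption check can be automated against the ILM explain response. This sketch assumes the documented response shape (an indices object whose entries carry managed, step, and failed_step); stalled_indices is a hypothetical helper and the sample payload is illustrative.

```python
def stalled_indices(explain_response: dict) -> dict[str, str]:
    """Map each managed index sitting in the ERROR step to its failed step."""
    stalled: dict[str, str] = {}
    for name, info in explain_response.get("indices", {}).items():
        if info.get("managed") and info.get("step") == "ERROR":
            stalled[name] = info.get("failed_step", "unknown")
    return stalled


# Hypothetical explain response covering two indices.
sample = {
    "indices": {
        "logs-app-2024.01.01": {
            "managed": True, "phase": "warm", "action": "allocate",
            "step": "ERROR", "failed_step": "allocate",
        },
        "logs-app-2024.01.02": {
            "managed": True, "phase": "warm", "action": "complete",
            "step": "complete",
        },
    }
}
print(stalled_indices(sample))  # prints: {'logs-app-2024.01.01': 'allocate'}
```

An empty result after the override suggests every managed index is moving again.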

FAQ

Does `_tier_preference` replace the `require`/`include`/`exclude` allocation filters?

No — it complements them. require, include, and exclude are hard constraints evaluated against node attributes, and require in particular leaves a shard unassigned if no node matches. index.routing.allocation.include._tier_preference is the only setting that expresses an ordered fallback across data tiers. Use the filters for absolute placement rules (for example, keep a tenant off shared nodes) and _tier_preference for graceful tier degradation. If both point at incompatible targets, the hard filter wins and the shard can still deadlock.

Why does my shard stay `UNASSIGNED` even after I widen the tier preference?

Almost always the retry budget. Elasticsearch stops retrying a shard after index.allocation.max_retries (default 5) consecutive allocation failures, and a settings change does not reset that counter. Issue POST _cluster/reroute?retry_failed=true after widening the preference. If it still will not move, run POST _cluster/allocation/explain and read the decider for the target node: a disk_threshold decision means the fallback tier is itself out of disk headroom, either already past its low watermark or about to cross the high watermark once the shard lands.

Should fallback lists ever degrade toward a colder tier instead of a hotter one?

Generally no. Falling back toward hotter tiers (data_cold,data_warm for the cold phase, data_warm,data_hot for the warm phase) degrades toward tiers with more capacity headroom and better query performance, which is what you want during an incident. Falling back toward a colder tier risks placing actively queried data on nodes provisioned for archival throughput, and cold tiers are often sized around read-only searchable snapshots rather than regular writable shards. Order lists from the intended tier outward toward hotter, more capable nodes.

Does a fallback event change the index's ILM phase?

No. Phase is driven by min_age and policy state, not by which physical tier currently holds the shards. An index in the warm phase whose shards fell back onto hot nodes is still, to ILM, in the warm phase — it has simply satisfied its allocate step on a non-preferred tier. When warm nodes return and the preference is restored, normal shard rebalancing relocates the data without a phase transition.

Is `force_fallback_reroute` safe to run against a production index?

Yes. It only relaxes where the allocator is permitted to place a shard copy and then asks for a retry — it never promotes a stale copy or discards data, unlike allocate_stale_primary or allocate_empty_primary, which can lose writes. The one caveat is that the widened preference persists as an index setting until you revert it, so pair any emergency use with a follow-up that restores the template’s original list to avoid silent configuration drift.

Related

Understanding hot-warm-cold architecture — the node roles and tier topology the fallback preference list has to mirror.
Configuring index rollover conditions — the size cap and write alias that set the shard size a fallback tier must absorb.
Securing ILM policies with RBAC — the privileges an automation account needs to change allocation settings and reroute shards.
Troubleshooting ILM phase transition delays — diagnosing stalls that are timing-driven rather than tier-availability driven.
Handling ILM step execution failures programmatically — automating detection and retry of a stuck allocate step.
