Telemt Runtime Model

Scope

This document defines runtime concepts used by the Middle-End (ME) transport pipeline and the orchestration logic around it.

It focuses on:

  • ME Pool / Reader / Writer / Refill / Registry
  • Adaptive Floor
  • Trio-State
  • Generation Lifecycle

Core Entities

ME Pool

ME Pool is the runtime orchestrator for all Middle-End writers.

Responsibilities:

  • Holds writer inventory by DC/family/endpoint.
  • Maintains routing primitives and writer selection policy.
  • Tracks generation state (active, warm, draining context).
  • Applies runtime policies (floor mode, refill, reconnect, reinit, fallback behavior).
  • Exposes readiness gates used by admission logic (for conditional accept/cast behavior).

Non-goals:

  • It does not own client protocol decoding.
  • It does not own per-client business policy (quotas/limits).

ME Writer

ME Writer is a long-lived ME RPC tunnel bound to one concrete ME endpoint (ip:port), with:

  • Outbound command channel (send path).
  • Associated reader loop (inbound path).
  • Health/degraded flags.
  • Contour/state and generation metadata.

A writer is the actual data plane carrier for client sessions once bound.

ME Reader

ME Reader is the inbound parser/dispatcher for one writer:

  • Reads/decrypts ME RPC frames.
  • Validates sequence/checksum.
  • Routes payloads to client-connection channels via Registry.
  • Emits close/ack/data events and updates telemetry.

Design intent:

  • The reader must stay non-blocking wherever possible.
  • Backpressure on a single client route must not stall the whole writer stream.
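The sketch below makes the reader's per-frame work concrete. It validates a checksum and an expected sequence number before handing the payload on to routing. It assumes an MTProto-style layout of `[len u32 LE][seq i32 LE][payload][crc32 u32 LE]`, where `len` covers the whole frame. The actual Telemt wire format, and the names `parse_frame`, `FrameError`, `le_u32` and `crc32`, are illustrative assumptions. Only the Rust standard library is used, with a slow bitwise CRC-32.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320). Slow but dependency-free.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Reads a little-endian u32 from the first four bytes of `b`.
fn le_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

#[derive(Debug, PartialEq)]
enum FrameError {
    Truncated, // not enough bytes yet: wait for more instead of blocking
    BadLength(usize),
    BadSeq { expected: i32, got: i32 },
    BadChecksum,
}

/// Parses one frame with the assumed layout [len][seq][payload][crc32].
/// Returns the sequence number and a borrowed payload slice (no allocation).
fn parse_frame(buf: &[u8], expected_seq: i32) -> Result<(i32, &[u8]), FrameError> {
    if buf.len() < 12 {
        return Err(FrameError::Truncated);
    }
    let len = le_u32(buf) as usize;
    if len < 12 {
        return Err(FrameError::BadLength(len));
    }
    if buf.len() < len {
        return Err(FrameError::Truncated);
    }
    // Verify the checksum first so corrupted bytes are never trusted as a sequence number.
    let crc_at = len - 4;
    if crc32(&buf[..crc_at]) != le_u32(&buf[crc_at..len]) {
        return Err(FrameError::BadChecksum);
    }
    let seq = le_u32(&buf[4..8]) as i32;
    if seq != expected_seq {
        return Err(FrameError::BadSeq { expected: expected_seq, got: seq });
    }
    Ok((seq, &buf[8..crc_at]))
}
```

Returning `Truncated` for a partial frame, rather than waiting inside the parser, keeps the reader loop free to service other work.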

Refill

Refill is the recovery mechanism that restores writer coverage when capacity drops:

  • Per-endpoint restore (same endpoint first).
  • Per-DC restore to satisfy required floor.
  • Optional outage-mode/shadow behavior for fragile single-endpoint DCs.

Refill works asynchronously and should not block hot routing paths.
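This is a minimal sketch of the two refill priorities for a single DC. It assumes the control plane knows which endpoints lost a writer and how many writers are still alive. `RefillTask` and `plan_refill` are hypothetical names. The function only plans the work; carrying it out would happen on an asynchronous path outside routing.

```rust
/// One unit of refill work (hypothetical type).
#[derive(Debug, PartialEq)]
enum RefillTask {
    /// Reconnect to the same ip:port that just lost its writer.
    RestoreEndpoint(String),
    /// Open one more writer on any endpoint of the DC to reach the floor.
    TopUpDc(i32),
}

/// Plans refill for one DC: endpoint restores first, then DC-level top-ups.
/// `lost` holds endpoints whose writer dropped; `alive` is the live writer count.
fn plan_refill(dc: i32, lost: &[&str], alive: usize, floor: usize) -> Vec<RefillTask> {
    let mut tasks: Vec<RefillTask> = lost
        .iter()
        .map(|ep| RefillTask::RestoreEndpoint(ep.to_string()))
        .collect();
    // Assume each restore succeeds; only the remaining gap becomes DC-level work.
    let projected = alive + tasks.len();
    for _ in projected..floor {
        tasks.push(RefillTask::TopUpDc(dc));
    }
    tasks
}
```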

Registry

Registry is the routing index between ME and client sessions:

  • conn_id -> client response channel
  • conn_id <-> writer_id binding map
  • writer activity snapshots and idle tracking

Main invariants:

  • A conn_id routes to at most one active response channel.
  • Writer loss triggers safe unbind/cleanup and close propagation.
  • Registry state is the source of truth for active ME-bound session mapping.
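The invariants above can be shown with a small single-threaded registry sketch. It assumes `u64` identifiers and uses std `mpsc` channels as the client response channels. The method names (`register`, `bind`, `detach`, `close`, `on_writer_lost`) are hypothetical. The real Registry is shared across tasks, and its API may differ.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::mpsc::{channel, Receiver, Sender};

type ConnId = u64;
type WriterId = u64;

/// Illustrative registry: conn_id -> response channel, plus the conn_id <-> writer_id map.
#[derive(Default)]
struct Registry {
    routes: HashMap<ConnId, Sender<Vec<u8>>>,
    conn_writer: HashMap<ConnId, WriterId>,
    writer_conns: HashMap<WriterId, HashSet<ConnId>>,
}

impl Registry {
    /// Creates the response channel for a connection. Re-registering replaces
    /// the old channel, so a conn_id never has two active routes.
    fn register(&mut self, conn: ConnId) -> Receiver<Vec<u8>> {
        let (tx, rx) = channel();
        self.routes.insert(conn, tx);
        rx
    }

    /// Binds a connection to a writer, removing any previous binding first.
    fn bind(&mut self, conn: ConnId, writer: WriterId) {
        self.detach(conn);
        self.conn_writer.insert(conn, writer);
        self.writer_conns.entry(writer).or_default().insert(conn);
    }

    /// Removes the conn_id <-> writer_id binding (no-op if unbound).
    fn detach(&mut self, conn: ConnId) {
        if let Some(writer) = self.conn_writer.remove(&conn) {
            let now_empty = match self.writer_conns.get_mut(&writer) {
                Some(set) => {
                    set.remove(&conn);
                    set.is_empty()
                }
                None => false,
            };
            if now_empty {
                self.writer_conns.remove(&writer);
            }
        }
    }

    /// Idempotent close: returns true only if a route was actually removed.
    /// Dropping the Sender makes the client's Receiver observe the close.
    fn close(&mut self, conn: ConnId) -> bool {
        self.detach(conn);
        self.routes.remove(&conn).is_some()
    }

    /// Writer loss: unbind and close every session bound to that writer,
    /// returning the affected conn_ids (sorted) for close propagation.
    fn on_writer_lost(&mut self, writer: WriterId) -> Vec<ConnId> {
        let mut conns: Vec<ConnId> = self
            .writer_conns
            .remove(&writer)
            .unwrap_or_default()
            .into_iter()
            .collect();
        conns.sort_unstable();
        for conn in &conns {
            self.conn_writer.remove(conn);
            self.routes.remove(conn);
        }
        conns
    }
}
```

Dropping a route's `Sender` is what propagates the close: once the sender is gone, `recv()` on the client side returns an error. Because `close` reports whether a route was actually removed, repeated close calls are harmless.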

Adaptive Floor

What it is

Adaptive Floor is a runtime policy that changes the target writer count per DC based on observed activity, instead of always holding the static peak floor.

Why it exists

Goals:

  • Reduce idle writer churn under low traffic.
  • Keep enough warm capacity to avoid client-visible stalls on burst recovery.
  • Limit needless reconnect storms on unstable endpoints.

Behavioral model

  • Under activity: the floor converges toward the configured static requirement.
  • Under prolonged idle: the floor can shrink to a safe minimum.
  • Recovery/grace windows prevent aggressive oscillation.

Safety constraints

  • Never violate the minimal survivability floor of a DC group.
  • Refill must still restore quickly on demand.
  • Floor adaptation must not force-drop already bound healthy sessions.
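The behavioral model and its safety constraints can be written as a pure function. This sketch assumes per-DC inputs: how long the DC has been idle, and how many writers still carry sessions. The `FloorPolicy` field names are hypothetical and are not Telemt configuration keys. The single grace window is a simplified stand-in for the recovery/grace logic described above.

```rust
use std::time::Duration;

/// Hypothetical per-DC floor policy; field names are illustrative.
struct FloorPolicy {
    static_floor: usize,  // target while the DC is active
    min_floor: usize,     // survivability floor, never violated
    idle_after: Duration, // idle time before shrinking is considered
    grace: Duration,      // extra hold time that damps oscillation
}

/// Writer count to maintain for one DC right now.
/// `writers_with_sessions` counts writers that still carry bound sessions;
/// the target never drops below it, so adaptation cannot force-drop them.
fn target_floor(p: &FloorPolicy, idle_for: Duration, writers_with_sessions: usize) -> usize {
    let shrink = idle_for >= p.idle_after + p.grace;
    let base = if shrink { p.min_floor } else { p.static_floor };
    base.max(p.min_floor).max(writers_with_sessions)
}
```

If the caller resets `idle_for` whenever activity resumes, the target jumps straight back to `static_floor`. Refill then closes the gap on demand.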

Trio-State

Trio-State is the contouring of writers into three roles:

  • Warm
  • Active
  • Draining

State semantics

  • Warm: connected and validated, not primary for new binds.
  • Active: preferred for new binds and normal traffic.
  • Draining: no new regular binds; existing sessions continue until graceful retirement rules apply.

Transition intent

  • Warm -> Active: when coverage/readiness conditions are satisfied.
  • Active -> Draining: on generation swap, endpoint replacement, or controlled retirement.
  • Draining -> removed: after drain TTL/force-close policy (or when naturally empty).

This separation reduces single points of failure (SPOF) and keeps cutovers predictable.
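The transition intent can be encoded as an explicit state machine, so that unlisted transitions are rejected rather than silently applied. This is a sketch. `WriterState`, `ContourEvent`, `Removed` and `transition` are illustrative names. Real writer metadata likely carries more context, such as the generation and a drain deadline.

```rust
/// Writer contour (role). `Removed` is the terminal state after draining.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum WriterState {
    Warm,
    Active,
    Draining,
    Removed,
}

/// Triggers for the documented transitions (illustrative names).
#[derive(Clone, Copy, Debug)]
enum ContourEvent {
    /// Coverage/readiness conditions are satisfied.
    CoverageReady,
    /// Generation swap, endpoint replacement, or controlled retirement.
    Retire,
    /// Drain TTL/force-close policy fired, or the writer became naturally empty.
    DrainDone,
}

/// Applies one of the listed transitions; any other combination is rejected.
fn transition(state: WriterState, event: ContourEvent) -> Option<WriterState> {
    match (state, event) {
        (WriterState::Warm, ContourEvent::CoverageReady) => Some(WriterState::Active),
        (WriterState::Active, ContourEvent::Retire) => Some(WriterState::Draining),
        (WriterState::Draining, ContourEvent::DrainDone) => Some(WriterState::Removed),
        _ => None,
    }
}

/// Only Active writers are preferred targets for new regular binds.
fn accepts_new_binds(state: WriterState) -> bool {
    state == WriterState::Active
}
```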

Generation Lifecycle

A generation isolates one pool epoch during reinit/reconfiguration.

Lifecycle phases

  1. Bootstrap: initial writers are established.
  2. Warmup: next generation writers are created and validated.
  3. Activation: the new generation is promoted to active when the coverage gate passes.
  4. Drain: the previous generation becomes draining; its existing sessions are allowed to finish.
  5. Retire: old generation writers are removed under graceful retirement rules.

Operational guarantees

  • No partial generation activation without minimum coverage.
  • Existing healthy client sessions should not be dropped just because a new generation appears.
  • Draining generation exists to absorb in-flight traffic during swap.
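Here is a sketch of the coverage-gated promotion step. It assumes coverage is expressed as alive writer counts per DC. `Generation`, `PoolGenerations` and `try_activate` are hypothetical. The point is that a failed gate hands the candidate back unchanged, while a successful one moves the previous generation to draining instead of dropping it.

```rust
/// Hypothetical generation record: an id plus alive writer counts per DC.
#[derive(Debug)]
struct Generation {
    id: u64,
    alive_per_dc: Vec<(i32, usize)>,
}

/// Active generation plus any generations still draining.
#[derive(Debug)]
struct PoolGenerations {
    active: Generation,
    draining: Vec<Generation>,
}

impl PoolGenerations {
    /// Promotes `next` only if every required DC has at least `min_alive` writers.
    /// On failure the candidate is handed back untouched (keep warming it);
    /// on success the previous active generation is moved to draining, not dropped.
    fn try_activate(
        &mut self,
        next: Generation,
        required_dcs: &[i32],
        min_alive: usize,
    ) -> Result<(), Generation> {
        let covered = required_dcs.iter().all(|dc| {
            next.alive_per_dc
                .iter()
                .any(|&(d, alive)| d == *dc && alive >= min_alive)
        });
        if !covered {
            return Err(next);
        }
        let previous = std::mem::replace(&mut self.active, next);
        self.draining.push(previous);
        Ok(())
    }
}
```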

Readiness and admission

Pool readiness does not mean “all endpoints fully saturated”. A typical gating strategy:

  • Open admission when per-DC minimal alive coverage exists.
  • Continue background saturation for multi-endpoint DCs.

This keeps startup latency low while preserving eventual full capacity.
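The gating strategy comes down to two checks. Admission opens once every required DC meets a minimal alive count. Any DC still below its configured endpoint count stays on the background saturation list. The function names and the `HashMap<i32, usize>` representation below are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Admission gate: open once every required DC has at least `min_alive` live writers.
fn admission_open(alive_per_dc: &HashMap<i32, usize>, required_dcs: &[i32], min_alive: usize) -> bool {
    required_dcs
        .iter()
        .all(|dc| alive_per_dc.get(dc).copied().unwrap_or(0) >= min_alive)
}

/// DCs below their configured endpoint count that should keep saturating
/// in the background, sorted for stable output.
fn needs_saturation(alive_per_dc: &HashMap<i32, usize>, endpoints_per_dc: &HashMap<i32, usize>) -> Vec<i32> {
    let mut dcs: Vec<i32> = endpoints_per_dc
        .iter()
        .filter(|&(dc, want)| alive_per_dc.get(dc).copied().unwrap_or(0) < *want)
        .map(|(&dc, _)| dc)
        .collect();
    dcs.sort_unstable();
    dcs
}
```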

Interactions Between Concepts

  • Generation defines pool epochs.
  • Trio-State defines each writer's role within and around those epochs.
  • Adaptive Floor defines how much capacity should be maintained right now.
  • Refill is the actuator that closes the gap between desired and current capacity.
  • Registry keeps per-session routing correctness while all of the above changes over time.

Architectural Approach

Layered Design

The runtime is intentionally split into two planes:

  • Control Plane: decides desired topology and policy (floor, generation swap, refill, fallback).
  • Data Plane: executes packet/session transport (reader, writer, routing, acks, close propagation).

Architectural rule:

  • Control Plane may change writer inventory and policy.
  • Data Plane must remain stable and low-latency while those changes happen.

Ownership Model

Ownership is centered around explicit state domains:

  • MePool owns writer lifecycle and policy state.
  • Registry owns per-connection routing bindings.
  • Writer task owns outbound ME socket send progression.
  • Reader task owns inbound ME socket parsing and event dispatch.

This prevents accidental cross-layer mutation and keeps invariants local.

Control Plane Responsibilities

Control Plane is event-driven and policy-driven:

  • Startup initialization and readiness gates.
  • Runtime reinit (periodic or config-triggered).
  • Coverage checks per DC/family/endpoint group.
  • Floor enforcement (static/adaptive).
  • Refill scheduling and retry orchestration.
  • Generation transition (warm -> active, previous active -> draining).

Control Plane must prioritize determinism over short-term aggressiveness.

Data Plane Responsibilities

Data Plane is throughput-first and allocation-sensitive:

  • Session bind to writer.
  • Per-frame parsing/validation and dispatch.
  • Ack and close signal propagation.
  • Route drop behavior under missing connection or closed channel.
  • Only minimal, critical logging in the hot path.

Data Plane should avoid waiting on operations that are not strictly required for frame correctness.

Concurrency and Synchronization

Concurrency Principles

  • Per-writer isolation: each writer has independent send/read task loops.
  • Per-connection isolation: client channel state is scoped by conn_id.
  • Asynchronous recovery: refill/reconnect runs outside the packet hot path.

Synchronization Strategy

  • Shared maps use fine-grained, short-lived locking.
  • Read-mostly paths avoid broad write-lock windows.
  • Backpressure decisions are localized at route/channel boundary.

Design target:

  • A slow consumer should degrade only itself (or its route), not global writer progress.
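A read-mostly route map illustrates the short-lived locking rule. The reader takes the read lock only long enough to clone the sender, releases it, and then attempts a non-blocking send. The `Routes` alias, `add_route` and `route_frame` are hypothetical. Std `RwLock` and `sync_channel` stand in for whatever async primitives Telemt actually uses.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::{Arc, RwLock};

/// Read-mostly route map shared by the reader and the control plane (hypothetical alias).
type Routes = Arc<RwLock<HashMap<u64, SyncSender<Vec<u8>>>>>;

/// Control-plane side: registers a bounded route under a brief write lock.
fn add_route(routes: &Routes, conn_id: u64, capacity: usize) -> Receiver<Vec<u8>> {
    let (tx, rx) = sync_channel(capacity);
    routes.write().expect("routes lock poisoned").insert(conn_id, tx);
    rx
}

/// Data-plane side: clone the sender under the read lock, release it, then try_send.
/// A missing conn_id is reported as Disconnected so the caller can apply drop semantics.
fn route_frame(routes: &Routes, conn_id: u64, frame: Vec<u8>) -> Result<(), TrySendError<Vec<u8>>> {
    let tx = {
        let guard = routes.read().expect("routes lock poisoned");
        match guard.get(&conn_id) {
            Some(tx) => tx.clone(),
            None => return Err(TrySendError::Disconnected(frame)),
        }
    }; // read lock released here, before any send attempt
    tx.try_send(frame)
}
```

A full channel shows up as `TrySendError::Full` for that one route only. No lock is held while this happens, so other routes keep flowing.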

Cancellation and Shutdown

Writer and reader loops are cancellation-aware:

  • explicit cancel token / close command support;
  • safe unbind and cleanup via registry;
  • deterministic order: stop admission -> drain/close -> release resources.
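This thread-based sketch follows the shutdown order above. It assumes an `AtomicBool` as the cancel token and channel closure as the close command. Telemt's loops are asynchronous, so `writer_loop`, `spawn_writer` and `shutdown` are illustrative stand-ins, not its actual API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Writer loop: forwards queued frames until cancelled or until its command channel closes.
fn writer_loop(cmds: Receiver<Vec<u8>>, cancel: Arc<AtomicBool>) -> usize {
    let mut sent = 0;
    while !cancel.load(Ordering::Acquire) {
        match cmds.recv_timeout(Duration::from_millis(10)) {
            Ok(_frame) => sent += 1, // a real writer would encrypt and write the frame here
            Err(RecvTimeoutError::Timeout) => {} // loop around and re-check the cancel flag
            Err(RecvTimeoutError::Disconnected) => break, // senders gone and queue drained
        }
    }
    sent
}

/// Starts a writer thread; returns its command sender, cancel flag (force-close path), and handle.
fn spawn_writer() -> (Sender<Vec<u8>>, Arc<AtomicBool>, JoinHandle<usize>) {
    let (tx, rx) = channel();
    let cancel = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancel);
    let handle = thread::spawn(move || writer_loop(rx, flag));
    (tx, cancel, handle)
}

/// Graceful shutdown: stop admission -> drain/close -> release resources.
fn shutdown(admission: &AtomicBool, cmds: Sender<Vec<u8>>, writer: JoinHandle<usize>) -> usize {
    admission.store(false, Ordering::Release); // 1. no new binds
    drop(cmds); // 2. queued frames are still delivered, then the loop sees Disconnected
    writer.join().expect("writer thread panicked") // 3. release only after the loop exits
}
```

The cancel flag is the force-close path. Setting it stops the loop even while frames are still queued, which suits a drain TTL expiry.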

Consistency Model

Session Consistency

For one conn_id:

  • exactly one active route target at a time;
  • close and unbind must be idempotent;
  • writer loss must not leave dangling bindings.

Generation Consistency

Generational consistency guarantees:

  • New generation is not promoted before minimum coverage gate.
  • Previous generation remains available in draining state during handover.
  • Forced retirement is policy-bound (drain TTL, optional force-close), not immediate.

Policy Consistency

Policy changes (adaptive/static floor, fallback mode, retries) should apply without violating established active-session routing invariants.

Backpressure and Flow Control

Route-Level Backpressure

Route channels are bounded by design. When pressure increases:

  • short burst absorption is allowed;
  • prolonged congestion triggers controlled drop semantics;
  • drop accounting is explicit via metrics/counters.
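Here is a minimal sketch of a bounded route with explicit accounting, assuming one std `sync_channel` per route. Channel capacity absorbs short bursts. When the channel is full, the frame is handed back for a deferred retry. Only a run of consecutive full attempts longer than `max_full_streak` counts as prolonged congestion, at which point the frame is dropped and counted. `BoundedRoute`, `RouteOutcome` and the streak threshold are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

#[derive(Debug, PartialEq)]
enum RouteOutcome {
    Delivered,
    Busy(Vec<u8>), // channel full: frame handed back for a deferred retry
    Dropped,       // prolonged congestion: frame dropped and counted; caller should close the route
    Closed,        // client side is gone
}

/// Hypothetical bounded route wrapper with an explicit drop counter.
struct BoundedRoute {
    tx: SyncSender<Vec<u8>>,
    full_streak: u32,
    max_full_streak: u32,
    dropped: u64, // would be exported as a metric in a real system
}

impl BoundedRoute {
    fn new(capacity: usize, max_full_streak: u32) -> (Self, Receiver<Vec<u8>>) {
        let (tx, rx) = sync_channel(capacity);
        (BoundedRoute { tx, full_streak: 0, max_full_streak, dropped: 0 }, rx)
    }

    /// Never blocks the caller (the reader loop).
    fn push(&mut self, frame: Vec<u8>) -> RouteOutcome {
        match self.tx.try_send(frame) {
            Ok(()) => {
                self.full_streak = 0;
                RouteOutcome::Delivered
            }
            Err(TrySendError::Full(frame)) => {
                self.full_streak += 1;
                if self.full_streak > self.max_full_streak {
                    self.dropped += 1;
                    RouteOutcome::Dropped
                } else {
                    RouteOutcome::Busy(frame)
                }
            }
            Err(TrySendError::Disconnected(_)) => RouteOutcome::Closed,
        }
    }
}
```

Because `push` never blocks, the reader loop can move on to other routes while a slow client is handled on a side path.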

Reader Non-Blocking Priority

The inbound ME reader path should never be serialized behind a single congested client route. Practical implications:

  • prefer non-blocking route attempt in the parser loop;
  • move heavy recovery to async side paths.

Failure Domain Strategy

Endpoint-Level Failure

Failure of one endpoint should trigger endpoint-scoped recovery first:

  • same endpoint reconnect;
  • endpoint replacement within same DC group if applicable.
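The endpoint-first recovery order can be expressed as a small decision function. The retry budget `same_endpoint_attempts`, the per-endpoint health flags and the `Recovery` type are assumptions for illustration. `DcDegraded` marks the point where DC-level policy takes over.

```rust
/// Hypothetical recovery step for one failed endpoint.
#[derive(Debug, PartialEq)]
enum Recovery<'a> {
    Reconnect(&'a str), // retry the same ip:port
    Replace(&'a str),   // switch to another healthy endpoint in the same DC group
    DcDegraded,         // no endpoint-scoped option left; escalate to DC-level policy
}

/// Retries the same endpoint up to `same_endpoint_attempts` times, then picks
/// the first healthy alternative listed for the DC group.
fn next_recovery<'a>(
    failed: &'a str,
    attempts: u32,
    same_endpoint_attempts: u32,
    dc_endpoints: &[(&'a str, bool)],
) -> Recovery<'a> {
    if attempts < same_endpoint_attempts {
        return Recovery::Reconnect(failed);
    }
    dc_endpoints
        .iter()
        .find(|&&(ep, healthy)| healthy && ep != failed)
        .map(|&(ep, _)| Recovery::Replace(ep))
        .unwrap_or(Recovery::DcDegraded)
}
```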

DC-Level Degradation

If a DC group cannot satisfy floor:

  • keep service via remaining coverage if policy allows;
  • continue asynchronous refill saturation in background.

Whole-Pool Readiness Loss

If sufficient ME coverage does not exist:

  • admission gate can hold new accepts (conditional policy);
  • existing sessions should continue when their path remains healthy.

Performance Architecture Notes

Hotpath Discipline

Allowed in hotpath:

  • fixed-size parsing and cheap validation;
  • bounded channel operations;
  • precomputed or low-allocation access patterns.

Avoid in hotpath:

  • repeated expensive decoding;
  • broad locks with awaits inside critical sections;
  • verbose high-frequency logging.

Throughput Stability Over Peak Spikes

Architecture prefers stable throughput and predictable latency over short peak gains that increase churn or long-tail reconnect times.

Evolution and Extension Rules

To evolve this model safely:

  • Add new policy knobs in Control Plane first.
  • Keep Data Plane contracts stable (conn_id, route semantics, close semantics).
  • Validate generation and registry invariants before enabling by default.
  • Introduce new retry/recovery strategies behind explicit config.

Failure and Recovery Notes

  • Single-endpoint DC failure is a normal degraded mode case; policy should prioritize fast reconnect and optional shadow/probing strategies.
  • An idle close by the peer should be treated as expected when the upstream enforces an idle timeout.
  • Reconnect backoff must protect against synchronized churn while still allowing fast first retries.
  • Fallback (ME -> direct DC) is a policy switch, not a transport bug by itself.
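This sketch shows a backoff schedule that meets both constraints. It allows a few fast first retries, then switches to capped exponential delays with equal jitter (half fixed, half randomized), so writers that failed together do not reconnect in lockstep. The `Backoff` fields are hypothetical. A splitmix64-style mixer stands in for a proper random source to keep the example dependency-free.

```rust
use std::time::Duration;

/// Hypothetical reconnect backoff parameters.
struct Backoff {
    fast_retries: u32,    // attempts that reconnect almost immediately
    fast_delay: Duration, // delay used for those first attempts
    base: Duration,       // first exponential step
    cap: Duration,        // upper bound on any delay
}

/// Deterministic 64-bit mixer (splitmix64 finalizer) used as a jitter source.
fn mix(mut x: u64) -> u64 {
    x = (x ^ (x >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
    x ^ (x >> 31)
}

impl Backoff {
    /// Delay before 0-based `attempt` for a writer identified by `seed`.
    /// After the fast retries, the step doubles per attempt up to `cap`, and the
    /// result lies in [step - step/2, step]: half fixed, half jittered.
    fn delay(&self, attempt: u32, seed: u64) -> Duration {
        if attempt < self.fast_retries {
            return self.fast_delay;
        }
        let exp = attempt - self.fast_retries;
        let step_ms = (self.base.as_millis() as u64)
            .saturating_mul(1u64 << exp.min(20))
            .min(self.cap.as_millis() as u64);
        let half = step_ms / 2;
        let jitter = mix(seed ^ u64::from(attempt)) % (half + 1);
        Duration::from_millis(step_ms - half + jitter)
    }
}
```

Using a per-writer `seed` (for example, derived from the endpoint) spreads simultaneous failures across the jitter window. The fast first retries still cover the common case of a brief drop.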

Terminology Summary

  • Coverage: enough live writers to satisfy per-DC acceptance policy.
  • Floor: target minimum writer count policy.
  • Churn: frequent writer reconnect/remove cycles.
  • Hotpath: per-packet/per-connection data path where extra waits/allocations are expensive.