Telemt Runtime Model

Scope

This document defines runtime concepts used by the Middle-End (ME) transport pipeline and the orchestration logic around it.

It focuses on:

ME Pool / Reader / Writer / Refill / Registry
Adaptive Floor
Trio-State
Generation Lifecycle

Core Entities

ME Pool

ME Pool is the runtime orchestrator for all Middle-End writers.

Responsibilities:

Holds writer inventory by DC/family/endpoint.
Maintains routing primitives and writer selection policy.
Tracks generation state (active, warm, and draining generations).
Applies runtime policies (floor mode, refill, reconnect, reinit, fallback behavior).
Exposes readiness gates used by admission logic (for conditional accept/cast behavior).

Non-goals:

It does not own client protocol decoding.
It does not own per-client business policy (quotas/limits).
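
The "writer inventory by DC/family/endpoint" responsibility can be pictured as a nested index. The following is a minimal Rust sketch (standard library only). `Family`, `Inventory`, and its methods are hypothetical names, not Telemt's actual API, and DC ids are assumed to be signed integers.

```rust
use std::collections::BTreeMap;
use std::net::SocketAddr;

/// Address family of an ME endpoint (hypothetical; mirrors the "family" dimension).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Family {
    V4,
    V6,
}

/// Hypothetical inventory: DC -> family -> endpoint -> writer ids on that endpoint.
#[derive(Default)]
pub struct Inventory {
    writers: BTreeMap<i32, BTreeMap<Family, BTreeMap<SocketAddr, Vec<u64>>>>,
}

impl Inventory {
    /// Register a writer under its DC/family/endpoint.
    pub fn insert(&mut self, dc: i32, family: Family, endpoint: SocketAddr, writer_id: u64) {
        self.writers
            .entry(dc)
            .or_default()
            .entry(family)
            .or_default()
            .entry(endpoint)
            .or_default()
            .push(writer_id);
    }

    /// Remove a writer wherever it is stored; returns true if it was present.
    pub fn remove(&mut self, writer_id: u64) -> bool {
        for families in self.writers.values_mut() {
            for endpoints in families.values_mut() {
                for ids in endpoints.values_mut() {
                    if let Some(pos) = ids.iter().position(|&id| id == writer_id) {
                        ids.swap_remove(pos);
                        return true;
                    }
                }
            }
        }
        false
    }

    /// Count writers for one DC across all families and endpoints.
    pub fn alive_for_dc(&self, dc: i32) -> usize {
        self.writers
            .get(&dc)
            .map(|f| f.values().flat_map(|e| e.values()).map(Vec::len).sum::<usize>())
            .unwrap_or(0)
    }
}
```

An ordered map keeps iteration deterministic, which helps when selection or refill logic walks the inventory.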

ME Writer

ME Writer is a long-lived ME RPC tunnel bound to one concrete ME endpoint (ip:port), with:

Outbound command channel (send path).
Associated reader loop (inbound path).
Health/degraded flags.
Contour/state and generation metadata.

A writer is the actual data plane carrier for client sessions once bound.
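
A pool-side handle for one writer might bundle the pieces listed above. This is a hedged sketch using a std `mpsc` channel as the outbound command path; `WriterHandle`, `WriterCommand`, and `Contour` are hypothetical names, and marking the writer degraded when its task has exited is an assumed policy.

```rust
use std::net::SocketAddr;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, Sender};

/// Hypothetical outbound command consumed by the writer's send loop.
#[derive(Debug, PartialEq)]
pub enum WriterCommand {
    Data { conn_id: u64, payload: Vec<u8> },
    Close { conn_id: u64 },
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Contour {
    Warm,
    Active,
    Draining,
}

/// Hypothetical handle kept by the pool for one writer bound to one ME endpoint.
pub struct WriterHandle {
    pub id: u64,
    pub endpoint: SocketAddr,
    pub generation: u64,
    pub contour: Contour,
    degraded: AtomicBool,
    tx: Sender<WriterCommand>,
}

impl WriterHandle {
    /// Create the handle plus the receiving end that the writer task would drain.
    pub fn new(id: u64, endpoint: SocketAddr, generation: u64) -> (Self, Receiver<WriterCommand>) {
        let (tx, rx) = mpsc::channel();
        let handle = Self {
            id,
            endpoint,
            generation,
            contour: Contour::Warm, // new writers start warm (assumption)
            degraded: AtomicBool::new(false),
            tx,
        };
        (handle, rx)
    }

    /// Queue a command; if the writer task is gone, flag the writer as degraded.
    pub fn send(&self, cmd: WriterCommand) -> bool {
        match self.tx.send(cmd) {
            Ok(()) => true,
            Err(_) => {
                self.degraded.store(true, Ordering::Relaxed);
                false
            }
        }
    }

    pub fn is_degraded(&self) -> bool {
        self.degraded.load(Ordering::Relaxed)
    }
}
```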

ME Reader

ME Reader is the inbound parser/dispatcher for one writer:

Reads/decrypts ME RPC frames.
Validates sequence/checksum.
Routes payloads to client-connection channels via Registry.
Emits close/ack/data events and updates telemetry.

Design intent:

The reader must stay non-blocking wherever possible.
Backpressure on a single client route must not stall the whole writer stream.
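
The "validates sequence/checksum" step can be sketched as a small per-writer state machine. The frame layout here is an assumption for illustration only: `[len: u32 LE][seq: i32 LE][payload][crc32: u32 LE]`, with `len` counting the whole frame and the CRC-32 (IEEE, reflected) covering everything before it. `ReaderState` and `FrameCheck` are hypothetical names, and the sketch covers validation only, not decryption or dispatch.

```rust
/// Outcome of validating one decrypted inbound frame (hypothetical).
#[derive(Debug, PartialEq)]
pub enum FrameCheck {
    Ok,
    BadLength,
    BadChecksum,
    BadSequence { expected: i32, got: i32 },
}

/// Bitwise CRC-32 (IEEE 802.3, reflected, polynomial 0xEDB88320).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn le_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Per-writer inbound state: the next sequence number we expect.
#[derive(Default)]
pub struct ReaderState {
    next_seq: i32,
}

impl ReaderState {
    pub fn check(&mut self, frame: &[u8]) -> FrameCheck {
        // Assumed minimum: 4 (len) + 4 (seq) + 0 (payload) + 4 (crc).
        if frame.len() < 12 || le_u32(&frame[0..4]) as usize != frame.len() {
            return FrameCheck::BadLength;
        }
        let body_end = frame.len() - 4;
        if crc32(&frame[..body_end]) != le_u32(&frame[body_end..]) {
            return FrameCheck::BadChecksum;
        }
        let got = le_u32(&frame[4..8]) as i32;
        if got != self.next_seq {
            // Do not advance: the caller decides whether to close the writer.
            return FrameCheck::BadSequence { expected: self.next_seq, got };
        }
        self.next_seq = self.next_seq.wrapping_add(1);
        FrameCheck::Ok
    }
}
```

The check is cheap, fixed-size work with no allocation, which keeps it compatible with the non-blocking reader intent above.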

Refill

Refill is the recovery mechanism that restores writer coverage when capacity drops:

Per-endpoint restore (same endpoint first).
Per-DC restore to satisfy required floor.
Optional outage-mode/shadow behavior for fragile single-endpoint DCs.

Refill works asynchronously and should not block hot routing paths.
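
The ordering "same endpoint first, then per-DC restore to the floor" can be expressed as a pure planning function that an async refill task would then execute. A minimal sketch, assuming one DC at a time and a least-loaded-endpoint rule for the top-up phase; `RefillTask` and `plan_refill` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::net::SocketAddr;

/// One scheduled reconnect attempt (hypothetical).
#[derive(Debug, PartialEq)]
pub enum RefillTask {
    /// Restore a writer on the endpoint that just lost one.
    RestoreEndpoint(SocketAddr),
    /// Add a writer in the DC to climb back toward the floor.
    TopUpDc { dc: i32, endpoint: SocketAddr },
}

/// Plan refill for one DC: first restore endpoints that lost writers, then top up
/// toward `floor`, picking the least-loaded endpoint each time.
pub fn plan_refill(
    dc: i32,
    floor: usize,
    alive: &BTreeMap<SocketAddr, usize>,
    lost: &[SocketAddr],
) -> Vec<RefillTask> {
    let mut counts = alive.clone();
    let mut plan = Vec::new();

    // Phase 1: per-endpoint restore (only for endpoints still configured).
    for ep in lost {
        if let Some(c) = counts.get_mut(ep) {
            *c += 1;
            plan.push(RefillTask::RestoreEndpoint(*ep));
        }
    }

    // Phase 2: per-DC top-up until the planned total reaches the floor.
    let mut total: usize = counts.values().sum();
    while total < floor {
        // min_by_key returns the first minimum, so BTreeMap order breaks ties.
        match counts.iter_mut().min_by_key(|(_, c)| **c) {
            Some((&ep, c)) => {
                *c += 1;
                total += 1;
                plan.push(RefillTask::TopUpDc { dc, endpoint: ep });
            }
            None => break, // no endpoints known for this DC
        }
    }
    plan
}
```

Keeping planning separate from execution lets the control plane compute the gap deterministically while reconnects run off the hot path.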

Registry

Registry is the routing index between ME and client sessions:

conn_id -> client response channel
conn_id <-> writer_id binding map
writer activity snapshots and idle tracking

Main invariants:

A conn_id routes to at most one active response channel.
Writer loss triggers safe unbind/cleanup and close propagation.
Registry state is the source of truth for active ME-bound session mapping.
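
The three Registry maps and their invariants can be sketched with plain `HashMap`s and bounded std channels. This is an illustration, not Telemt's implementation: `Registry` and its methods are hypothetical, and close propagation is modeled by dropping the route sender so the client-side receiver observes disconnection.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical routing index between ME writers and client sessions.
#[derive(Default)]
pub struct Registry {
    routes: HashMap<u64, SyncSender<Vec<u8>>>, // conn_id -> response channel
    conn_to_writer: HashMap<u64, u64>,         // conn_id -> writer_id
    writer_to_conns: HashMap<u64, Vec<u64>>,   // writer_id -> conn_ids
}

impl Registry {
    /// Register a connection; inserting replaces any previous channel, so a
    /// conn_id always has at most one active response channel.
    pub fn register(&mut self, conn_id: u64, capacity: usize) -> Receiver<Vec<u8>> {
        let (tx, rx) = sync_channel(capacity);
        self.routes.insert(conn_id, tx);
        rx
    }

    /// Bind a connection to a writer, moving it off any previous writer.
    pub fn bind(&mut self, conn_id: u64, writer_id: u64) {
        let _ = self.unbind(conn_id);
        self.conn_to_writer.insert(conn_id, writer_id);
        self.writer_to_conns.entry(writer_id).or_default().push(conn_id);
    }

    /// Remove a binding; idempotent, a second call returns None.
    pub fn unbind(&mut self, conn_id: u64) -> Option<u64> {
        let writer_id = self.conn_to_writer.remove(&conn_id)?;
        if let Some(conns) = self.writer_to_conns.get_mut(&writer_id) {
            conns.retain(|&c| c != conn_id);
        }
        Some(writer_id)
    }

    /// Writer loss: unbind every connection on it and drop their routes, which
    /// closes the client-side receivers. Returns the affected conn_ids.
    pub fn writer_lost(&mut self, writer_id: u64) -> Vec<u64> {
        let conns = self.writer_to_conns.remove(&writer_id).unwrap_or_default();
        for c in &conns {
            self.conn_to_writer.remove(c);
            self.routes.remove(c);
        }
        conns
    }

    pub fn writer_of(&self, conn_id: u64) -> Option<u64> {
        self.conn_to_writer.get(&conn_id).copied()
    }
}
```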

Adaptive Floor

What it is

Adaptive Floor is a runtime policy that adjusts the target writer count per DC based on observed activity, instead of always holding the static peak floor.

Why it exists

Goals:

Reduce idle writer churn under low traffic.
Keep enough warm capacity to avoid client-visible stalls on burst recovery.
Limit needless reconnect storms on unstable endpoints.

Behavioral model

Under activity: the floor converges toward the configured static requirement.
Under prolonged idle: the floor can shrink to a safe minimum.
Recovery/grace windows prevent aggressive oscillation.

Safety constraints

Never violate the minimal survivability floor for a DC group.
Refill must still restore quickly on demand.
Floor adaptation must not force-drop already bound healthy sessions.
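
The behavioral model and safety constraints above can be combined into one small policy object. A minimal sketch, assuming a single idle-grace window as the anti-oscillation mechanism and an "idle writer" count supplied by the caller; `AdaptiveFloor` and its fields are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Hypothetical adaptive-floor policy for one DC group.
pub struct AdaptiveFloor {
    pub static_floor: usize,  // configured requirement under activity
    pub min_floor: usize,     // survivability floor, never undercut
    pub idle_grace: Duration, // idle time required before shrinking
    last_activity: Instant,
}

impl AdaptiveFloor {
    pub fn new(static_floor: usize, min_floor: usize, idle_grace: Duration, now: Instant) -> Self {
        Self {
            static_floor,
            min_floor: min_floor.min(static_floor), // keep min <= static
            idle_grace,
            last_activity: now,
        }
    }

    /// Any traffic resets the grace window, so the target snaps back immediately.
    pub fn record_activity(&mut self, now: Instant) {
        self.last_activity = now;
    }

    /// Static floor while active or within grace; survivability floor after.
    pub fn target(&self, now: Instant) -> usize {
        let idle_for = now.saturating_duration_since(self.last_activity);
        if idle_for >= self.idle_grace {
            self.min_floor
        } else {
            self.static_floor
        }
    }

    /// How many writers may be retired now: only idle (unbound) writers above
    /// the target, so bound healthy sessions are never force-dropped.
    pub fn retirable(&self, now: Instant, alive: usize, idle_writers: usize) -> usize {
        alive.saturating_sub(self.target(now)).min(idle_writers)
    }
}
```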

Trio-State

Trio-State is the writer contouring model, with three states:

Warm
Active
Draining

State semantics

Warm: connected and validated, not primary for new binds.
Active: preferred for new binds and normal traffic.
Draining: no new regular binds; existing sessions continue until graceful retirement rules apply.

Transition intent

Warm -> Active: when coverage/readiness conditions are satisfied.
Active -> Draining: on generation swap, endpoint replacement, or controlled retirement.
Draining -> removed: after drain TTL/force-close policy (or when naturally empty).

This separation reduces single-point-of-failure (SPOF) risk and keeps cutovers predictable.
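
The state semantics and transition intent map naturally onto an enum plus a pure transition function. In this sketch, `Contour`, `Event`, `Next`, `transition`, and `bind_rank` are hypothetical names; events not listed in the transition rules are assumed to leave the state unchanged, and treating Warm writers as a lower-priority fallback for binds is an assumption drawn from "not primary for new binds".

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Contour {
    Warm,
    Active,
    Draining,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Event {
    CoverageReady, // coverage/readiness conditions satisfied
    Retire,        // generation swap, endpoint replacement, controlled retirement
    DrainExpired,  // drain TTL or force-close policy fired
    Empty,         // last bound session finished
}

/// Result of applying an event: a (possibly new) contour, or removal.
#[derive(Debug, PartialEq, Eq)]
pub enum Next {
    Stay(Contour),
    Removed,
}

pub fn transition(state: Contour, ev: Event) -> Next {
    match (state, ev) {
        (Contour::Warm, Event::CoverageReady) => Next::Stay(Contour::Active),
        (Contour::Active, Event::Retire) => Next::Stay(Contour::Draining),
        (Contour::Draining, Event::DrainExpired) | (Contour::Draining, Event::Empty) => Next::Removed,
        (s, _) => Next::Stay(s), // other combinations are ignored (assumption)
    }
}

/// Bind preference, lower is better; Draining takes no new regular binds.
pub fn bind_rank(state: Contour) -> Option<u8> {
    match state {
        Contour::Active => Some(0),
        Contour::Warm => Some(1),
        Contour::Draining => None,
    }
}
```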

Generation Lifecycle

Generation isolates pool epochs during reinit/reconfiguration.

Lifecycle phases

Bootstrap: initial writers are established.
Warmup: next-generation writers are created and validated.
Activation: the new generation is promoted to active once the coverage gate passes.
Drain: the previous generation becomes draining; existing sessions are allowed to finish.
Retire: old-generation writers are removed once graceful-retirement rules are satisfied.

Operational guarantees

A generation is never activated without minimum coverage.
Existing healthy client sessions should not be dropped just because a new generation appears.
Draining generation exists to absorb in-flight traffic during swap.
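
The promotion gate and the "drain, don't drop" swap can be sketched as two guarded phase changes. Names are hypothetical, "minimum coverage" is assumed to mean at least `min_alive` writers in every required DC, and Bootstrap is treated as the first generation going through the same Warmup-to-Active gate with no predecessor to drain.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    Warmup,
    Active,
    Draining,
    Retired,
}

/// Hypothetical generation record: writers alive per DC in this epoch.
pub struct Generation {
    pub id: u64,
    pub phase: Phase,
    pub alive_per_dc: BTreeMap<i32, usize>,
}

/// Coverage gate: every required DC must have at least `min_alive` writers.
pub fn coverage_ok(g: &Generation, required_dcs: &[i32], min_alive: usize) -> bool {
    required_dcs
        .iter()
        .all(|dc| g.alive_per_dc.get(dc).copied().unwrap_or(0) >= min_alive)
}

/// Promote `next` only if it is warming up and passes the gate; the current
/// generation moves to Draining instead of being closed.
pub fn try_promote(current: &mut Generation, next: &mut Generation, required_dcs: &[i32], min_alive: usize) -> bool {
    if next.phase != Phase::Warmup || !coverage_ok(next, required_dcs, min_alive) {
        return false;
    }
    next.phase = Phase::Active;
    current.phase = Phase::Draining;
    true
}

/// Retire a draining generation once it is empty or its drain TTL expired.
pub fn try_retire(g: &mut Generation, bound_sessions: usize, ttl_expired: bool) -> bool {
    if g.phase == Phase::Draining && (bound_sessions == 0 || ttl_expired) {
        g.phase = Phase::Retired;
        true
    } else {
        false
    }
}
```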

Readiness and admission

Pool readiness does not mean that all endpoints are fully saturated. A typical gating strategy:

Open admission when per-DC minimal alive coverage exists.
Continue background saturation for multi-endpoint DCs.

This keeps startup latency low while preserving eventual full capacity.
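
The two-step gating strategy separates "can we accept?" from "are we fully saturated?". A minimal sketch, assuming full saturation means one live writer per configured endpoint; `DcCoverage`, `admission_open`, and `needs_saturation` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Per-DC coverage snapshot (hypothetical): live writers vs. configured endpoints.
pub struct DcCoverage {
    pub alive: usize,
    pub endpoints: usize,
}

/// Admission opens once every DC has minimal alive coverage, without waiting
/// for full saturation. An empty snapshot keeps admission closed.
pub fn admission_open(cov: &BTreeMap<i32, DcCoverage>, min_alive: usize) -> bool {
    !cov.is_empty() && cov.values().all(|c| c.alive >= min_alive)
}

/// DCs that still need background saturation (fewer live writers than endpoints).
pub fn needs_saturation(cov: &BTreeMap<i32, DcCoverage>) -> Vec<i32> {
    cov.iter()
        .filter(|(_, c)| c.alive < c.endpoints)
        .map(|(&dc, _)| dc)
        .collect()
}
```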

Interactions Between Concepts

Generation defines pool epochs.
Trio-State defines each writer's role within and across those epochs.
Adaptive Floor defines how much capacity should be maintained right now.
Refill is the actuator that closes the gap between desired and current capacity.
Registry keeps per-session routing correctness while all of the above changes over time.

Architectural Approach

Layered Design

The runtime is intentionally split into two planes:

Control Plane: decides desired topology and policy (floor, generation swap, refill, fallback).
Data Plane: executes packet/session transport (reader, writer, routing, acks, close propagation).

Architectural rule:

Control Plane may change writer inventory and policy.
Data Plane must remain stable and low-latency while those changes happen.

Ownership Model

Ownership is centered around explicit state domains:

MePool owns writer lifecycle and policy state.
Registry owns per-connection routing bindings.
Writer task owns outbound ME socket send progression.
Reader task owns inbound ME socket parsing and event dispatch.

This prevents accidental cross-layer mutation and keeps invariants local.

Control Plane Responsibilities

Control Plane is event-driven and policy-driven:

Startup initialization and readiness gates.
Runtime reinit (periodic or config-triggered).
Coverage checks per DC/family/endpoint group.
Floor enforcement (static/adaptive).
Refill scheduling and retry orchestration.
Generation transition (warm -> active, previous active -> draining).

Control Plane must prioritize determinism over short-term aggressiveness.

Data Plane Responsibilities

Data Plane is throughput-first and allocation-sensitive:

Session bind to writer.
Per-frame parsing/validation and dispatch.
Ack and close signal propagation.
Route drop behavior under missing connection or closed channel.
Minimal critical logging in hot path.

Data Plane should avoid waiting on operations that are not strictly required for frame correctness.

Concurrency and Synchronization

Concurrency Principles

Per-writer isolation: each writer has independent send/read task loops.
Per-connection isolation: client channel state is scoped by conn_id.
Asynchronous recovery: refill/reconnect runs outside the packet hot path.

Synchronization Strategy

Shared maps use fine-grained, short-lived locking.
Read-mostly paths avoid broad write-lock windows.
Backpressure decisions are localized at route/channel boundary.

Design target:

A slow consumer should degrade only itself (or its route), not global writer progress.
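
"Fine-grained, short-lived locking" on a read-mostly map usually means: look up under a read lock, clone what you need, release, then do the potentially slow work. A hedged std-only sketch; `Routes` and `route` are hypothetical, and reporting a missing route as `Disconnected` is an assumed convention.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{SyncSender, TrySendError};
use std::sync::{Arc, RwLock};

/// Hypothetical shared route table: read-mostly, so an RwLock fits.
pub type Routes = Arc<RwLock<HashMap<u64, SyncSender<Vec<u8>>>>>;

/// Look up the route under a brief read lock, clone the sender, release the
/// lock, then deliver without holding it. A full client channel affects only
/// that route; other lookups never wait on it.
pub fn route(routes: &Routes, conn_id: u64, payload: Vec<u8>) -> Result<(), TrySendError<Vec<u8>>> {
    let tx = {
        let map = routes.read().unwrap();
        match map.get(&conn_id) {
            Some(tx) => tx.clone(),
            // Missing connection is reported like a closed route (assumption).
            None => return Err(TrySendError::Disconnected(payload)),
        }
    }; // read guard dropped here, before any send
    tx.try_send(payload)
}
```

In an async runtime the same rule applies with even more force: never hold a lock guard across an `.await`.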

Cancellation and Shutdown

Writer and reader loops are cancellation-aware:

explicit cancel token / close command support;
safe unbind and cleanup via registry;
deterministic order: stop admission -> drain/close -> release resources.
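
The deterministic shutdown order can be illustrated with a thread-based writer loop, where a `Close` command doubles as the cancel signal. This is a sketch with hypothetical names (`Cmd`, `spawn_writer`, `shutdown`); a real writer would use async tasks and a socket, whereas here "sending" just counts payloads.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical writer-loop command; Close doubles as the cancel signal.
pub enum Cmd {
    Send(Vec<u8>),
    Close,
}

/// Spawn a writer loop that processes commands in FIFO order until Close
/// arrives or all senders drop; returns how many payloads it handled.
pub fn spawn_writer() -> (Sender<Cmd>, JoinHandle<usize>) {
    let (tx, rx): (Sender<Cmd>, Receiver<Cmd>) = channel();
    let handle = thread::spawn(move || {
        let mut sent = 0;
        for cmd in rx {
            match cmd {
                Cmd::Send(_payload) => sent += 1, // real code would write to the ME socket
                Cmd::Close => break,
            }
        }
        sent // loop exit: the receiver and any socket are dropped here
    });
    (tx, handle)
}

/// Deterministic shutdown: stop admission -> drain/close -> release resources.
pub fn shutdown(admission_open: &AtomicBool, tx: Sender<Cmd>, handle: JoinHandle<usize>) -> usize {
    admission_open.store(false, Ordering::SeqCst); // 1. no new accepts
    let _ = tx.send(Cmd::Close); // 2. queued commands drain first, then Close
    drop(tx);
    handle.join().expect("writer loop panicked") // 3. wait for the loop to release resources
}
```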

Consistency Model

Session Consistency

For one conn_id:

exactly one active route target at a time;
close and unbind must be idempotent;
writer loss must not leave dangling bindings.

Generation Consistency

Generational consistency guarantees:

New generation is not promoted before minimum coverage gate.
Previous generation remains available in draining state during handover.
Forced retirement is policy-bound (drain TTL, optional force-close), not immediate.

Policy Consistency

Policy changes (adaptive/static floor, fallback mode, retries) should apply without violating established active-session routing invariants.

Backpressure and Flow Control

Route-Level Backpressure

Route channels are bounded by design. When pressure increases:

short burst absorption is allowed;
prolonged congestion triggers controlled drop semantics;
drop accounting is explicit via metrics/counters.
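
Bounded route channels with explicit drop accounting can be sketched with `sync_channel` and `try_send`. Simplification to note: here the channel capacity is the burst allowance and a full buffer is treated as prolonged congestion immediately; a real policy might retry briefly or close the route instead. `Route` and its counters are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical bounded client route with explicit drop counters.
pub struct Route {
    tx: SyncSender<Vec<u8>>,
    pub delivered: AtomicU64,
    pub dropped_full: AtomicU64,
    pub dropped_closed: AtomicU64,
}

impl Route {
    /// `capacity` frames of burst absorption before drops begin.
    pub fn new(capacity: usize) -> (Self, Receiver<Vec<u8>>) {
        let (tx, rx) = sync_channel(capacity);
        let route = Route {
            tx,
            delivered: AtomicU64::new(0),
            dropped_full: AtomicU64::new(0),
            dropped_closed: AtomicU64::new(0),
        };
        (route, rx)
    }

    /// Non-blocking delivery: never stalls the caller (e.g. the reader loop).
    pub fn offer(&self, frame: Vec<u8>) -> bool {
        match self.tx.try_send(frame) {
            Ok(()) => {
                self.delivered.fetch_add(1, Ordering::Relaxed);
                true
            }
            Err(TrySendError::Full(_)) => {
                self.dropped_full.fetch_add(1, Ordering::Relaxed);
                false
            }
            Err(TrySendError::Disconnected(_)) => {
                self.dropped_closed.fetch_add(1, Ordering::Relaxed);
                false
            }
        }
    }
}
```

Separate counters for "full" and "closed" distinguish a slow client from a client that has already gone away.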

Reader Non-Blocking Priority

The inbound ME reader path should never be serialized behind a single congested client route. Practical implication:

prefer non-blocking route attempt in the parser loop;
move heavy recovery to async side paths.

Failure Domain Strategy

Endpoint-Level Failure

Failure of one endpoint should trigger endpoint-scoped recovery first:

same endpoint reconnect;
endpoint replacement within same DC group if applicable.

DC-Level Degradation

If a DC group cannot satisfy floor:

keep service via remaining coverage if policy allows;
continue asynchronous refill saturation in background.

Whole-Pool Readiness Loss

If no sufficient ME coverage exists:

admission gate can hold new accepts (conditional policy);
existing sessions should continue when their path remains healthy.

Performance Architecture Notes

Hotpath Discipline

Allowed in hotpath:

fixed-size parsing and cheap validation;
bounded channel operations;
precomputed or low-allocation access patterns.

Avoid in hotpath:

repeated expensive decoding;
broad locks with awaits inside critical sections;
verbose high-frequency logging.

Throughput Stability Over Peak Spikes

Architecture prefers stable throughput and predictable latency over short peak gains that increase churn or long-tail reconnect times.

Evolution and Extension Rules

To evolve this model safely:

Add new policy knobs in Control Plane first.
Keep Data Plane contracts stable (conn_id, route semantics, close semantics).
Validate generation and registry invariants before enabling by default.
Introduce new retry/recovery strategies behind explicit config.

Failure and Recovery Notes

Single-endpoint DC failure is a normal degraded mode case; policy should prioritize fast reconnect and optional shadow/probing strategies.
Idle close by peer should be treated as expected when upstream enforces idle timeout.
Reconnect backoff must protect against synchronized churn while still allowing fast first retries.
Fallback (ME -> direct DC) is a policy switch, not a transport bug by itself.
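
A common way to satisfy "protect against synchronized churn while still allowing fast first retries" is capped exponential backoff with jitter. The sketch below uses the "equal jitter" variant (half the window fixed, half randomized). It is a swapped-in, well-known technique rather than a documented Telemt policy; `reconnect_delay` is hypothetical, and the jitter fraction is supplied by the caller (for example from an RNG) to keep the function deterministic.

```rust
use std::time::Duration;

/// Hypothetical reconnect backoff. Attempt 0 waits between base/2 and base
/// (fast first retry); later attempts grow exponentially up to `cap`.
/// `jitter` is a caller-supplied fraction in [0, 1) that spreads writers
/// apart so they do not reconnect in lockstep.
pub fn reconnect_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    // Clamp the exponent so the shift cannot overflow.
    let exp = base.saturating_mul(1u32 << attempt.min(16));
    let ceiling = exp.min(cap);
    // "Equal jitter": keep half the ceiling, randomize the other half.
    let half = ceiling / 2;
    half + half.mul_f64(jitter.clamp(0.0, 0.999_999))
}
```

With `base = 100ms` and `cap = 30s`, attempt 0 waits 50 to 100 ms, attempt 3 waits 400 to 800 ms, and very late attempts settle between 15 and 30 seconds.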

Terminology Summary

Coverage: enough live writers to satisfy per-DC acceptance policy.
Floor: target minimum writer count policy.
Churn: frequent writer reconnect/remove cycles.
Hotpath: per-packet/per-connection data path where extra waits/allocations are expensive.
