Telemt Relay Hardening — Implementation Plan
Ground Rules
Every workstream follows this mandatory sequence:
- Write the full test suite first. Tests must fail on the current code for the reason being fixed.
- Implement production changes until all new tests pass and no existing test regresses.
- A failing red test is evidence of a real bug or gap. Never relax a test assertion — fix the code.
- All test code goes in dedicated files under `src/proxy/tests/` (or the owning module's `#[cfg(test)]` block via `#[path]`). No inline `#[cfg(test)]` inside production code.
- No PR lands in a non-compiling state. Every diff must be self-contained and `cargo test`-green.
Agreed Decisions
| Topic | Decision |
|---|---|
| Item 3 `_buffer_pool` | Option B — repurpose the parameter for adaptive startup buffer sizes, not remove it |
| Item 4b in-session adaptation | Decision-gate phase: run experiment, measure, then choose one path |
| Item 1 Level 1 log-normal | Independent of PR-B and PR-C — can land after PR-A only |
| Scope | All items (1, 2, 3, 4a, 4b, 5) in one master plan, separate PRs |
PR Dependency Graph
PR-A (baseline test harness)
├─► PR-C (Item 5: DRS — independent of DI, self-contained)
├─► PR-F (Item 1 Level 1: log-normal — independent, no shared-state changes)
└─► PR-B (Item 2: DI migration — high-risk blast radius)
└─► PR-D (Items 3+4a: adaptive startup)
└─► PR-E (Item 4b: decision gate)
└─► PR-G (Item 1 Level 2: state-aware IPT)
PR-H (docs + release gate)
NEW ORDERING RATIONALE (per audit recommendations):
- PR-C before PR-B: DRS is self-contained, needs only the `is_tls` flag (already in `HandshakeSuccess`) and a new `drs_enabled` config field. No dependency on the large DI refactor. Delivers anti-censorship value immediately. Reduces risk of a stuck dependency chain if PR-B becomes complicated.
- PR-F independent: Log-normal replacement modifies only two `rng.random_range()` call sites in `masking.rs` and `handshake.rs`. Zero dependency on DI or DRS. Can be parallelized with PR-C and PR-B.
- PR-B then PR-D: DI must be complete before adaptive startup wiring, as both involve injecting state.
- PR-A first, always: Baseline gates must lock before any code changes.
Parallelization: PR-C and PR-B test-writing can happen in parallel once PR-A is done; production code integration is sequential.
PR-A — Baseline Test Harness (Phase 1)
Goal: Establish regression gates and shared test utilities that all subsequent PRs depend on. No runtime behavior changes.
TDD compatibility note: Phase 1 is a characterization and invariant-lock phase. Its baseline tests are intentionally green on current code and exist to freeze security-critical behavior before refactors. This does not waive red-first TDD for later phases: every behavior-changing PR after Phase 1 must begin with red tests that fail on then-current code.
Security objective for Phase 1: lock anti-probing and anti-fingerprinting behavior before protocol-shape changes. Phase 1 tests must include positive, negative, edge, and adversarial scanner cases with deterministic CI execution and strict fail-closed oracles.
Split into two sub-phases (reduces risk: if test utilities need iteration, baseline tests aren't blocked):
- PR-A.1: Shared test utilities only. Zero behavior assertions. Merge gate: compiles.
- PR-A.2: Baseline invariant tests. All green on current code. Depends on PR-A.1.
PR-A.1: Shared test utilities
New file: src/proxy/tests/test_harness_common.rs
MODULE DECLARATION: Declare once in src/proxy/mod.rs as:
#[cfg(test)]
#[path = "tests/test_harness_common.rs"]
mod test_harness_common;
DO NOT declare via #[path] in relay.rs, handshake.rs, or middle_relay.rs. Including the same file via #[path] in multiple modules duplicates all definitions and causes compilation errors (see F15). Consuming test modules import via use crate::proxy::test_harness_common::*; (or selective imports).
NOTE: Existing 104 test files already define ad-hoc test utilities inline (e.g., ScriptedWriter in relay_atomic_quota_invariant_tests.rs, PendingWriter in masking_security_tests.rs, seeded_rng in masking_lognormal_timing_security_tests.rs, test_config_with_secret_hex in handshake_security_tests.rs). The harness consolidates these for reuse but does not retroactively migrate existing files — that would inflate PR-A's blast radius for zero safety gain.
Contents:
use crate::config::ProxyConfig;
use rand::rngs::StdRng;
use rand::SeedableRng;
use std::io;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll};
use tokio::io::AsyncWrite;
// ── RecordingWriter ─────────────────────────────────────────────────
// In-memory AsyncWrite that records both per-write and per-flush granularity.
//
// `writes`: one entry per poll_write call (records write-call boundaries).
// `flushed`: one entry per poll_flush call that has accumulated bytes
// (records record/TLS-frame boundaries). Each entry is all bytes
// accumulated since the previous flush; empty flushes record nothing.
//
// DRS tests (PR-C) need flush-boundary tracking to verify TLS record framing.
// The dual tracking avoids needing separate writer types for different test needs.
pub struct RecordingWriter {
pub writes: Vec<Vec<u8>>,
pub flushed: Vec<Vec<u8>>,
current_record: Vec<u8>,
}
impl RecordingWriter {
pub fn new() -> Self {
Self {
writes: Vec::new(),
flushed: Vec::new(),
current_record: Vec::new(),
}
}
/// Total bytes written across all writes.
pub fn total_bytes(&self) -> usize {
self.writes.iter().map(|w| w.len()).sum()
}
}
impl AsyncWrite for RecordingWriter {
fn poll_write(
mut self: Pin<&mut Self>,
_cx: &mut Context<'_>,
buf: &[u8],
) -> Poll<io::Result<usize>> {
let me = self.as_mut().get_mut();
me.writes.push(buf.to_vec());
me.current_record.extend_from_slice(buf);
Poll::Ready(Ok(buf.len()))
}
fn poll_flush(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<io::Result<()>> {
let me = self.as_mut().get_mut();
let record = std::mem::take(&mut me.current_record);
if !record.is_empty() {
me.flushed.push(record);
}
Poll::Ready(Ok(()))
}
fn poll_shutdown(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<io::Result<()>> {
Poll::Ready(Ok(()))
}
}
// ── PendingCountWriter ──────────────────────────────────────────────
// Returns Poll::Pending for the first N poll_write calls, then delegates to inner.
// Also supports separate pending-count control for poll_flush calls.
//
// Needed for DRS tests (PR-C):
// - drs_pending_on_write_does_not_increment_completed_counter
// - drs_pending_on_flush_propagates_pending_without_spurious_wake
//
// Unlike the existing masking_security_tests.rs PendingWriter (which is
// unconditionally Pending forever), this supports counted transitions.
pub struct PendingCountWriter<W> {
pub inner: W,
pub write_pending_remaining: usize,
pub flush_pending_remaining: usize,
}
impl<W> PendingCountWriter<W> {
pub fn new(inner: W, write_pending: usize, flush_pending: usize) -> Self {
Self {
inner,
write_pending_remaining: write_pending,
flush_pending_remaining: flush_pending,
}
}
}
impl<W: AsyncWrite + Unpin> AsyncWrite for PendingCountWriter<W> {
fn poll_write(
mut self: Pin<&mut Self>,
cx: &mut Context<'_>,
buf: &[u8],
) -> Poll<io::Result<usize>> {
let me = self.as_mut().get_mut();
if me.write_pending_remaining > 0 {
me.write_pending_remaining -= 1;
cx.waker().wake_by_ref();
return Poll::Pending;
}
Pin::new(&mut me.inner).poll_write(cx, buf)
}
fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
let me = self.as_mut().get_mut();
if me.flush_pending_remaining > 0 {
me.flush_pending_remaining -= 1;
cx.waker().wake_by_ref();
return Poll::Pending;
}
Pin::new(&mut me.inner).poll_flush(cx)
}
fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
Pin::new(&mut self.inner).poll_shutdown(cx)
}
}
// ── Deterministic seeded RNG ────────────────────────────────────────
// Wraps StdRng::seed_from_u64 for reproducible CI runs.
//
// LIMITATION: Cannot substitute for SecureRandom in production function calls.
// Production code that accepts &SecureRandom requires a project-specific wrapper.
// Tests needing deterministic behavior of production functions that accept
// `impl Rng` (like sample_lognormal_percentile_bounded) can use this directly.
// Tests calling functions that take &SecureRandom must use SecureRandom::new().
pub fn seeded_rng(seed: u64) -> StdRng {
StdRng::seed_from_u64(seed)
}
// ── Config builders ─────────────────────────────────────────────────
// Builds a minimal ProxyConfig with TLS mode enabled.
// Unlike the per-test-file `test_config_with_secret_hex` helpers, this produces
// a config suitable for relay tests that need is_tls=true but don't need
// handshake secret validation.
pub fn tls_only_config() -> Arc<ProxyConfig> {
let mut cfg = ProxyConfig::default();
cfg.general.modes.tls = true;
Arc::new(cfg)
}
// Builds a ProxyConfig with a test user and secret for handshake tests.
// Requires auth_probe, masking, and SNI configuration for full handshake paths.
pub fn handshake_test_config(secret_hex: &str) -> ProxyConfig {
let mut cfg = ProxyConfig::default();
cfg.access.users.clear();
cfg.access
.users
.insert("test-user".to_string(), secret_hex.to_string());
cfg.access.ignore_time_skew = true;
cfg.censorship.mask = true;
cfg.censorship.mask_host = Some("127.0.0.1".to_string());
cfg.censorship.mask_port = 0; // Overridden by caller with actual listener port
cfg
}
DROPPED UTILITIES (vs original plan):
- `SliceReader`: Unnecessary. `tokio::io::duplex()` channels (used in every existing relay test) and `std::io::Cursor<Vec<u8>>` (which implements `AsyncRead` via tokio) already solve this. Adding a `bytes`-crate-dependent `SliceReader` introduces coupling for zero gain.
- `test_stats() -> Arc<Stats>`: Trivial one-liner (`Arc::new(Stats::new())`). Every existing test already constructs this inline. A wrapper adds indirection without value.
- `test_buffer_pool() -> Arc<BufferPool>`: Same reasoning — `Arc::new(BufferPool::new())` is a one-liner already used everywhere.
PR-A.1 Merge gate
`cargo check --tests` — compiles with no errors. No behavior assertions yet.
Determinism gate for all new Phase 1 tests:
- Seed RNG-dependent tests via `seeded_rng(...)` (or explicit fixed seeds).
- For timing-sensitive async-delay tests (for example server-hello delay or relay watchdog timing), use paused tokio time and explicit time advancement instead of wall-clock sleeps.
- Avoid shared mutable cross-test coupling except temporary helpers explicitly called out in this plan (`auth_probe_test_lock`, `relay_idle_pressure_test_scope`, `desync_dedup_test_lock`) until PR-B removes them.
- Use explicit per-test IO timeouts (`tokio::time::timeout`) on network/fallback paths to prevent deadlocks and scheduler-dependent flakes.
- Keep all adversarial corpora deterministic (fixed vectors, fixed seed order). No nondeterministic fuzz in default CI.
PR-A.2: Baseline invariant tests
All tests in this sub-phase must pass on current code — they are regression locks, not red tests. They lock existing behavior before subsequent PRs modify it.
DESIGN PRINCIPLE: Tests must be implementation-agnostic. Test through public/pub(crate) functions, not through direct static access. This ensures PR-B (which moves statics into ProxySharedState) does not break baseline tests.
SCOPE DISCIPLINE: Phase 1 should lock boundary behavior and narrow invariants only. It must not duplicate deep transport choreography, quota accounting, or close-matrix coverage that is already exercised elsewhere unless the baseline adds a new security oracle that later PRs could realistically regress unnoticed.
TEST ISOLATION: All handshake baseline tests must use the existing auth_probe_test_lock() / clear_auth_probe_state_for_testing() pattern until PR-B replaces it. Middle-relay idle tests use relay_idle_pressure_test_scope() / clear_relay_idle_pressure_state_for_testing(), while desync tests use desync_dedup_test_lock() / clear_desync_dedup_for_testing(). This is temporary coupling that PR-B will eliminate.
FAIL-CLOSED ASSERTION POLICY (mandatory for Phase 1):
- Any probe/fallback error path must assert one of: (a) transparent mask-host relay behavior or (b) silent close / generic transport failure.
- Tests must never assert proxy-identifying payloads, banners, or protocol-specific error hints.
- For "no identity leak" cases, assert observable behavior (bytes sent, connection state, error class) rather than brittle log text matching.
New file: src/proxy/tests/relay_baseline_invariant_tests.rs
Declared in src/proxy/relay.rs via:
#[cfg(test)]
#[path = "tests/relay_baseline_invariant_tests.rs"]
mod relay_baseline_invariant_tests;
De-duplication audit: The existing 7 relay test files cover quota boundary attacks, quota overflow, watchdog delta, and adversarial HOL blocking. The baseline tests below cover different invariants not locked by existing tests, verified against the existing test names:
- `relay_watchdog_delta_security_tests.rs` tests the `watchdog_delta()` function exhaustively — overlaps with `relay_baseline_watchdog_delta_handles_wraparound_gracefully`. DROP the watchdog delta baseline test (existing tests already lock this behavior fully).
- `relay_adversarial_tests.rs::relay_hol_blocking_prevention_regression` exercises bidirectional transfer but does not assert symmetric byte counting. KEEP the symmetric-counting baseline.
- No existing test covers the zero-byte transfer case. KEEP.
- No existing test covers the activity timeout firing path. KEEP.
- Existing end-to-end quota cutoff coverage already exists in `relay_quota_boundary_blackhat_tests.rs`. DROP duplicate quota-cutoff baseline here.
- Existing half-close chaos coverage already exists in `relay_adversarial_tests.rs::relay_chaos_half_close_crossfire_terminates_without_hang`. DROP duplicate half-close baseline here.
// Positive: relay with no data flow for >ACTIVITY_TIMEOUT returns Ok.
// Verifies watchdog fires and select! cancels copy_bidirectional cleanly.
relay_baseline_activity_timeout_fires_after_inactivity
// Positive: relay with immediate close on both sides returns Ok(())
// and both StatsIo byte counters read zero.
relay_baseline_zero_bytes_returns_ok_and_counters_zero
// Positive: transfer N bytes C→S and M bytes S→C simultaneously.
// Assert StatsIo counters match exactly (no double-counting or loss).
relay_baseline_bidirectional_bytes_counted_symmetrically
// Error path: both duplex sides close simultaneously (EOF race).
// relay_bidirectional returns without panic.
relay_baseline_both_sides_close_simultaneously_no_panic
// Error path: server-side writer returns BrokenPipe mid-transfer.
// relay_bidirectional propagates error without panic.
relay_baseline_broken_pipe_midtransfer_returns_error
// Adversarial: single-byte writes for 10000 iterations.
// Assert counters exactly 10000 (no off-by-one in StatsIo accounting).
relay_baseline_many_small_writes_exact_counter
Oracle requirements (mandatory):
- `relay_baseline_activity_timeout_fires_after_inactivity`: use paused tokio time. Assert the relay does not complete before `ACTIVITY_TIMEOUT`, then does complete after advancing past `ACTIVITY_TIMEOUT + WATCHDOG_INTERVAL` with bounded slack. Do not assert a wall-clock range that ignores the watchdog cadence.
- `relay_baseline_zero_bytes_returns_ok_and_counters_zero`: assert both directions observe EOF (`read` returns `0`) and `stats.get_user_total_octets(user) == 0`.
- `relay_baseline_bidirectional_bytes_counted_symmetrically`: send fixed payload sizes `N` and `M`; assert exact byte equality on both peers and exact counter equality (`N + M` total accounted where applicable).
- `relay_baseline_both_sides_close_simultaneously_no_panic`: assert join result is `Ok(Ok(()))` (not just "did not panic").
- `relay_baseline_broken_pipe_midtransfer_returns_error`: assert typed error class (`io::ErrorKind::BrokenPipe` or mapped proxy error) and no process crash.
- `relay_baseline_many_small_writes_exact_counter`: enforce upper runtime bound with `timeout(Duration::from_secs(3), ...)` and assert exact transferred/accounted byte count.
New file: src/proxy/tests/handshake_baseline_invariant_tests.rs
Declared in src/proxy/handshake.rs via:
#[cfg(test)]
#[path = "tests/handshake_baseline_invariant_tests.rs"]
mod handshake_baseline_invariant_tests;
De-duplication audit: The existing 13 handshake test files heavily test auth_probe behavior, bit-flip rejection, key zeroization, and timing. The baseline tests below lock specific invariants at the function-call boundary level:
LAYERING RULE: handle_tls_handshake(...) is a handshake classifier/authenticator, not the masking relay itself. Handshake baseline tests must stop at the HandshakeResult boundary. Actual client-visible fallback relay behavior belongs in masking/client baselines, not in direct handshake tests.
TEST ISOLATION: Each test acquires auth_probe_test_lock() / unknown_sni_warn_test_lock() / warned_secrets_test_lock() as needed. Each test calls the corresponding clear_*_for_testing() at the start. All tests use the existing test_config_with_secret_hex-style config construction (via the new handshake_test_config helper or inline).
// Positive: unrecognized handshake bytes classify as `BadClient` rather than
// a success path. This locks the invariant that garbage input is rejected
// without exposing proxy-specific success semantics at the handshake boundary.
handshake_baseline_probe_always_falls_back_to_masking
// Positive: valid TLS ClientHello but wrong secret stays on the non-success
// path, not an authenticated handshake success.
handshake_baseline_invalid_secret_triggers_fallback_not_error_response
// Positive: consecutive failed handshakes from same IP increment
// auth_probe_fail_streak for that IP.
// Tests through the public auth_probe_fail_streak_for_testing() accessor.
handshake_baseline_auth_probe_streak_increments_per_ip
// Positive: after AUTH_PROBE_BACKOFF_START_FAILS consecutive failures,
// the IP is throttled. Tests through auth_probe_is_throttled_for_testing().
// NOTE: AUTH_PROBE_BACKOFF_START_FAILS is a compile-time constant
// (different values for #[cfg(test)] and production). Name reflects this.
handshake_baseline_saturation_fires_at_compile_time_threshold
// Adversarial: attacker sends 100 handshakes with distinct invalid secrets
// from the same IP. Verify auth_probe streak grows monotonically.
handshake_baseline_repeated_probes_streak_monotonic
// Security: after throttle engages, the tracked auth-probe block window lasts
// for the computed backoff duration and then expires.
handshake_baseline_throttled_ip_incurs_backoff_delay
// Adversarial: malformed TLS-like probe frames (truncated record header,
// impossible length fields, random high-entropy payload) never panic and
// never classify as successful handshakes.
handshake_baseline_malformed_probe_frames_fail_closed_to_masking
Oracle requirements (mandatory):
- `handshake_baseline_probe_always_falls_back_to_masking`: assert `HandshakeResult::BadClient { .. }` (or equivalent non-success fallback classification). Do not require direct observation of downstream mask-host IO at this layer.
- `handshake_baseline_invalid_secret_triggers_fallback_not_error_response`: assert non-success handshake classification and no success-path key material/result. Client-visible fallback behavior is covered in masking/client tests.
- `handshake_baseline_auth_probe_streak_increments_per_ip`: assert monotonic increment by exact delta `+1` per failed attempt for one IP, with no mutation for untouched IPs.
- `handshake_baseline_saturation_fires_at_compile_time_threshold`: assert the transition point occurs exactly at `AUTH_PROBE_BACKOFF_START_FAILS` (not before) and remains throttled after the threshold.
- `handshake_baseline_repeated_probes_streak_monotonic`: assert a strictly non-decreasing streak over a deterministic 100-attempt corpus.
- `handshake_baseline_throttled_ip_incurs_backoff_delay`: this Phase 1 baseline locks the internal throttle window semantics, not wire-visible sleep duration. Assert the tracked block window lasts at least `auth_probe_backoff(AUTH_PROBE_BACKOFF_START_FAILS)` and expires after that bound. If client-visible delay coverage is desired, add a separate async test with `server_hello_delay_min_ms == server_hello_delay_max_ms` through the client/handshake entrypoint.
- `handshake_baseline_malformed_probe_frames_fail_closed_to_masking`: run the deterministic malformed corpus; assert no success result, no panic, and bounded completion per case. Do not over-assert downstream masking transport from the handshake-only boundary.
Timing requirement for this file:
- Use paused tokio time only for tests that actually measure async sleep behavior. Tracker-only tests may use synthetic `Instant` arithmetic and must avoid wall-clock sleeps.
New file: src/proxy/tests/middle_relay_baseline_invariant_tests.rs
Declared in src/proxy/middle_relay.rs via:
#[cfg(test)]
#[path = "tests/middle_relay_baseline_invariant_tests.rs"]
mod middle_relay_baseline_invariant_tests;
DESIGN: Existing middle-relay suites already exercise idle/desync behavior extensively, but many assertions are tightly coupled to current internals/statics. Phase 1 only adds minimal stable helper-boundary contract locks that must remain stable across PR-B.
TEST ISOLATION: Idle-registry tests acquire relay_idle_pressure_test_scope() and call clear_relay_idle_pressure_state_for_testing() at the start. Desync-dedup tests acquire desync_dedup_test_lock() and call clear_desync_dedup_for_testing() at the start. Do not rely on one lock to serialize the other registry.
// API contract: mark+oldest+clear round-trip through stable helper functions only,
// without direct access to internal registries/statics.
middle_relay_baseline_public_api_idle_roundtrip_contract
// API contract: dedup suppress/allow semantics through stable helper entry,
// without asserting internal map layout.
middle_relay_baseline_public_api_desync_window_contract
Oracle requirements (mandatory):
- `middle_relay_baseline_public_api_idle_roundtrip_contract`: assert the first `mark_relay_idle_candidate(conn)` returns `true`, `oldest_relay_idle_candidate() == Some(conn)`, after clear it is not `Some(conn)`, and a second mark after clear succeeds.
- `middle_relay_baseline_public_api_desync_window_contract`: through the stable helper boundary only, assert the first event emits, a duplicate within the window suppresses, and a post-rotation/window-advance event emits again.
New file: src/proxy/tests/masking_baseline_invariant_tests.rs
Declared in src/proxy/masking.rs via:
#[cfg(test)]
#[path = "tests/masking_baseline_invariant_tests.rs"]
mod masking_baseline_invariant_tests;
RATIONALE: The masking module is the primary anti-DPI component — it makes the proxy appear to be a legitimate website when probed by censors. PR-F modifies mask_outcome_target_budget (log-normal replacement). Without baseline locks on masking timing behavior, PR-F could subtly regress the timing envelope with no detection.
DETERMINISM NOTE: mask_outcome_target_budget(...) currently samples through an internal RNG, so Phase 1 cannot require a seeded exact output sequence from that function. Baseline tests here must assert stable invariants such as bounds and fail-closed behavior, not exact sample values or distribution shape. Distribution-quality assertions belong in PR-F once a deterministic seam exists or when tests force deterministic config such as floor == ceiling.
The existing 37 masking test files cover specific attack scenarios but don't lock the high-level behavioral contracts that all subsequent PRs must preserve:
// Positive: mask_outcome_target_budget returns a Duration within
// [floor_ms, ceiling_ms] when normalization is enabled.
// This is the core anti-fingerprinting timing envelope.
masking_baseline_timing_normalization_budget_within_bounds
// Positive: handle_bad_client with mask=true connects to the configured
// mask_host and forwards initial_data verbatim. Verifies the proxy
// correctly impersonates a legitimate website by relaying to the real backend.
masking_baseline_fallback_relays_to_mask_host
// Security: mask_outcome_target_budget with timing_normalization_enabled=false
// returns the default masking budget (MASK_TIMEOUT), preserving legacy timing posture.
masking_baseline_no_normalization_returns_default_budget
// Adversarial: mask_host is unreachable (connection refused).
// handle_bad_client must not panic and must fail closed (silent close or
// generic transport error), without proxy-identifying response bytes.
masking_baseline_unreachable_mask_host_silent_failure
// Light fuzz: deterministic malformed initial_data corpus (length extremes,
// random binary, invalid UTF-8) must never panic.
masking_baseline_light_fuzz_initial_data_no_panic
Oracle requirements (mandatory):
- `masking_baseline_timing_normalization_budget_within_bounds`: assert every sampled budget satisfies `floor <= budget <= ceiling` across a fixed-size repeated sample loop. Do not require a seeded exact sequence from the current implementation.
- `masking_baseline_fallback_relays_to_mask_host`: assert exact byte preservation for forwarded `initial_data` and backend response relay to the client.
- `masking_baseline_no_normalization_returns_default_budget`: assert the exact default budget (`MASK_TIMEOUT`).
- `masking_baseline_unreachable_mask_host_silent_failure`: assert no proxy-identifying bytes are written to the client and completion remains bounded.
- `masking_baseline_light_fuzz_initial_data_no_panic`: fixed malformed corpus only; assert no panic, bounded runtime per case, and no identity leak.
De-duplication note:
- Existing masking suites already cover half-close lifecycle and bounded offline fallback timing (`masking_self_target_loop_security_tests.rs`, `masking_adversarial_tests.rs`, `masking_relay_guardrails_security_tests.rs`).
- Existing masking suites already cover strict byte-cap enforcement and cap overshoot regression (`masking_production_cap_regression_security_tests.rs`) plus broader close/failure matrices (`masking_connect_failure_close_matrix_security_tests.rs`).
- Phase 1 baseline masking tests therefore focus on top-level contracts (timing envelope bounds, fallback posture, unreachable-backend fail-closed behavior, light malformed-input robustness), not re-testing transport choreography already covered elsewhere.
PR-A.2 Merge gate
All tests pass on current code:
cargo test -- relay_baseline_
cargo test -- handshake_baseline_
cargo test -- middle_relay_baseline_
cargo test -- masking_baseline_
cargo test -- --test-threads=1
cargo test -- --test-threads=32
cargo test # full suite — no regressions
Notes:
- `--test-threads=1` catches hidden ordering assumptions. `--test-threads=32` catches shared-state bleed and race-sensitive flakes.
- Heavy stress scenarios that are too expensive for default CI must be marked `#[ignore]` and run in dedicated security/perf pipelines, never deleted.
- Each adversarial baseline test must have an explicit upper runtime bound to keep CI deterministic.
- Any assertion that depends on wall-clock variance must use bounded ranges and paused time where applicable; exact wall-clock equality checks are forbidden.
Phase 1 ASVS L2 alignment focus (test intent mapping):
- V1.2 / V1.4: fail-closed behavior and concurrency isolation under adversarial probe traffic.
- V7.4: cryptographic/protocol error handling does not leak identifying behavior.
- V9.1: communication behavior under malformed input is deterministic, bounded, and non-panicking.
- V13.2: degradation paths (fallback/masking) preserve security posture and do not disclose gateway identity.
Critical Review Issues Found and Addressed in PR-A
| # | Severity | Issue from critique | Resolution |
|---|---|---|---|
| 1 | Critical | `test_harness_common.rs` has no valid declaration site; triple `#[path]` causes duplicate symbols | Declared once in `proxy/mod.rs`; consuming tests import via `use crate::proxy::test_harness_common::*` |
| 2 | High | `RecordingWriter` semantics ambiguous; flush-boundary tracking missing for DRS tests | Dual tracking: `writes` (per `poll_write`) + `flushed` (per `poll_flush` boundary with accumulator) |
| 3 | High | `SliceReader` unnecessarily requires `bytes` crate | Dropped. `tokio::io::duplex()` and `std::io::Cursor` already solve this |
| 4 | Medium | `PendingWriter` only controls `poll_write`; flush pending tests need separate control | Renamed to `PendingCountWriter` with separate `write_pending_remaining` and `flush_pending_remaining` |
| 5 | Critical | Baseline tests duplicate existing tests; `watchdog_delta` wraparound test trivially green | Watchdog delta baseline dropped (7 existing tests in `relay_watchdog_delta_security_tests.rs` cover it exhaustively). All other baselines audited against 104 existing test files. |
| 6 | High | Handshake baseline tests require complex scaffold not provided by `tls_only_config()` | Added `handshake_test_config(secret_hex)` builder with user, secret, auth settings, and masking config |
| 7 | Medium | `test_stats()` / `test_buffer_pool()` are trivial wrappers | Dropped. `Arc::new(Stats::new())` and `Arc::new(BufferPool::new())` are one-liners, universally inlined already |
| 8 | High | Middle relay baseline tests lock on global statics; PR-B removes them → guaranteed breakage | Tests call public functions (`mark_relay_idle_candidate`, `clear_relay_idle_candidate`), not statics. PR-B changes implementations, not function signatures. |
| 9 | Medium | `seeded_rng` returns `StdRng`, can't substitute for `SecureRandom` | Documented as explicit limitation in code comment |
| 10 | Medium | No test isolation strategy for auth_probe global state | Each handshake baseline test acquires `auth_probe_test_lock()` and calls `clear_auth_probe_state_for_testing()`. Documented as temporary coupling. |
| 11 | Low | "configured threshold" misnomer for compile-time constant | Renamed to `handshake_baseline_saturation_fires_at_compile_time_threshold` |
| 12 | High | Zero error-path regression locks in baseline suite | Added: `relay_baseline_both_sides_close_simultaneously_no_panic`, `relay_baseline_broken_pipe_midtransfer_returns_error` |
| 13 | Medium | `relay_baseline_empty_transfer_completes_without_error` is vague | Replaced with `relay_baseline_zero_bytes_returns_ok_and_counters_zero` (sharp assertion) |
| 14 | Medium | No masking.rs baseline tests despite PR-F modifying masking | Added `masking_baseline_invariant_tests.rs` with timing/fallback/cap/adversarial tests |
| NEW-1 | High | PR-A text could be read as violating global TDD "red first" rule | Clarified Phase 1 as characterization-only; red-first remains mandatory for all behavior-changing phases |
| NEW-2 | Medium | "No production code changes" wording conflicts with required `#[cfg(test)]` module wiring | Corrected scope statement to "No runtime behavior changes" |
| NEW-3 | High | Fail-closed requirement was implicit, allowing weak "no panic"-only assertions | Added explicit fail-closed assertion policy for anti-probing paths |
| NEW-4 | High | Timing and network-path baselines risk CI flakiness/deadlocks | Added deterministic timeout and paused-time requirements |
| NEW-5 | Medium | Several proposed baselines duplicated existing relay/middle-relay/handshake coverage | Pruned duplicate cases (relay quota cutoff, relay half-close, unknown-SNI warn rate-limit) and reduced middle-relay baseline to API-contract-only tests |
| NEW-6 | High | Relay inactivity oracle ignored `WATCHDOG_INTERVAL`, making the timeout assertion architecturally wrong | Rewrote the oracle around paused-time advancement past `ACTIVITY_TIMEOUT + WATCHDOG_INTERVAL` |
| NEW-7 | High | Handshake baselines conflated `HandshakeResult::BadClient` with downstream masking relay behavior | Separated handshake-layer classification assertions from masking/client-layer fallback IO assertions |
| NEW-8 | High | Handshake "backoff delay" wording conflated auth-probe state tracking with wire-visible sleep latency | Re-scoped the baseline to throttle-window semantics and deferred client-visible delay checks to an explicit async entrypoint test |
| NEW-9 | Medium | Masking timing determinism requirement overstated what the current internal-RNG API can guarantee | Limited Phase 1 masking timing assertions to invariant bounds instead of seeded exact sequences |
| NEW-10 | Medium | Middle-relay isolation guidance omitted `desync_dedup_test_lock()`, leaving desync tests underspecified | Split idle-registry and desync-dedup isolation requirements by helper/lock |
| NEW-11 | Medium | Masking baseline list still carried redundant cases already covered by dedicated cap and close-matrix suites | Pruned duplicate cap/empty-input/partial-close baseline cases from mandatory Phase 1 scope |
PR-B — Item 2: Dependency Injection for Global Proxy State
Priority: High. Blocks PR-D. (PR-C and PR-F are independent — see D1 below.)
TDD compatibility note: PR-B cannot start with red tests that reference a non-existent ProxySharedState API, because that would fail at compile time rather than exposing the current runtime bug. Split PR-B into:
- PR-B.0 (seam only, green): add `shared_state.rs`, define `ProxySharedState`, and thread an instance parameter through the call chain without changing storage semantics yet.
- PR-B.1 (red): add isolation tests against the new seam; they must compile and fail on then-current code because the seam still routes into global state.
- PR-B.2 (green): cut storage over from globals to per-instance state, then remove global reset/lock helpers.
This keeps red-first TDD for the behavior change while allowing the minimum compile-time scaffolding needed to express the tests.
Problem (concrete)
The core blocker set is the 12 handshake and middle-relay statics below. These are logically scoped to one running proxy instance but currently live at process scope, which forces test serialization and prevents two proxy instances in one process from remaining isolated:
| Static | File | Line | Type |
|---|---|---|---|
| `AUTH_PROBE_STATE` | `src/proxy/handshake.rs` | 52 | `OnceLock<DashMap<IpAddr, AuthProbeState>>` |
| `AUTH_PROBE_SATURATION_STATE` | `src/proxy/handshake.rs` | 53 | `OnceLock<Mutex<Option<AuthProbeSaturationState>>>` |
| `AUTH_PROBE_EVICTION_HASHER` | `src/proxy/handshake.rs` | 55 | `OnceLock<RandomState>` |
| `INVALID_SECRET_WARNED` | `src/proxy/handshake.rs` | 33 | `OnceLock<Mutex<HashSet<(String, String)>>>` |
| `UNKNOWN_SNI_WARN_NEXT_ALLOWED` | `src/proxy/handshake.rs` | 39 | `OnceLock<Mutex<Option<Instant>>>` |
| `DESYNC_DEDUP` | `src/proxy/middle_relay.rs` | 54 | `OnceLock<DashMap<u64, Instant>>` |
| `DESYNC_DEDUP_PREVIOUS` | `src/proxy/middle_relay.rs` | 55 | `OnceLock<DashMap<u64, Instant>>` |
| `DESYNC_HASHER` | `src/proxy/middle_relay.rs` | 56 | `OnceLock<RandomState>` |
| `DESYNC_FULL_CACHE_LAST_EMIT_AT` | `src/proxy/middle_relay.rs` | 57 | `OnceLock<Mutex<Option<Instant>>>` |
| `DESYNC_DEDUP_ROTATION_STATE` | `src/proxy/middle_relay.rs` | 58 | `OnceLock<Mutex<DesyncDedupRotationState>>` |
| `RELAY_IDLE_CANDIDATE_REGISTRY` | `src/proxy/middle_relay.rs` | 61 | `OnceLock<Mutex<RelayIdleCandidateRegistry>>` |
| `RELAY_IDLE_MARK_SEQ` | `src/proxy/middle_relay.rs` | 62 | `AtomicU64` (direct static) |
Explicitly out of core PR-B scope:
- `USER_PROFILES` in `adaptive_buffers.rs` stays process-global for PR-D cross-session memory. It must not be counted as a per-instance DI blocker for PR-B.
- `LOGGED_UNKNOWN_DCS` in `direct_relay.rs` and the warning-dedup `AtomicBool` statics in `client.rs` are ancillary diagnostics caches, not core handshake/relay isolation state. Keep them for a follow-up consistency PR after the handshake and middle-relay cutover lands.
These force a large body of tests to use auth_probe_test_lock(), relay_idle_pressure_test_scope(), and desync_dedup_test_lock() to stay deterministic. The current branch also has tests that read AUTH_PROBE_STATE and DESYNC_DEDUP directly, so the migration scope is larger than helper removal alone.
Step 1: Add seam, then write red tests (must fail on then-current code)
Important sequencing correction: red tests for PR-B must be written after the non-behavioral seam from PR-B.0 exists, otherwise they cannot compile. They still remain red-first for the actual behavior change because the seam initially points to the old globals.
New file: src/proxy/tests/proxy_shared_state_isolation_tests.rs
Declared in src/proxy/mod.rs via a single #[cfg(test)] #[path = "tests/proxy_shared_state_isolation_tests.rs"] mod proxy_shared_state_isolation_tests; declaration. Do NOT declare in both handshake.rs and middle_relay.rs — including the same file via #[path] in two modules duplicates all definitions and causes compilation errors.
TEST SCOPE: These tests cover only the handshake and middle-relay state being migrated in core PR-B. Do not mix in direct_relay.rs unknown-DC logging or client.rs warning-dedup behavior here.
// Fails because AUTH_PROBE_STATE is global — second instance shares first's state.
proxy_shared_state_two_instances_do_not_share_auth_probe_state
// Fails because DESYNC_DEDUP is global.
proxy_shared_state_two_instances_do_not_share_desync_dedup
// Fails because RELAY_IDLE_CANDIDATE_REGISTRY is global.
proxy_shared_state_two_instances_do_not_share_idle_registry
// Fails: resetting state in instance A must not affect instance B.
proxy_shared_state_reset_in_one_instance_does_not_affect_another
// Fails: parallel tests increment the same IP counter in AUTH_PROBE_STATE.
proxy_shared_state_parallel_auth_probe_updates_stay_per_instance
// Fails: desync rotation in instance A must not advance rotation state of instance B.
proxy_shared_state_desync_window_rotation_is_per_instance
// Fails: idle seq counter is global AtomicU64, shared between instances.
proxy_shared_state_idle_mark_seq_is_per_instance
// Adversarial: attacker floods auth probe state in "proxy A" must not exhaust probe
// budget of unrelated "proxy B" sharing the process.
proxy_shared_state_auth_saturation_does_not_bleed_across_instances
DROP from mandatory core PR-B:
proxy_shared_state_poisoned_mutex_in_one_instance_does_not_panic_other. This is too implementation-coupled for the initial red phase and is better expressed as targeted unit tests once per-instance lock recovery helpers exist. The core risk is cross-instance state bleed, not synthetic poisoning choreography.
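The cross-instance bleed property these red tests target can be sketched with a dependency-free stand-in. Everything below is a hypothetical miniature (a `HashMap` behind a `Mutex` standing in for the real `DashMap`-based `ProxySharedState`), not the crate's actual API:

```rust
use std::collections::HashMap;
use std::net::{IpAddr, Ipv4Addr};
use std::sync::{Arc, Mutex};

// Hypothetical stand-in: only the auth-probe counter map, per instance.
#[derive(Default)]
struct SharedStateSketch {
    auth_probe: Mutex<HashMap<IpAddr, u32>>,
}

impl SharedStateSketch {
    fn new() -> Arc<Self> {
        Arc::new(Self::default())
    }

    fn record_probe(&self, ip: IpAddr) {
        *self.auth_probe.lock().unwrap().entry(ip).or_insert(0) += 1;
    }

    fn probe_count(&self, ip: IpAddr) -> u32 {
        self.auth_probe.lock().unwrap().get(&ip).copied().unwrap_or(0)
    }
}

fn isolation_holds() -> bool {
    let attacker = IpAddr::V4(Ipv4Addr::new(203, 0, 113, 7));
    let proxy_a = SharedStateSketch::new();
    let proxy_b = SharedStateSketch::new();

    // Flooding "proxy A" must leave "proxy B" untouched — the property the
    // red tests assert against the real seam once it exists.
    for _ in 0..100 {
        proxy_a.record_probe(attacker);
    }
    proxy_a.probe_count(attacker) == 100 && proxy_b.probe_count(attacker) == 0
}
```

With the current `OnceLock` statics, both "instances" would read the same map and the second condition would fail — which is exactly why these tests start red.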
New file: src/proxy/tests/proxy_shared_state_parallel_execution_tests.rs
// Spawns 50 concurrent auth-probe updates against distinct ProxySharedState instances,
// asserts each instance's counter matches exactly what it received (no cross-talk).
proxy_shared_state_50_concurrent_instances_no_counter_bleed
// Desync dedup: 20 concurrent instances each performing window rotation,
// asserts rotation state is per-instance and not double-rotated.
proxy_shared_state_desync_rotation_concurrent_20_instances
// Idle registry: 10 concurrent mark+evict cycles across isolated instances,
// asserts no cross-eviction.
proxy_shared_state_idle_registry_concurrent_10_instances
Step 2: Implement ProxySharedState
New file: src/proxy/shared_state.rs
MUTEX TYPE: All Mutex fields below are std::sync::Mutex, NOT tokio::sync::Mutex. The current codebase uses std::sync::Mutex for all these statics, and all critical sections are short (insert/get/retain) with no await points inside. Per Architecture.md §5: "Never hold a lock across an await unless atomicity explicitly requires it." Using std::sync::Mutex is correct here because:
- Lock hold times are bounded (microseconds for DashMap/HashSet operations)
- No `.await` is called while holding any of these locks
- `tokio::sync::Mutex` would add unnecessary overhead for these synchronous operations
use std::sync::Mutex; // NOT tokio::sync::Mutex — see note above
pub struct HandshakeSharedState {
pub auth_probe: DashMap<IpAddr, AuthProbeState>,
pub auth_probe_saturation: Mutex<Option<AuthProbeSaturationState>>,
pub auth_probe_eviction_hasher: RandomState,
pub invalid_secret_warned: Mutex<HashSet<(String, String)>>,
pub unknown_sni_warn_next_allowed: Mutex<Option<Instant>>,
}
pub struct MiddleRelaySharedState {
pub desync_dedup: DashMap<u64, Instant>,
pub desync_dedup_previous: DashMap<u64, Instant>,
pub desync_hasher: RandomState,
pub desync_full_cache_last_emit_at: Mutex<Option<Instant>>,
pub desync_dedup_rotation_state: Mutex<DesyncDedupRotationState>,
pub relay_idle_registry: Mutex<RelayIdleCandidateRegistry>,
// Monotonic counter; kept as AtomicU64 inside the struct, not a global.
pub relay_idle_mark_seq: AtomicU64,
}
pub struct ProxySharedState {
pub handshake: HandshakeSharedState,
pub middle_relay: MiddleRelaySharedState,
}
impl ProxySharedState {
pub fn new() -> Arc<Self> { ... }
}
Declare pub mod shared_state; in src/proxy/mod.rs between lines 61–69.
ProxySharedState is architecturally: state that (a) must survive across multiple concurrent connections, (b) is logically scoped to one running proxy instance, not the whole process. Aligns with Architecture.md §3.1 Singleton rule: "pass shared state explicitly via Arc<T>."
Scope correction: ProxySharedState in core PR-B should contain only handshake and middle-relay shared state. Do not add adaptive_buffers::USER_PROFILES here.
Step 3: Thread Arc<ProxySharedState> through the call chain
src/proxy/handshake.rs
Current signature of handle_tls_handshake (line 690):
pub async fn handle_tls_handshake<R, W>(
handshake: &[u8],
reader: R,
mut writer: W,
peer: SocketAddr,
config: &ProxyConfig,
replay_checker: &ReplayChecker,
rng: &SecureRandom,
tls_cache: Option<Arc<TlsFrontCache>>,
) -> HandshakeResult<...>
New signature — add one parameter at the end:
shared: &ProxySharedState, // ← add as last parameter
Current signature of handle_mtproto_handshake (line 854): same pattern — add shared: &ProxySharedState as last parameter.
All internal calls to auth_probe_state_map(), auth_probe_saturation_state(), warn_invalid_secret_once(), unknown_sni_warn_state_lock() are replaced with direct field access on &shared.handshake. The accessor functions (auth_probe_state_map, auth_probe_saturation_state, unknown_sni_warn_state_lock) are then deleted.
src/proxy/middle_relay.rs
Current signature of handle_via_middle_proxy (line 695):
pub(crate) async fn handle_via_middle_proxy<R, W>(
mut crypto_reader: CryptoReader<R>,
crypto_writer: CryptoWriter<W>,
success: HandshakeSuccess,
me_pool: Arc<MePool>,
stats: Arc<Stats>,
config: Arc<ProxyConfig>,
buffer_pool: Arc<BufferPool>,
local_addr: SocketAddr,
rng: Arc<SecureRandom>,
mut route_rx: watch::Receiver<RouteCutoverState>,
route_snapshot: RouteCutoverState,
session_id: u64,
) -> Result<()>
New signature — add shared: Arc<ProxySharedState> after session_id: u64. All RELAY_IDLE_CANDIDATE_REGISTRY, DESYNC_DEDUP, etc. accesses replaced with shared.middle_relay.*. relay_idle_candidate_registry() accessor deleted.
src/proxy/client.rs
The call site of handle_tls_handshake (line ~553) and handle_via_middle_proxy (line ~1289) must pass the Arc<ProxySharedState> that is constructed once in the main startup path and passed down. Locate the top-level handle_client_stream function (line 317) and add shared: Arc<ProxySharedState> to its parameters, then thread through.
handle_authenticated_static(...) also needs shared: Arc<ProxySharedState> because it dispatches to the middle-relay path after the handshake.
Construction site correction: the current connection task spawn lives in src/maestro/listeners.rs, not src/maestro/mod.rs or src/startup.rs. Construct one Arc<ProxySharedState> alongside the other long-lived listener resources and clone it into each handle_client_stream(...) task. Do not create a fresh shared-state instance per accepted connection.
Scope correction: handle_via_direct(...) stays unchanged in core PR-B unless the ancillary direct_relay.rs unknown-DC dedup migration is explicitly pulled into scope.
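The construction-site rule above ("construct once, clone per task") can be sketched without the real listener code. This is a hedged analogue: `SharedSketch` is a hypothetical one-field stand-in for `ProxySharedState`, and OS threads stand in for spawned tokio tasks:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for ProxySharedState with one shared counter.
#[derive(Default)]
struct SharedSketch {
    relay_idle_mark_seq: AtomicU64,
}

// Mirrors the listener pattern: build the Arc once alongside long-lived
// resources, then clone the HANDLE (not the state) into each connection task.
fn run_listener_sketch(connections: usize) -> u64 {
    let shared = Arc::new(SharedSketch::default());
    let mut handles = Vec::new();
    for _ in 0..connections {
        let per_task = Arc::clone(&shared); // cheap refcount bump per task
        handles.push(thread::spawn(move || {
            per_task.relay_idle_mark_seq.fetch_add(1, Ordering::Relaxed);
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // Every task observed the SAME instance, so the counter is cumulative.
    shared.relay_idle_mark_seq.load(Ordering::Relaxed)
}
```

Constructing a fresh instance inside the loop instead of cloning would silently reintroduce per-connection state and break cross-connection features like auth-probe throttling.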
Step 4: Remove test helpers and migrate test files
After production code passes all new tests, remove the global reset/lock helpers for the migrated handshake and middle-relay state. Do not blindly delete every test accessor. Prefer converting narrow query helpers into instance-scoped helpers when they preserve test decoupling from internal map layout.
Delete or replace these handshake/middle-relay globals:
- `auth_probe_test_lock()`
- `unknown_sni_warn_test_lock()`
- `warned_secrets_test_lock()`
- `relay_idle_pressure_test_scope()`
- `desync_dedup_test_lock()`
- global reset helpers that only exist to wipe process-wide state between tests
Prefer converting, not deleting outright:
- `auth_probe_fail_streak_for_testing(...)`
- `auth_probe_is_throttled_for_testing(...)`
- similar narrow read-only helpers that can become `..._for_testing(shared, ...)`
Blast-radius correction: this migration affects more than the helper users listed in the original draft. The current branch has many handshake and middle-relay tests that read AUTH_PROBE_STATE or DESYNC_DEDUP directly. Those tests must be migrated off raw statics before the statics are removed.
Ancillary direct_relay.rs helpers such as unknown_dc_test_lock() remain out of scope unless that follow-up consistency migration is explicitly included.
No global Mutex<()> test locks remain for the migrated handshake/middle-relay state after this PR. Do not overstate this as a repository-wide guarantee while ancillary globals still exist elsewhere.
Merge gate
cargo check --tests
cargo test -- proxy_shared_state_
cargo test -- handshake_
cargo test -- middle_relay_
cargo test -- client_
cargo test -- --test-threads=1
cargo test -- --test-threads=32
All must pass. No existing test may fail. The thread-count runs are mandatory here because PR-B's entire purpose is eliminating hidden cross-test and cross-instance state bleed.
PR-C — Item 5: Dynamic Record Sizing (DRS) for the TLS Relay Path
Priority: High (anti-censorship, TLS-mode only).
TDD note for PR-C: Red tests in this phase must fail because DRS behavior is absent, not because APIs are temporarily broken. Keep baseline relay API compatibility where practical so failures remain behavioral, not compile-surface churn.
SCOPE LIMITATION: This PR covers the direct relay path only (direct_relay.rs → relay_bidirectional). The middle relay path (middle_relay.rs → explicit ME→client flush loop) is not addressed. Since middle relay is the default when ME URLs are configured, this represents a significant coverage gap for those deployments. Future follow-up (PR-C.1): add DRS shaping to the middle relay's explicit flush loop. This is architecturally simpler (natural flush tick points exist) and should be prioritized immediately after PR-C.
Problem (concrete)
src/proxy/relay.rs line 563:
result = copy_bidirectional_with_sizes(
&mut client, // client = StatsIo<CombinedStream<CryptoReader, CryptoWriter>>
&mut server,
c2s_buf_size.max(1),
s2c_buf_size.max(1),
) => Some(result),
client is a StatsIo wrapping a CombinedStream. The write half of client (the path sending data to the real client) has no TLS record framing control. TLS record sizes observed by DPI are determined by tokio's internal copy buffer size — a single constant that produces a recognizable signature absent from real browser TLS sessions.
The previous draft had three bugs (now fixed here):
- Used a 1450-byte payload → creates 1471-byte framed records → TCP splits into a `[1460, 11]` signature. Correct value: 1369 bytes.
- Incremented `records_completed` on every `poll_write` call, not only when a record boundary is crossed. Fix: track `bytes_in_current_record`; only increment when a flush completes.
- Returned `Poll::Pending` with `wake_by_ref()` after a flush completed, causing an immediate spurious reschedule. Fix: use the `ready!` macro and `continue` in a loop — no yield between a completed flush and the next write.
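The segment arithmetic behind the first draft bug can be checked directly. The constants come from the plan (MSS 1460, effective MSS 1448 with timestamps, 5-byte FakeTLS header); the 21-byte overhead used in the first assertion is the figure implied by the draft's 1471-byte record claim, not a value taken from the codebase:

```rust
// Numbers from the plan: MSS = 1460 (MTU 1500 minus 40 bytes IPv4+TCP),
// effective MSS ≈ 1448 once ~12 bytes of TCP timestamp options are in play.
const MSS: usize = 1460;
const MSS_WITH_TIMESTAMPS: usize = 1448;

// A framed record larger than the MSS is split across TCP segments,
// producing the recognizable [1460, remainder] on-wire signature.
fn tcp_segments(framed_record: usize) -> Vec<usize> {
    let mut segments = Vec::new();
    let mut left = framed_record;
    while left > 0 {
        let seg = left.min(MSS);
        segments.push(seg);
        left -= seg;
    }
    segments
}

fn fits_in_one_segment(payload: usize, framing_overhead: usize) -> bool {
    payload + framing_overhead <= MSS_WITH_TIMESTAMPS
}
```

The corrected 1369-byte payload frames to 1374 bytes on the wire, which stays inside even the timestamp-reduced MSS.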
Step 1: Write red tests (must fail on current code)
New file: src/proxy/tests/drs_writer_unit_tests.rs
Declared in src/proxy/relay.rs.
// Positive: bytes emitted to inner writer arrive in records of exactly
// target_record_size(0..=39) = 1369 before flush, then 4096, then 16384.
drs_first_40_records_are_1369_bytes_payload_each
drs_records_41_to_60_are_4096_bytes_payload_each
drs_records_above_60_are_16384_bytes_payload_each
// Boundary/edge: a write of 1 byte completes correctly and counts toward
// bytes_in_current_record without incrementing records_completed prematurely.
drs_single_byte_write_does_not_prematurely_complete_record
// Edge: write of exactly 1369 bytes fills one record; next poll_write triggers flush.
drs_write_equal_to_record_size_requires_second_poll_for_flush
// Edge: TWO sequential poll_write calls, each crossing one record boundary,
// produce exactly two separate flushes (can't flush twice in single poll).
drs_two_sequential_writes_cross_boundary_each_produces_one_flush
// Edge: empty slice write returns Ok(0) immediately without touching inner.
drs_empty_write_returns_zero_does_not_touch_inner
// Edge: poll_shutdown delegates to inner and does not flush records.
drs_shutdown_delegates_to_inner
// Adversarial: inner writer returns Pending on first 5 poll_write calls.
// DrsWriter must not loop-busy-poll and must not increment records_completed.
drs_pending_on_write_does_not_increment_completed_counter
// Adversarial: inner flush returns Pending. DrsWriter must propagate Pending
// without calling wake_by_ref (verified by checking waker was not called).
drs_pending_on_flush_propagates_pending_without_spurious_wake
// Adversarial: 10001 consecutive 1-byte writes; verify records_completed
// count matches expected record boundaries, no off-by-one.
drs_10001_single_byte_writes_records_count_exact
// Stress: bounded concurrent DrsWriter instances each writing deterministic
// payloads; assert total flushed bytes equals total written bytes.
// Large-scale variants belong in ignored perf/security jobs, not default CI.
drs_concurrent_instances_no_data_loss
// Security/anti-DPI: collect sizes of all records produced by writing 100 KB
// through DrsWriter; assert no record with size > 1369 appears in first 40.
// This is the packet-shape non-regression test.
drs_first_records_do_not_exceed_mss_safe_payload_size
// Security: non-TLS passthrough path produces no DrsWriter wrapping;
// assert that when is_tls=false the relay produces no record-size shaping.
drs_passthrough_when_not_tls_no_record_shaping
// Overflow hardening: records_completed saturates at final phase and never
// re-enters phase 1 after saturation.
drs_records_completed_counter_does_not_wrap
// Integration: StatsIo byte counters match actual bytes received by inner writer
// when DrsWriter limits write sizes (no data loss or double-counting).
drs_statsio_byte_count_matches_actual_written
// Integration: copy loop handles partial writes at record boundaries without
// data loss or duplication.
drs_copy_loop_partial_write_retry
CI policy for this file:
- Keep default-suite tests deterministic and bounded in runtime and memory.
- Any high-cardinality stress profile (for example 1000 writers x 1 MB) must be marked ignored and run only in dedicated perf/security pipelines.
New file: src/proxy/tests/drs_integration_tests.rs
Declared in src/proxy/relay.rs.
// Integration: relay_bidirectional with DRS enabled (is_tls=true) produces
// records ≤ 1369 bytes in payload size for the first 40 records to the client.
drs_relay_bidirectional_tls_first_records_bounded
// Integration: relay_bidirectional with is_tls=false produces no DrsWriter
// overhead (records sized by c2s_buf_size only).
drs_relay_bidirectional_non_tls_no_drs_overhead
// Integration: relay completes normally with DRS enabled; final byte count
// matches input byte count (no loss or duplication).
drs_relay_bidirectional_tls_no_data_loss_end_to_end
// Integration: verify FakeTlsWriter.poll_flush produces a TLS record boundary,
// not a no-op. Otherwise DRS shaping provides no anti-DPI value.
drs_flush_is_meaningful_for_faketls
Step 2: Implement DrsWriter
New file: src/proxy/drs_writer.rs
Declare pub(crate) mod drs_writer; in src/proxy/mod.rs.
pub(crate) struct DrsWriter<W> {
inner: W,
bytes_in_current_record: usize,
// Capped at DRS_PHASE_FINAL (60) to prevent overflow on long-lived connections.
// On 32-bit platforms, an uncapped usize would wrap after ~4 billion records,
// restarting the DRS ramp — a detectable signature.
records_completed: usize,
}
const DRS_PHASE_1_END: usize = 40;
const DRS_PHASE_2_END: usize = 60;
const DRS_PHASE_FINAL: usize = DRS_PHASE_2_END;
// Safe payload for one MSS with TCP-options headroom.
// FakeTLS overhead in THIS proxy: 5 bytes (TLS record header only).
// NOTE: Unlike real TLS 1.3, FakeTlsWriter does NOT add a content-type byte
// or AEAD tag. Real TLS 1.3 overhead would be 22 bytes (5 + 1 + 16).
// We size for the FakeTLS overhead: record on wire = 1369 + 5 = 1374 bytes.
// MSS = 1460 (MTU 1500 - 40 IP+TCP); with TCP timestamps (~12 bytes)
// effective MSS ≈ 1448, leaving 74 bytes margin for path MTU variance (PPPoE, VPN).
// The value 1369 is intentionally conservative to accommodate future FakeTLS
// upgrades that may add AEAD or padding overhead.
const DRS_MSS_SAFE_PAYLOAD: usize = 1_369;
const DRS_PHASE_2_PAYLOAD: usize = 4_096;
// NOTE: FakeTlsWriter uses MAX_TLS_CIPHERTEXT_SIZE = 16_640 as its max payload.
// DRS caps at 16_384 (RFC 8446 TLS 1.3 plaintext limit). This means DRS still
// shapes records in steady-state by limiting to 16_384 instead of 16_640.
// This is intentional: real TLS 1.3 servers cap at 16_384 plaintext bytes per
// record, so DRS mimics that limit even though FakeTLS allows larger records.
const DRS_FULL_RECORD_PAYLOAD: usize = 16_384;
impl<W> DrsWriter<W> {
pub(crate) fn new(inner: W) -> Self {
Self { inner, bytes_in_current_record: 0, records_completed: 0 }
}
fn target_record_size(&self) -> usize {
match self.records_completed {
0..DRS_PHASE_1_END => DRS_MSS_SAFE_PAYLOAD,
DRS_PHASE_1_END..DRS_PHASE_2_END => DRS_PHASE_2_PAYLOAD,
_ => DRS_FULL_RECORD_PAYLOAD,
}
}
}
impl<W: AsyncWrite + Unpin> AsyncWrite for DrsWriter<W> {
fn poll_write(mut self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &[u8]) -> Poll<io::Result<usize>> {
if buf.is_empty() { return Poll::Ready(Ok(0)); }
loop {
let target = self.target_record_size();
let remaining = target.saturating_sub(self.bytes_in_current_record);
if remaining == 0 {
// Record boundary reached — flush before starting the next record.
ready!(Pin::new(&mut self.inner).poll_flush(cx))?;
// Cap at DRS_PHASE_FINAL to prevent usize overflow on long-lived connections.
self.records_completed = self.records_completed.saturating_add(1).min(DRS_PHASE_FINAL + 1);
self.bytes_in_current_record = 0;
continue;
}
let limit = buf.len().min(remaining);
let n = ready!(Pin::new(&mut self.inner).poll_write(cx, &buf[..limit]))?;
self.bytes_in_current_record += n;
return Poll::Ready(Ok(n));
}
}
fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
Pin::new(&mut self.inner).poll_flush(cx)
}
fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
Pin::new(&mut self.inner).poll_shutdown(cx)
}
}
State integrity requirement: bytes_in_current_record must be incremented by the number of bytes actually accepted by inner writer (n), not requested length. This preserves correctness under partial writes.
Pending behavior requirement: if inner poll_write or poll_flush returns Pending, propagate Pending without manual wake_by_ref calls in DRS, relying on inner writer wake semantics.
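The phase schedule and the overflow cap from the snippet above can be exercised in isolation. This is a free-function sketch of the same logic (an if-chain mirroring the match's exclusive ranges), not the struct method itself:

```rust
// Constants as defined in the DrsWriter sketch above.
const DRS_PHASE_1_END: usize = 40;
const DRS_PHASE_2_END: usize = 60;
const DRS_PHASE_FINAL: usize = DRS_PHASE_2_END;
const DRS_MSS_SAFE_PAYLOAD: usize = 1_369;
const DRS_PHASE_2_PAYLOAD: usize = 4_096;
const DRS_FULL_RECORD_PAYLOAD: usize = 16_384;

// Mirrors target_record_size(): first 40 records are MSS-safe, the next 20
// ramp to 4 KiB, and steady state uses full 16 KiB records.
fn target_record_size(records_completed: usize) -> usize {
    if records_completed < DRS_PHASE_1_END {
        DRS_MSS_SAFE_PAYLOAD
    } else if records_completed < DRS_PHASE_2_END {
        DRS_PHASE_2_PAYLOAD
    } else {
        DRS_FULL_RECORD_PAYLOAD
    }
}

// Mirrors the counter update: saturate and cap so the ramp never restarts,
// even on connections long enough to overflow a 32-bit usize.
fn next_records_completed(records_completed: usize) -> usize {
    records_completed.saturating_add(1).min(DRS_PHASE_FINAL + 1)
}
```

The cap at `DRS_PHASE_FINAL + 1` keeps the counter in the final match arm forever, which is precisely what `drs_records_completed_counter_does_not_wrap` asserts.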
Step 3: Wire into relay path with compatibility
src/proxy/relay.rs — relay_bidirectional currently (line 456):
pub async fn relay_bidirectional<CR, CW, SR, SW>(
client_reader: CR,
client_writer: CW,
...
_buffer_pool: Arc<BufferPool>, // unchanged at this stage
) -> Result<()>
Compatibility correction: Do not force a signature break on relay_bidirectional(...) for all existing tests/callers. Prefer one of:
- add `relay_bidirectional_with_opts(...)` and keep `relay_bidirectional(...)` as a passthrough wrapper with defaults; or
- introduce a small options struct with a defaulted constructor and keep a compatibility wrapper.
This prevents unrelated relay and masking suites from becoming compile-red due to API churn and keeps PR-C failures focused on DRS behavior.
Inside the function body, where CombinedStream::new(client_reader, client_writer) constructs the client combined stream (line ~481), wrap the write half conditionally with a MaybeDrs enum:
ARCHITECTURE NOTE — write-side placement: DrsWriter wraps the raw client_writer before it enters CombinedStream, which is then wrapped by StatsIo. The resulting call chain on S→C writes is:
copy_bidirectional_with_sizes
→ StatsIo.poll_write (counts bytes, quota accounting)
→ CombinedStream.poll_write
→ MaybeDrs.poll_write
→ DrsWriter.poll_write
→ CryptoWriter.poll_write (AES-CTR encryption, may buffer internally)
→ FakeTlsWriter.poll_write (wraps into TLS record with 5-byte header)
→ TCP socket
This is correct because:
- `StatsIo` sees the actual bytes being written (DrsWriter doesn't change byte count, only limits write sizes and triggers flushes). `StatsIo.poll_write` counts the return value of `CombinedStream.poll_write`, which equals DrsWriter's return value — the actual bytes accepted.
- CryptoWriter buffering interaction: `CryptoWriter.poll_write` encrypts and MAY buffer internally (PendingCiphertext) if FakeTlsWriter returns Pending. Crucially, CryptoWriter always returns `Ok(to_accept)` even when buffering — it never returns Pending unless its internal buffer is full. This means DrsWriter's `bytes_in_current_record` tracking is accurate; CryptoWriter accepts the full limited amount.
- DRS flush drains the CryptoWriter→FakeTLS→socket chain: `DrsWriter.poll_flush` → `CryptoWriter.poll_flush` (drains pending ciphertext to FakeTlsWriter) → `FakeTlsWriter.poll_flush` (drains pending TLS record data to socket) → `socket.poll_flush`. This is what enforces TLS record boundaries on the wire. Without the flush, CryptoWriter could batch multiple DRS "records" into one FakeTLS record, defeating the purpose.
- `copy_bidirectional_with_sizes` also calls `poll_flush` on its own schedule; double-flush is safe (idempotent on all three layers) but adds minor syscall overhead.
- `copy_bidirectional_with_sizes`'s internal S→C buffer will be partially consumed per poll_write (DrsWriter may accept fewer bytes than offered). This is the intended mechanism — the copy loop retries with the remaining buffer.
IMPORTANT: Add a red test drs_statsio_byte_count_matches_actual_written to verify that StatsIo byte counters exactly match the total bytes the inner socket received. Without this, a bug where DrsWriter eats or duplicates bytes would go undetected.
enum MaybeDrs<W> {
Passthrough(W),
Shaping(DrsWriter<W>),
}
impl<W: AsyncWrite + Unpin> AsyncWrite for MaybeDrs<W> {
fn poll_write(mut self: Pin<&mut Self>, cx: &mut Context<'_>, buf: &[u8]) -> Poll<io::Result<usize>> {
match self.get_mut() {
MaybeDrs::Passthrough(w) => Pin::new(w).poll_write(cx, buf),
MaybeDrs::Shaping(w) => Pin::new(w).poll_write(cx, buf),
}
}
fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
match self.get_mut() {
MaybeDrs::Passthrough(w) => Pin::new(w).poll_flush(cx),
MaybeDrs::Shaping(w) => Pin::new(w).poll_flush(cx),
}
}
fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
match self.get_mut() {
MaybeDrs::Passthrough(w) => Pin::new(w).poll_shutdown(cx),
MaybeDrs::Shaping(w) => Pin::new(w).poll_shutdown(cx),
}
}
}
let writer = if opts.is_tls && opts.drs_enabled {
MaybeDrs::Shaping(DrsWriter::new(client_writer))
} else {
MaybeDrs::Passthrough(client_writer)
};
let client = StatsIo::new(CombinedStream::new(client_reader, writer), ...);
PERFORMANCE NOTE: The MaybeDrs::Passthrough variant adds a single enum match dispatch per poll_write/poll_flush/poll_shutdown call (~3-5 cycles on modern CPUs with branch prediction, negligible for TLS overhead). This is acceptable for correctness. Do not attempt zero-overhead abstractions with generic specialization here; the dispatch overhead is unmeasurable relative to the underlying TLS crypto and I/O.
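The partial-write-and-retry interaction between the copy loop and the record limiter can be shown with a synchronous analogue. `RecordLimiter` below is a hypothetical blocking miniature of DrsWriter's phase-1 behavior (fixed 1369-byte records, flush modeled as closing a record), and `copy_all` plays the role of the copy loop re-offering its unconsumed tail:

```rust
// Synchronous analogue: accepts at most the remaining bytes of the current
// record per call; the caller retries with the rest — no loss, no duplication.
struct RecordLimiter {
    out: Vec<u8>,
    record_sizes: Vec<usize>,
    bytes_in_current_record: usize,
}

impl RecordLimiter {
    fn new() -> Self {
        Self { out: Vec::new(), record_sizes: Vec::new(), bytes_in_current_record: 0 }
    }

    // Record size fixed at 1369 (phase 1 only) to keep the sketch short.
    fn write(&mut self, buf: &[u8]) -> usize {
        const RECORD: usize = 1369;
        let remaining = RECORD - self.bytes_in_current_record;
        if remaining == 0 {
            // "Flush": close the completed record, then start the next one.
            self.record_sizes.push(RECORD);
            self.bytes_in_current_record = 0;
            return self.write(buf);
        }
        let n = buf.len().min(remaining);
        self.out.extend_from_slice(&buf[..n]);
        self.bytes_in_current_record += n;
        n
    }
}

fn copy_all(limiter: &mut RecordLimiter, mut data: &[u8]) {
    // Like copy_bidirectional_with_sizes: retry with the unaccepted tail.
    while !data.is_empty() {
        let n = limiter.write(data);
        data = &data[n..];
    }
}
```

Total bytes out equal total bytes in and every completed record is exactly 1369 bytes — the same invariants `drs_copy_loop_partial_write_retry` and `drs_statsio_byte_count_matches_actual_written` pin down for the real async writer.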
src/proxy/direct_relay.rs — direct path call site:
Pass DRS options only from direct relay dispatch (is_tls = success.is_tls, drs_enabled = config.general.drs_enabled && success.is_tls).
Scope guard: leave middle-relay call choreography untouched in this PR; this is a direct-path-only phase.
Step 4: Add drs_enabled config flag
src/config/types.rs — inside GeneralConfig struct (existing struct, find existing direct_relay_copy_buf_* fields around line 507):
// Controls Dynamic Record Sizing on the direct TLS relay path.
// Safe to disable for debugging; default true when tls mode is active.
#[serde(default = "default_true")]
pub drs_enabled: bool,
IMPORTANT — serde compatibility: New config fields in this PR must have #[serde(default = "...")] annotations. Without these, existing config files that lack the fields will fail to deserialize, breaking upgrades. For PR-C this applies to drs_enabled.
Cross-phase note: `ipt_enabled` and `ipt_level` belong to later IPT phases; keep them out of PR-C to limit blast radius.
Add helpers as needed:
fn default_true() -> bool { true }
fn default_false() -> bool { false }
fn default_ipt_level() -> u8 { 1 }
If default_true() already exists in defaults, reuse it instead of adding duplicates.
Default: true. Validation: no range constraint needed (boolean). In relay_bidirectional, pass drs_enabled: config.general.drs_enabled && is_tls (gate on both flags at call site).
DO NOT pass Arc<ProxyConfig> into relay_bidirectional — this would introduce control-plane (config) reads into the data-plane hot loop. Instead, the call site in direct_relay.rs computes let drs_enabled = config.general.drs_enabled && success.is_tls and passes it as a bool concrete parameter.
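The two rules above — compute the gate once at the call site, and tolerate configs written before the field existed — reduce to two tiny pure functions. This is a hedged sketch: `drs_enabled_from` is a stdlib analogue of what `#[serde(default = "default_true")]` does during deserialization, not serde itself:

```rust
// Call-site gate: the control-plane read happens once in direct_relay
// dispatch, and only a plain bool crosses into the data-plane hot loop.
fn drs_gate(config_drs_enabled: bool, is_tls: bool) -> bool {
    config_drs_enabled && is_tls
}

// Missing-field fallback analogue of #[serde(default = "default_true")]:
// an absent key resolves to the default instead of failing the upgrade.
fn drs_enabled_from(raw: Option<bool>) -> bool {
    fn default_true() -> bool {
        true
    }
    raw.unwrap_or_else(default_true)
}
```

Non-TLS sessions and operator opt-out both produce `false`, so the passthrough path carries no DRS state at all.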
Merge gate
cargo check --tests
cargo test -- drs_
cargo test -- relay_
cargo test -- direct_relay_
cargo test -- masking_relay_guardrails_
cargo test -- --test-threads=1
cargo test -- --test-threads=32
All tests above must pass. Any expensive stress case added in PR-C must be ignored by default and executed in dedicated perf/security pipelines.
PR-D — Items 3 + 4a: Adaptive Startup Buffer Sizing
Priority: Medium. Depends on PR-C.
PREREQUISITE: Remove #![allow(dead_code)] from src/proxy/adaptive_buffers.rs at the start of this PR. The attribute was intentional when the module had zero call sites, but PR-D adds real call sites. Keeping the attribute suppresses legitimate dead-code warnings for any functions that remain unused after wiring.
Problem (concrete)
Most adaptive buffer hardening primitives are already present in src/proxy/adaptive_buffers.rs (key length guards, stale removal via remove_if, TTL eviction, saturating duration math, caps). The remaining production gap is wiring: seed_tier_for_user, record_user_tier, and direct_copy_buffers_for_tier are still not used by direct relay runtime paths.
relay_bidirectional still accepts _buffer_pool only for compatibility. The effective startup sizing is still static (config.general.direct_relay_copy_buf_*) until direct relay applies seeded tier sizing at call time.
USER_PROFILES (adaptive_buffers.rs line 233) — OnceLock<DashMap<String, UserAdaptiveProfile>> — is the only global intentionally retained by PR-B's core migration (the ancillary diagnostics statics deferred to a follow-up PR aside). It is acceptable here because it functions as a process-wide LRU cache (cross-session user history), not as test-contaminating per-connection state.
Step 1: Write red tests for remaining gaps (must fail on current code)
Do not duplicate existing coverage: The repository already contains extensive adaptive buffer tests (adaptive_buffers_security_tests.rs, adaptive_buffers_record_race_security_tests.rs) that validate cache bounds, key guards, TOCTOU stale removal, and concurrency behavior. PR-D red tests should focus only on missing runtime integration and throughput mapping behavior.
New file: src/proxy/tests/adaptive_startup_integration_tests.rs
Declared in src/proxy/direct_relay.rs or src/proxy/adaptive_buffers.rs (single declaration site only).
// RED: direct relay currently ignores seeded tier and always uses static config.
// Assert selected copy buffer sizes follow direct_copy_buffers_for_tier(seed_tier_for_user(user), ...).
adaptive_startup_direct_relay_uses_seeded_tier_buffers
// RED: no production post-session persistence currently upgrades next session.
// After first relay with high throughput, next seed should reflect recorded upgrade.
adaptive_startup_post_session_recording_upgrades_next_session
// RED: short sessions (<1s) must never promote tier even under bursty bytes.
adaptive_startup_short_sessions_do_not_promote
// RED: upgrade path must be monotonic per user within TTL (no downgrade on lower follow-up).
adaptive_startup_recording_remains_monotonic_within_ttl
Existing adaptive security tests already cover empty keys, oversized keys, fuzz keys, and cache cardinality attacks. Do not reintroduce duplicates in PR-D.
Step 2: Keep current hardening, remove outdated dead-code suppression
Current branch already has the core hardening this step originally proposed:
- `MAX_USER_PROFILES_ENTRIES` and `MAX_USER_KEY_BYTES`
- stale purge with `remove_if` in `seed_tier_for_user`
- `saturating_duration_since`-safe TTL math
- TTL-based `retain` eviction in `record_user_tier`
Required action in PR-D:
- remove `#![allow(dead_code)]` from `src/proxy/adaptive_buffers.rs` once direct relay wiring lands, so dead paths are visible again.
No behavioral rewrite of existing seed_tier_for_user / record_user_tier is required unless new red tests expose regressions.
Step 3: Add explicit throughput mapping API
src/proxy/adaptive_buffers.rs — new public function:
// Maps the average session throughput to the peak adaptive tier achieved.
// Uses only throughput because demand-pressure metrics are unavailable at
// session end (copy_bidirectional drains everything). Only the peak direction
// (max of c2s or s2c) is considered to avoid double-counting bidir traffic.
// Note: This uses total-session average, not instantaneous peak. Bursty traffic
// (30s burst @ 100 Mbps, 9.5min idle) will compute as the average over all 10 min,
// potentially underestimating required buffers. Consider measuring peak-window
// throughput from watchdog snapshots (10s intervals) in future refinements.
pub fn average_throughput_to_tier(c2s_bytes: u64, s2c_bytes: u64, duration_secs: f64) -> AdaptiveTier {
if duration_secs < 1.0 { return AdaptiveTier::Base; }
let avg_bps = (c2s_bytes.max(s2c_bytes) as f64 * 8.0) / duration_secs;
if avg_bps >= THROUGHPUT_UP_BPS as f64 { AdaptiveTier::Tier1 }
else { AdaptiveTier::Base }
}
Naming constraint:
- Use `average_throughput_to_tier` consistently. Avoid introducing both `throughput_to_tier` and `average_throughput_to_tier` aliases in production code.
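The mapping above can be exercised with a self-contained sketch. The real `THROUGHPUT_UP_BPS` constant lives in `adaptive_buffers.rs`; the 50 Mbps value below is an assumed placeholder for illustration only:

```rust
// Assumed placeholder threshold; the real constant is defined in adaptive_buffers.rs.
const THROUGHPUT_UP_BPS: u64 = 50_000_000;

#[derive(Debug, PartialEq)]
enum AdaptiveTier { Base, Tier1 }

fn average_throughput_to_tier(c2s_bytes: u64, s2c_bytes: u64, duration_secs: f64) -> AdaptiveTier {
    // Sessions under 1 s never promote, matching the short-session guard.
    if duration_secs < 1.0 { return AdaptiveTier::Base; }
    // Peak direction only, to avoid double-counting bidirectional traffic.
    let avg_bps = (c2s_bytes.max(s2c_bytes) as f64 * 8.0) / duration_secs;
    if avg_bps >= THROUGHPUT_UP_BPS as f64 { AdaptiveTier::Tier1 } else { AdaptiveTier::Base }
}

fn main() {
    // Sub-second session never promotes, regardless of byte count.
    assert_eq!(average_throughput_to_tier(u64::MAX, 0, 0.5), AdaptiveTier::Base);
    // 100 MB in 10 s = 80 Mbps peak direction -> promotes under the assumed threshold.
    assert_eq!(average_throughput_to_tier(100_000_000, 1_000, 10.0), AdaptiveTier::Tier1);
    // 1 MB each way in 10 s = 0.8 Mbps -> stays Base.
    assert_eq!(average_throughput_to_tier(1_000_000, 1_000_000, 10.0), AdaptiveTier::Base);
}
```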
Step 4: Wire into direct_relay.rs
src/proxy/direct_relay.rs — inside handle_via_direct, before the call to relay_bidirectional (currently line ~280):
// Seed startup buffer sizes from cross-session user history.
let initial_tier = adaptive_buffers::seed_tier_for_user(user);
let (c2s_buf, s2c_buf) = adaptive_buffers::direct_copy_buffers_for_tier(
initial_tier,
config.general.direct_relay_copy_buf_c2s_bytes,
config.general.direct_relay_copy_buf_s2c_bytes,
);
let relay_epoch = std::time::Instant::now();
Replace the existing config.general.direct_relay_copy_buf_c2s_bytes / s2c_bytes arguments in the relay_bidirectional call with c2s_buf / s2c_buf.
After relay_bidirectional returns (whatever the result), record the tier:
let duration_secs = relay_epoch.elapsed().as_secs_f64();
let final_c2s = /* session c2s total bytes */;
let final_s2c = /* session s2c total bytes */;
let peak_tier = adaptive_buffers::average_throughput_to_tier(final_c2s, final_s2c, duration_secs);
adaptive_buffers::record_user_tier(user, peak_tier);
Implementation seam note:
- `relay_bidirectional` currently encapsulates counters internally. To avoid broad API churn, prefer returning a small relay outcome struct that includes final `c2s_bytes` and `s2c_bytes` totals while preserving existing error semantics.
- Keep this seam local to direct relay integration; do not expose `SharedCounters` internals broadly.
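One possible shape for that outcome seam, as a minimal sketch. The struct and field names are illustrative assumptions, not the final API:

```rust
// Hypothetical seam for PR-D: relay_bidirectional would return byte totals so
// the caller can record tiers without reaching into SharedCounters internals.
// Names here are illustrative, not the final API.
struct RelayOutcome {
    c2s_bytes: u64,
    s2c_bytes: u64,
}

fn main() {
    // Caller-side usage: compute the peak direction for tier recording.
    let outcome = RelayOutcome { c2s_bytes: 42_000_000, s2c_bytes: 1_500_000 };
    let peak_bytes = outcome.c2s_bytes.max(outcome.s2c_bytes);
    assert_eq!(peak_bytes, 42_000_000);
}
```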
The buffer-pool parameter remains in the relay_bidirectional signature (Option B: repurposed pathway). Its role is now documented: "parameter reserved for future pool-backed buffer allocation; startup sizing is performed by the caller via adaptive_buffers::direct_copy_buffers_for_tier." The underscore prefix is dropped (`_buffer_pool` → `buffer_pool`) and it is still passed as `Arc::clone(&buffer_pool)` — no functional change.
Merge gate
cargo check --tests
cargo test -- adaptive_buffers_
cargo test -- adaptive_startup_
cargo test -- direct_relay_
cargo test -- --test-threads=1
cargo test -- --test-threads=32
PR-E — Item 4b: In-Session Adaptive Architecture Decision Gate
Priority: Blocks PR-G. Depends on PR-D.
Execution model correction: PR-E is a decision-gate phase, so it must distinguish between:
- deterministic correctness/integration tests (required for CI and merge), and
- performance experiments (informational, ignored by default, run on dedicated hardware).
Do not use throughput/latency benchmark thresholds as hard CI merge gates in this phase.
Problem (concrete)
SessionAdaptiveController::observe (adaptive_buffers.rs line 121) is never called. Three structural blockers prevent in-session adaptation on the direct relay path:
- `copy_bidirectional_with_sizes` is opaque — no hook to observe buffering pressure mid-loop.
- `StatsIo` wraps only the client side — no server-side write pressure signal.
- The watchdog tick is 10 seconds — too coarse for the 250 ms EMA window `observe()` expects.
The decision gate must produce measured evidence, not architectural guesses.
Current state note: SessionAdaptiveController and RelaySignalSample already exist in adaptive_buffers.rs, but there is no production wiring that feeds relay runtime signals into observe(...).
Step 1: Required deterministic decision tests (CI required)
New file: src/proxy/tests/adaptive_insession_decision_gate_tests.rs
Declared once (single declaration site).
These tests must be deterministic and runnable on shared CI:
// Confirms direct relay path has no fine-grained signal hook while copy_bidirectional_with_sizes
// remains opaque; this preserves the architectural constraint as an explicit test.
adaptive_decision_gate_direct_path_lacks_tick_hook
// Confirms middle relay path exposes configurable flush timing boundary via
// me_d2c_flush_batch_max_delay_us and can produce periodic signal ticks.
adaptive_decision_gate_middle_relay_has_tick_boundary
// Drives SessionAdaptiveController with deterministic synthetic signal stream and
// verifies promotion/demotion transitions remain stable under fixed tick cadence.
adaptive_decision_gate_controller_transitions_deterministic
// Confirms proposed signal extraction API (or shim) carries enough fields to support
// observe() without leaking internal relay-only types.
adaptive_decision_gate_signal_contract_is_sufficient
Step 2: Optional feasibility experiments (ignored by default)
New file: src/proxy/tests/adaptive_insession_option_a_experiment_tests.rs
Declared in src/proxy/relay.rs.
CI STABILITY WARNING: These tests measure performance overhead, not correctness. They WILL be flaky on shared CI runners with variable CPU scheduling and memory pressure. Mark all tests in this file with #[ignore] by default. Run only in isolated performance environments (dedicated runner, pinned cores, no concurrent load). CI gate should skip these; they are for manual decision-making only.
These tests benchmark overhead, not correctness. Keep #[ignore] and never use as merge blockers:
// Measures latency penalty of adding a per-session 1-second ticker task alongside
// copy_bidirectional_with_sizes using tokio::select!. Records p50/p95/p99 latency
// delta over 1000 relay sessions each transferring 10 MB.
// ACCEPTANCE CRITERION: p99 latency increase < 2 ms; p50 < 0.5 ms.
adaptive_option_a_ticker_overhead_under_acceptance_threshold
// Measures overhead of adding AtomicU64 s2c_pending_write_count to StatsIo
// and incrementing it in poll_write when Poll::Pending. Records throughput
// delta over 10_000 relay calls.
// ACCEPTANCE CRITERION: throughput regression < 1%.
adaptive_option_a_statsio_pending_counter_overhead_under_1pct
// Measures overhead of wrapping the server-side write half in a second StatsIo
// (for server-side pressure signal). Records throughput delta.
// ACCEPTANCE CRITERION: throughput regression < 2%.
adaptive_option_a_server_side_statsio_overhead_under_2pct
Step 3: Option B boundary validation experiment (ignored by default)
New file: src/proxy/tests/adaptive_insession_option_b_experiment_tests.rs
Declared in src/proxy/middle_relay.rs.
// Verifies that middle_relay's explicit ME→client flush loop already provides
// a natural tick boundary at max_delay_us intervals (currently 1000 µs default).
// Records observed tick interval distribution over 500 relay sessions.
// ACCEPTANCE CRITERION: median observed tick ≤ 2× configured max_delay_us.
adaptive_option_b_middle_relay_flush_loop_provides_tick_boundary
// Verifies SessionAdaptiveController::observe can be driven by ME flush ticks.
// Pumps 2000 synthetic RelaySignalSample values through observe() at 1 ms intervals.
// ACCEPTANCE CRITERION: Tier1 promotion fires at expected tick count consistent
// with TIER1_HOLD_TICKS = 8.
adaptive_option_b_observe_driven_by_flush_ticks_promotes_correctly
Step 4: Decision artifact
Placement correction: function renaming and direct relay throughput-to-tier wiring are PR-D tasks, not PR-E tasks. PR-E must not duplicate those implementation steps.
After running both experiment suites, record the measured values in docs/ADAPTIVE_INSESSION_DECISION.md with the format:
## Measured Results
| Metric | Option A measured | Threshold | Pass/Fail |
|---|---|---|---|
| Ticker task p99 latency delta (ms) | X | < 2 ms | ? |
| StatsIo pending counter throughput delta | X | < 1% | ? |
| Server-side StatsIo throughput delta | X | < 2% | ? |
| Metric | Option B measured | Threshold | Pass/Fail |
|---|---|---|---|
| Flush tick median vs configured delay | X | ≤ 2× | ? |
| Tier1 promotion tick accuracy | X | exact | ? |
## Decision: [Option A / Option B]
Rationale: ...
If Option A passes all thresholds → schedule PR-G-A (relay loop instrumentation).
If Option B passes all thresholds → schedule PR-G-B (middle relay SessionAdaptiveController wiring).
If neither passes → escalate and re-design.
Decision rule refinement:
- Deterministic CI tests from Step 1 must pass before any option can be selected.
- Performance thresholds from experiments are advisory evidence and must include environment metadata (CPU model, core pinning, load conditions) in the decision doc.
Merge gate
cargo check --tests
cargo test -- adaptive_insession_decision_gate_
cargo test -- middle_relay_
cargo test -- --test-threads=1
cargo test -- --test-threads=32
Optional experiment runs (not merge blockers):
cargo test -- adaptive_option_a_ --ignored
cargo test -- adaptive_option_b_ --ignored
PR-F — Item 1 Level 1: Log-Normal Single-Delay Replacement
Priority: Medium. No dependency on PR-B (DI) or PR-C (DRS) — this PR modifies only the RNG call in masking.rs and handshake.rs, touching zero global statics or shared state. Can be developed and merged independently after PR-A (baseline tests). The original "Depends on PR-B + PR-C being stable" was an artificial ordering constraint with no code justification.
Problem (concrete)
mask_outcome_target_budget (masking.rs line 252, rng calls at lines 261–265) draws from uniform distribution:
let delay_ms = rng.random_range(floor..=ceiling);
maybe_apply_server_hello_delay (handshake.rs line 586):
let delay_ms = rand::rng().random_range(min..=max);
Both produce uniform i.i.d. samples. For a single sample this does not matter for classification — you cannot build a histogram from one value. However, replacing uniform with log-normal:
- More accurately models observed real-world TCP RTT distributions (multiplicative central-limit theorem).
- Provides a documented, principled rationale against future attempts to "optimize" the distribution.
Current branch status:
- `mask_outcome_target_budget(...)` already uses `sample_lognormal_percentile_bounded(...)` for the `ceiling > floor > 0` path.
- `maybe_apply_server_hello_delay(...)` already routes through the same helper.
- Extensive masking log-normal tests already exist in `src/proxy/tests/masking_lognormal_timing_security_tests.rs`.
PR-F is therefore an incremental hardening + coverage completion phase, not a greenfield implementation.
Cargo.toml change
No new dependencies required.
Implementation note correction: current code uses a Box-Muller transform built from rng.random_range(...) to derive a standard normal sample, which is valid and avoids extra dependency surface. Do not force migration to StandardNormal unless there is a demonstrated correctness or performance defect.
CRITICAL: avoid adding rand_distr because of rand_core compatibility risk with the existing rand version.
Step 1: Write red tests only for missing coverage (must fail on current code)
Do not duplicate existing masking log-normal suite. Extend src/proxy/tests/masking_lognormal_timing_security_tests.rs only where gaps remain.
// Missing gap candidate: helper behavior under extremely narrow range around 1 ms
// remains stable without boundary clamp spikes.
masking_lognormal_ultra_narrow_range_stability
// Missing gap candidate: floor=0 path remains intentionally uniform and does not
// regress to log-normal semantics.
masking_lognormal_floor_zero_path_regression_guard
Add handshake-side coverage explicitly (new file if needed): src/proxy/tests/handshake_lognormal_delay_security_tests.rs.
Rationale: there is no dedicated handshake_lognormal_ suite yet, and current coverage is mostly indirect through server-hello-delay behavior tests.
// Deterministic bound check via fixed min==max and bounded timer advancement.
handshake_lognormal_fixed_delay_respected
// Inverted config safety: max<min remains safe and bounded.
handshake_lognormal_inversion_resilience
// Randomized-range safety: repeated rejected handshakes remain within configured
// server-hello delay envelope and do not panic.
handshake_lognormal_delay_within_configured_bounds
Step 2: Keep current masking implementation, harden where needed
Retain and verify the existing mask_outcome_target_budget branching (only path 3 uses the log-normal helper):
Reference shape:
let delay_ms = if ceiling == floor {
ceiling
} else {
rng.random_range(floor..=ceiling)
};
New (parameterizing log-normal so that the median equals sqrt(floor * ceiling) — the geometric mean):
NOTE: The existing mask_outcome_target_budget has FOUR code paths (three explicit branches plus a fall-through), not just the ceiling > floor branch:
1. `floor == 0 && ceiling == 0` → returns 0 (unchanged)
2. `floor == 0 && ceiling != 0` → uses `rng.random_range(0..=ceiling)` (uniform)
3. `ceiling > floor` (with `floor > 0`) → uses `rng.random_range(floor..=ceiling)` (uniform → replace with log-normal)
4. Fall-through (`ceiling <= floor`) → returns `floor` (unchanged)
Only path 3 is replaced. Path 2 (floor=0) must remain uniform because log-normal cannot meaningfully model a distribution anchored at zero — the floor.max(1) guard in sample_lognormal_percentile_bounded changes the distribution center to sqrt(ceiling), which is far from the original uniform median of ceiling/2. Changing this would alter observable timing behavior for deployments using floor_ms=0.
// Path 3 replacement only — inside the `if ceiling > floor` block:
let delay_ms = if ceiling == floor {
ceiling
} else {
sample_lognormal_percentile_bounded(floor, ceiling, &mut rng)
};
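The distribution-center shift that forces path 2 to stay uniform can be checked numerically. A quick stdlib sketch, using ceiling = 100 ms as an example value:

```rust
// Worked example of why the floor = 0 path must stay uniform: with
// ceiling = 100 ms, the uniform median is 50 ms, but the floor.max(1)
// guard would re-center the log-normal at sqrt(1 * 100) = 10 ms.
fn main() {
    let ceiling: f64 = 100.0;
    let uniform_median = ceiling / 2.0;
    // Geometric mean with the floor.max(1) substitution.
    let lognormal_median = (1.0f64 * ceiling).sqrt();
    assert_eq!(uniform_median, 50.0);
    assert_eq!(lognormal_median, 10.0);
}
```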
Current helper in masking.rs already exists and is pub(crate) for handshake reuse.
If red tests expose issues, patch the existing helper rather than replacing it wholesale:
use rand::Rng;
// Current implementation uses Box-Muller from uniform draws.
// Samples a log-normal distribution parameterized so that the median maps to
// the geometric mean of [floor, ceiling], then clamps the result to that range.
//
// Implementation uses Box-Muller-derived N(0,1) from uniform draws.
// Log-normal = exp(mu + sigma * N(0,1)).
//
// For LogNormal(mu, sigma): median = exp(mu).
// mu = (ln(floor) + ln(ceiling)) / 2 → median = sqrt(floor * ceiling).
// sigma = ln(ceiling/floor) / 4.65 → ensures ~99% of samples fall in [floor, ceiling].
// 4.65 ≈ 2 × 2.326 (z-score for 99th percentile of standard normal).
//
// IMPORTANT: When floor == 0, log-normal parameterization is undefined (ln(0) = -∞).
// We use floor_f = max(floor, 1) for parameter computation but clamp the final
// result to the original [floor, ceiling] range. For floor=0 this produces a
// distribution centered around sqrt(ceiling) — which may differ significantly from
// the original uniform [0, ceiling]. If the caller needs uniform behavior for
// floor=0, it should handle that case before calling this function.
pub(crate) fn sample_lognormal_percentile_bounded(floor: u64, ceiling: u64, rng: &mut impl Rng) -> u64 { ... }
Safety requirement for this helper:
- misconfigured `floor > ceiling` must remain fail-closed and bounded.
- `floor == 0` path behavior must remain explicit and documented.
- NaN/Inf fallback must remain deterministic and bounded.
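A minimal stdlib-only sketch of the documented parameterization, satisfying the three safety requirements above. The real helper in masking.rs takes `&mut impl rand::Rng` and may differ in detail; the tiny LCG below stands in for the crate RNG purely so the example is self-contained:

```rust
// Stand-in RNG so the sketch needs no external crates; illustrative only.
struct Lcg(u64);
impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64 // uniform in [0, 1)
    }
}

fn sample_lognormal_percentile_bounded(floor: u64, ceiling: u64, rng: &mut Lcg) -> u64 {
    if ceiling <= floor { return floor; }          // fail-closed on misconfiguration
    let floor_f = floor.max(1) as f64;             // ln(0) guard, documented above
    let ceil_f = ceiling as f64;
    let mu = (floor_f.ln() + ceil_f.ln()) / 2.0;   // median = sqrt(floor * ceiling)
    let sigma = (ceil_f / floor_f).ln() / 4.65;    // ~99% of mass inside the range
    // Box-Muller-derived N(0,1) from two uniform draws.
    let u1 = rng.next_f64().max(f64::MIN_POSITIVE);
    let u2 = rng.next_f64();
    let normal = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos();
    let raw = (mu + sigma * normal).exp();
    // NaN/Inf fallback is deterministic; finite samples are clamped to range.
    if raw.is_finite() { (raw as u64).clamp(floor, ceiling) } else { floor }
}

fn main() {
    let mut rng = Lcg(0x1234_5678);
    for _ in 0..10_000 {
        let d = sample_lognormal_percentile_bounded(50, 200, &mut rng);
        assert!((50..=200).contains(&d)); // always bounded
    }
    // Inverted config stays fail-closed and bounded.
    assert_eq!(sample_lognormal_percentile_bounded(200, 50, &mut rng), 200);
}
```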
Step 3: Implement in handshake.rs
Replace in maybe_apply_server_hello_delay (line 586):
let delay_ms = if max == min {
max
} else {
// Replaced: sample_lognormal_percentile_bounded produces a right-skewed distribution
// with median at geometric mean, matching empirical TLS ServerHello delay profiles.
masking::sample_lognormal_percentile_bounded(min, max, &mut rand::rng())
};
sample_lognormal_percentile_bounded must be made pub(crate) in masking.rs to allow the handshake call.
Status note: this helper is already pub(crate) on the current branch; keep visibility stable.
Status note: this handshake call-site migration is already present on the current branch; PR-F should verify and lock it with dedicated tests.
Merge gate
cargo check --tests
cargo test -- masking_lognormal_timing_security_
cargo test -- server_hello_delay_
cargo test -- masking_ab_envelope_blur_integration_security # regression gate
cargo test -- masking_timing_normalization_security # regression gate
cargo test -- --test-threads=1
cargo test -- --test-threads=32
PR-G — Item 1 Level 2: State-Aware Inter-Packet Timing (Burst/Idle Markov)
Priority: Medium. Depends on PR-E decision gate. Separate PR, design depends on PR-E outcome.
Problem (concrete)
No inter-packet timing (IPT) mechanism exists on the MTProto relay path (confirmed: no IptController anywhere in the codebase). Real HTTPS sessions exhibit two-state autocorrelation: Burst (1–5 ms IPG, 0.95 self-transition) and Idle (2–10 seconds IPG, heavy-tail, 0.99 self-transition). ML classifiers detect the absence of this structure directly from the time-series, regardless of marginal distribution shape.
ARCHITECTURAL BLOCKER (PR-G for direct relay): IPT requires injecting delays between write/flush cycles, which tokio::io::copy_bidirectional_with_sizes does not support. Adding IPT on the direct relay path requires replacing copy_bidirectional_with_sizes with a custom poll loop that calls ipt_controller.next_delay_us() and tokio::time::sleep() between write events. This is substantial work (equivalent to ~300-line custom relay loop). Decision: IPT for direct relay is deferred to a decision gate (PR-E); if approved, PR-G will require a dedicated custom loop. Middle relay (ME→client) has an explicit flush loop (middle_relay.rs line 1200+) where IPT can be added more easily.
CRITICAL DESIGN FIX — DATA-AVAILABILITY AWARENESS: The original IptController is a purely stochastic model with no awareness of whether data is actually waiting to be sent. The Idle state injects 2–30 second delays unconditionally, even when Telegram has queued data for the client. This would cause Telegram client timeouts and connection drops during active sessions.
Required fix: IptController must be signal-driven, not purely probabilistic:
- Burst delays (0.5–10 ms) are applied only when data is actively flowing (relay has data in buffers). This adds realistic inter-packet jitter without stalling delivery.
- Idle state is entered when the relay observes genuine idle (no data received from Telegram for a configurable threshold, e.g. 500ms). During genuine idle, DPI already sees no packets — consistent with browser idle. No artificial delay injection is needed.
- Synthetic keep-alive timing (optional, Level 3 enhancement): during genuine idle periods, inject small padding records at browser-like intervals to maintain the illusion of an active HTTPS session. This requires FakeTLS padding support.
- The `next_delay_us()` API must accept a `has_pending_data: bool` signal from the caller. When `has_pending_data == true`, the controller stays in Burst regardless of the Markov transition. When `has_pending_data == false` for the idle threshold, the controller transitions to Idle but does NOT inject delays — it simply stops the flush loop until new data arrives.
This means:
pub(crate) fn next_delay_us(&mut self, rng: &mut impl Rng, has_pending_data: bool) -> u64 {
    // Markov still transitions (for statistics/logging) even when data is pending.
    self.maybe_transition(rng);
    if has_pending_data {
        // Data waiting: always use Burst timing, regardless of Markov state.
        self.sample_burst_delay_us(rng)
    } else {
        // No data pending: return 0 (caller should wait for data, not sleep).
        // The caller's recv() timeout on the data channel provides natural idle timing.
        0
    }
}
This PR is conditional on PR-E. The exact implementation path (A or B) is determined by the PR-E decision artifact. The test specifications below apply to whichever path is chosen.
Step 1: Write red tests (must fail on current code)
New file: src/proxy/tests/relay_ipt_markov_unit_tests.rs
// IptController starts in Burst state.
ipt_controller_initial_state_is_burst
// Deterministic transition oracle: with an injected decision stream that forces
// "switch" on first call, state toggles Burst -> Idle exactly once.
ipt_controller_forced_transition_toggle_oracle
// In Burst state, next_delay_us(rng, has_pending_data=true) returns value
// consistent with LogNormal(mu=1.0, sigma=0.5) * 1000, clamped to [500, 10_000] µs.
ipt_controller_burst_delay_within_burst_bounds
// Internal Markov state: even though next_delay_us returns 0 when !has_pending_data,
// the Markov chain still transitions. Verify idle_dist sampling works correctly
// when called directly (for future Level 3 keep-alive timing).
// Pareto heavy-tail: minimum = 2_000_000 µs, P(>10s) ≈ 9%.
ipt_controller_idle_dist_sampling_correct
// Deterministic Markov behavior: with an injected decision stream of
// stay/stay/switch, verify exact state sequence without probabilistic thresholds.
ipt_controller_markov_sequence_deterministic_oracle
// Compile-time trait check can be kept if needed, but no CI memory-growth or
// wall-clock budget assertions in merge gates.
ipt_controller_trait_bounds_compile_check
// DATA-AWARENESS: next_delay_us(rng, has_pending_data=true) always returns
// Burst-range delay, even if Markov state is Idle. Verifies that active data
// transfer is never stalled by Idle-phase delays.
ipt_controller_pending_data_forces_burst_delay
// DATA-AWARENESS: next_delay_us(rng, has_pending_data=false) returns 0,
// signaling the caller to wait for data arrival (no artificial sleep).
ipt_controller_no_pending_data_returns_zero
// Adversarial: IptController with f64 overflow — burst_dist.sample() returning
// very large values must not overflow on saturating_mul(1_000). Verify clamp
// catches extreme samples.
ipt_controller_burst_sample_overflow_safe
// Adversarial: idle_dist.sample() returning f64::INFINITY or f64::NAN
// (edge case of Pareto distribution). Cast to u64 must not panic; clamp
// handles gracefully.
ipt_controller_idle_sample_extreme_f64_safe
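The P(>10s) ≈ 9% figure quoted in the idle-distribution test comment follows directly from the Pareto survival function P(X > x) = (scale / x)^shape. A quick stdlib check with scale = 2 s and shape = 1.5:

```rust
// Numeric check of the Pareto tail claim for the Idle distribution:
// P(X > 10 s) = (2 / 10)^1.5 = 0.2^1.5 ≈ 0.0894, i.e. ≈ 9%.
fn main() {
    let scale_s: f64 = 2.0;   // IDLE_PARETO_SCALE in seconds
    let shape: f64 = 1.5;     // IDLE_PARETO_SHAPE
    let tail = (scale_s / 10.0).powf(shape);
    assert!((tail - 0.0894).abs() < 0.001);
}
```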
New file: src/proxy/tests/relay_ipt_integration_tests.rs
// Relay path with IPT enabled: 200 calls alternating has_pending_data true/false.
// Verify that true calls always return Burst-range delays and false calls
// always return 0.
relay_ipt_data_availability_signal_respected
// Adversarial: active prober sends 100 handshakes with invalid keys.
// IPT must not affect the fallback-to-masking behavior or reveal proxy identity
// through timing structure (timing envelope in fallback path is unchanged).
relay_ipt_invalid_handshake_fallback_timing_unchanged
// Adversarial: censor injects 10_000 back-to-back packets at 0-delay
// (has_pending_data=true for all). Verify relay does not stall excessively
// (total added IPT delay < 10% of transfer time for a 1 MB payload at 10 Mbps).
relay_ipt_overhead_under_high_rate_attack_within_budget
// Config kill-switch: ipt_enabled = false → no delay injected.
relay_ipt_disabled_by_config_no_delay_added
Optional (non-merge-gate) performance experiments:
relay_ipt_burst_delays_exhibit_positive_autocorrelation
relay_ipt_500_concurrent_throughput_within_5pct_baseline
Step 2: Implement IptController
New file: src/proxy/ipt_controller.rs
Declare pub(crate) mod ipt_controller; in src/proxy/mod.rs.
use rand::Rng;
pub(crate) enum IptState { Burst, Idle }
// Log-normal parameters for Burst-state inter-packet delay.
// mu=1.0, sigma=0.5 → median ≈ exp(1.0) ≈ 2.7 ms.
const BURST_MU: f64 = 1.0;
const BURST_SIGMA: f64 = 0.5;
const BURST_DELAY_MIN_US: u64 = 500;
const BURST_DELAY_MAX_US: u64 = 10_000;
// Pareto parameters for Idle-state delay (retained for future Level 3 keep-alive).
// scale=2_000_000 µs (2s minimum), shape=1.5 → heavy tail.
const IDLE_PARETO_SCALE: f64 = 2_000_000.0;
const IDLE_PARETO_SHAPE: f64 = 1.5;
pub(crate) struct IptController {
state: IptState,
// Pre-computed Burst/Idle transition probabilities.
burst_stay_prob: f64, // 0.95
idle_stay_prob: f64, // 0.99
}
impl IptController {
pub(crate) fn new() -> Self {
Self {
state: IptState::Burst,
burst_stay_prob: 0.95,
idle_stay_prob: 0.99,
}
}
fn maybe_transition(&mut self, rng: &mut impl Rng) {
// random_bool(p) returns true with probability p, using u64 threshold
// internally for full precision. Simpler than manual u32 threshold.
let stay = match self.state {
IptState::Burst => rng.random_bool(self.burst_stay_prob),
IptState::Idle => rng.random_bool(self.idle_stay_prob),
};
if !stay {
self.state = match self.state {
IptState::Burst => IptState::Idle,
IptState::Idle => IptState::Burst,
};
}
}
// Burst delay via log-normal: exp(mu + sigma * N(0,1)).
// Use dependency-free Box-Muller (same project pattern as masking helper)
// to avoid any additional RNG-distribution dependency churn.
fn sample_burst_delay_us(&self, rng: &mut impl Rng) -> u64 {
let u1 = rng.random::<f64>().max(f64::MIN_POSITIVE);
let u2 = rng.random::<f64>();
let normal = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos();
let raw = (BURST_MU + BURST_SIGMA * normal).exp();
let us = if raw.is_finite() {
    // raw is in ms; convert to µs before truncating to keep sub-ms precision.
    (raw * 1_000.0) as u64
} else {
    // Fallback = distribution median: exp(1.0) ≈ 2.718 ms ≈ 2_718 µs (won't happen in practice).
    2_718
};
us.clamp(BURST_DELAY_MIN_US, BURST_DELAY_MAX_US)
}
// Idle delay via Pareto CDF inversion: scale / U^(1/shape).
// Retained for future Level 3 synthetic keep-alive timing.
// NOTE: Currently dead code — next_delay_us returns 0 when !has_pending_data.
#[allow(dead_code)]
fn sample_idle_delay_us(&self, rng: &mut impl Rng) -> u64 {
let u: f64 = rng.random_range(f64::EPSILON..1.0);
let raw = IDLE_PARETO_SCALE / u.powf(1.0 / IDLE_PARETO_SHAPE);
if raw.is_finite() {
(raw as u64).clamp(2_000_000, 30_000_000)
} else {
2_000_000
}
}
// Returns inter-packet delay in microseconds.
// `has_pending_data`: true when the relay has queued data awaiting flush.
// When true, always returns a Burst-range delay — active data must never be
// stalled by Idle-phase pauses (which would cause Telegram client timeouts).
// When false, returns 0 — the caller should block on its data channel recv(),
// which provides natural idle timing matching genuine browser think-time.
pub(crate) fn next_delay_us(&mut self, rng: &mut impl Rng, has_pending_data: bool) -> u64 {
self.maybe_transition(rng);
if has_pending_data {
self.sample_burst_delay_us(rng)
} else {
0
}
}
}
CHANGES vs previous draft:
- No new distribution dependency — uses Box-Muller for log-normal and manual CDF inversion for Pareto.
- `random_bool(p)` for Markov transitions — replaces manual u32 threshold computation. Cleaner, equivalent precision.
- the idle sampler `sample_idle_delay_us` is explicitly marked `#[allow(dead_code)]` — `next_delay_us` returns 0 when `!has_pending_data`, so idle sampling is never reached in production. Retained for future Level 3 keep-alive.
- No stored distribution objects — parameters are constants, sampling is inline. Avoids the `expect()` calls that would be denied by `clippy::expect_used`.
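The data-awareness contract can be demonstrated with a dependency-free sketch: pending data always yields a Burst-range delay, no pending data yields 0. The LCG below replaces the crate RNG purely so the example is self-contained; the production type takes `impl Rng`:

```rust
// Stand-in RNG; illustrative only.
struct Lcg(u64);
impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

fn next_delay_us(rng: &mut Lcg, has_pending_data: bool) -> u64 {
    if !has_pending_data { return 0; } // caller blocks on recv() for natural idle
    // Box-Muller log-normal with BURST_MU = 1.0, BURST_SIGMA = 0.5 (ms).
    let u1 = rng.next_f64().max(f64::MIN_POSITIVE);
    let u2 = rng.next_f64();
    let normal = (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos();
    let raw_ms = (1.0 + 0.5 * normal).exp();
    let us = if raw_ms.is_finite() { (raw_ms * 1_000.0) as u64 } else { 2_718 };
    us.clamp(500, 10_000) // BURST_DELAY_MIN_US..=BURST_DELAY_MAX_US
}

fn main() {
    let mut rng = Lcg(7);
    for i in 0..1_000u32 {
        let pending = i % 2 == 0;
        let d = next_delay_us(&mut rng, pending);
        if pending {
            assert!((500..=10_000).contains(&d)); // Burst-range, never stalls
        } else {
            assert_eq!(d, 0); // natural idle, no injected sleep
        }
    }
}
```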
Step 3: Integrate into relay path
Conditional on PR-E outcome:
- Option B path (recommended if PR-E selects B): wire `IptController` into the ME→client flush loop in `middle_relay.rs`. Each flush cycle calls `ipt_controller.next_delay_us(&mut rng, has_pending_data)` where `has_pending_data = !frame_buf.is_empty()`, then `tokio::time::sleep(Duration::from_micros(delay))` before the next flush. When `delay == 0`, the loop blocks on `me_rx.recv()` naturally.
- Option A path: replace `copy_bidirectional_with_sizes` with a custom poll loop that calls `ipt_controller.next_delay_us(rng, has_pending_data)` between write completions, checking the read buffer for pending data.
Config flag in src/config/types.rs:
#[serde(default = "default_false")]
pub ipt_enabled: bool, // default: false (opt-in)
#[serde(default = "default_ipt_level")]
pub ipt_level: u8, // 1 = single-delay only, 2 = Markov; default 1
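The `#[serde(default = "...")]` attributes above reference helper functions by name. A minimal sketch of those helpers — `default_false` and `default_ipt_level` are assumptions about the config module's naming conventions, mirroring the attribute strings:

```rust
// Assumed serde default helpers mirroring the attribute strings above;
// names are conventions implied by the config snippet, not confirmed API.
fn default_false() -> bool { false }
fn default_ipt_level() -> u8 { 1 }

fn main() {
    // IPT is opt-in: absent fields deserialize to disabled, level 1.
    assert!(!default_false());
    assert_eq!(default_ipt_level(), 1);
}
```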
Merge gate
cargo check --tests
cargo test -- relay_ipt_markov_unit_
cargo test -- relay_ipt_integration_
cargo test -- --test-threads=1
cargo test -- --test-threads=32
PR-H — Consolidated Hardening, ASVS L2 Audit, and Documentation
Depends on: All prior PRs.
ASVS L2 Verification Checklist for Changed Areas
| ASVS Control | Area | Verification |
|---|---|---|
| V5.1.1 Input validation | record_user_tier user key length | MAX_USER_KEY_BYTES = 512 guard in place |
| V5.1.3 Output encoding | DRS framing | No user-controlled field affects record size calculation |
| V5.1.1 Input validation | IptController.next_delay_us | has_pending_data signal is a bool from trusted internal code; no external input reaches IptController directly |
| V8.1.1 Memory safety | DrsWriter, IptController | No unsafe blocks; all bounds enforced by Rust type system; saturating_mul prevents overflow in IptController burst sampling |
| V8.3.1 Sensitive data in memory | ProxySharedState | auth key material remains in HandshakeSuccess on stack; not copied into shared state |
| V11.1.3 TLS config | DRS | TLS path enabled only when is_tls=true; non-TLS path unmodified |
| V11.1.4 Cipher strength | n/a | No cryptographic changes in this plan |
| V2.1.5 Brute force | Auth probe | Probe state in ProxySharedState.handshake.auth_probe; per-IP saturation preserved |
| V6.2.2 Algorithm strength | Log-normal RNG | Box-Muller-based bounded sampler with finite checks and deterministic fallback/clamp; no panic path |
| V14.2.1 Configuration hardening | serde defaults | All new config fields have #[serde(default)] for backward-compatible deserialization |
| V1.4.1 Concurrency | ProxySharedState mutex type | Uses std::sync::Mutex; locks never held across await points; lock ordering documented |
Full test run command sequence
# Run all proxy tests
cargo test -p telemt -- proxy::
# Run targeted gate for each PR area
cargo test -- relay_baseline_
cargo test -- handshake_baseline_
cargo test -- middle_relay_baseline_
cargo test -- masking_baseline_
cargo test -- proxy_shared_state_
cargo test -- drs_
cargo test -- adaptive_startup_
cargo test -- adaptive_option_
cargo test -- masking_lognormal_
cargo test -- handshake_lognormal_
cargo test -- ipt_
# Full regression (must show zero failures)
cargo test
Documentation changes
docs/CONFIG_PARAMS.en.md — add entries for each new GeneralConfig field:
| Field | Default | Description |
|---|---|---|
| drs_enabled | true | Enable Dynamic Record Sizing on TLS direct relay path. Disable for debugging. |
| ipt_enabled | false | Enable state-aware inter-packet timing on relay path. Opt-in; requires testing in your network environment. |
| ipt_level | 1 | IPT level: 1 = log-normal single-delay only, 2 = Burst/Idle Markov chain. |
ROADMAP.md — mark completed items from this plan.
Architectural Decisions & Key Findings from Audit
D1: PR Ordering — Swap PR-C and PR-B
Audit finding: PR-C (DRS) is self-contained; PR-B (DI) has a 2000+ line blast radius.
Decision: YES — execute PR-C before PR-B. Rationale:
- PR-C dependencies: only `is_tls` (a field in `HandshakeSuccess` that already exists) plus the new `drs_enabled` config flag. Zero dependency on the DI migration.
- PR-C value: delivers immediate anti-censorship benefit for the direct relay TLS path.
- PR-B risk mitigation: if the DI refactor hits unforeseen complexity, DRS remains deliverable independently.
- Execution parallelization: the PR-C and PR-B test suites can be written in parallel (PR-A → PR-C tests + PR-B tests in parallel) → PR-C production code → PR-B production code (sequential due to shared entry points).
Updated graph:
PR-A (baseline gates)
├─→ PR-C (DRS, independent)
├─→ PR-F (log-normal, independent)
└─→ PR-B (DI migration)
└─→ PR-D (adaptive startup)
└─→ PR-E (decision gate)
D2: PR-B Phasing (Single Atomic vs Shim+Removal)
Audit suggestion: Two-phase with compatibility shim to reduce blast radius.
Decision: NOT phased — a single atomic PR-B. Rationale:
- A shim (a global `ProxySharedState::default()` instance) would live only through one release cycle, complicating both phases.
- With high test coverage from PR-A, full replacement is safer than partial compatibility.
- Parallel test execution gates (`cargo test -- --test-threads=32`) will catch test interference before merge.
- Mitigation: sequence the changes — `ProxySharedState` creation first → all accessors updated → test helpers removed. Review in logical chunks per file.
D3: PR-C Scope — Direct Relay Only, Middle Relay Gap
Audit finding: Middle relay (the default mode) is not covered; this is a CVE-level coverage gap.
Decision: KNOWN LIMITATION with ELEVATED follow-up priority. Document explicitly:
- PR-C covers the direct relay path only (`direct_relay.rs` → `relay_bidirectional`).
- The middle relay path (`middle_relay.rs` → explicit ME→client loop) requires a separate follow-up, PR-C.1.
- Middle relay is the DEFAULT for deployments with configured ME URLs, which is the typical production setup. Direct relay is used when `use_middle_proxy=false` or no ME pool is available.
- PR-C.1 must be elevated to the same priority as PR-C (High, anti-censorship). It should begin development immediately after PR-C merges, not be treated as a casual follow-up. Middle relay has natural flush-tick points that make DRS integration architecturally simpler than direct relay.
- Action: add `docs/DRS_DEPLOYMENT_NOTES.md` documenting which relay modes have DRS coverage and which are pending PR-C.1.
D4: DRS MaybeDrs Enum Overhead
Audit finding: the `MaybeDrs<W>` enum adds a branch dispatch per poll.
Decision: ACCEPTABLE. The dispatch overhead (~3–5 cycles with branch prediction) is negligible vs TLS crypto, I/O latency, and network RTT. Do NOT attempt zero-overhead abstractions (e.g., generic specialization); the complexity is not worth the unmeasurable gain. Document the assumption in code comments.
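To see why the dispatch is judged negligible, here is a minimal `std::io::Write` analogue — the real `MaybeDrs` wraps tokio's `AsyncWrite` around a `DrsWriter`; the names and the `target` cap below are simplifications for illustration only:

```rust
use std::io::{self, Write};

// Simplified stand-in for the async MaybeDrs<W>: one match per call,
// which the branch predictor resolves almost perfectly on a hot path.
enum MaybeDrs<W: Write> {
    Plain(W),
    // `target` stands in for DrsWriter's current per-record payload cap.
    Drs { inner: W, target: usize },
}

impl<W: Write> Write for MaybeDrs<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self {
            MaybeDrs::Plain(w) => w.write(buf),
            MaybeDrs::Drs { inner, target } => {
                // Cap each write at the record target; the caller retries
                // with the remainder, exactly like copy_bidirectional does.
                let take = buf.len().min(*target);
                inner.write(&buf[..take])
            }
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        match self {
            MaybeDrs::Plain(w) | MaybeDrs::Drs { inner: w, .. } => w.flush(),
        }
    }
}
```

The single `match` is the entire per-call cost, which is why D4 rejects more elaborate zero-overhead abstractions.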
D5: Log-Normal Distribution Parameterization — CRITICAL FIX
Audit finding: a fixed sigma=0.5 creates an 18% clamp spike at the ceiling, detectable by DPI.
Decision: FIXED (already applied above). New parameterization:
- mu = (ln(floor) + ln(ceiling)) / 2 → median = sqrt(floor * ceiling) (the geometric mean, NOT the arithmetic mean)
- sigma = ln(ceiling/floor) / 4.65 → floor and ceiling sit at roughly the 1st/99th percentiles, so ~98% of samples fall within [floor, ceiling]
- Result: NO spike; the distribution is smoothly bounded.
- Function renamed to `sample_lognormal_percentile_bounded` to reflect the guarantee.
- Mathematical note: the median of this distribution is the geometric mean sqrt(floor * ceiling), which differs from the arithmetic mean (floor+ceiling)/2 for asymmetric ranges. Tests must assert against the geometric mean, not the arithmetic mean.
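The parameterization above can be sketched end-to-end. Everything here is illustrative — the real `sample_lognormal_percentile_bounded` lives in `masking.rs` and takes the project's RNG; the tiny xorshift generator exists only to keep the sketch self-contained:

```rust
// Deterministic xorshift64 (Marsaglia's 13/7/17 triple), for this sketch only;
// production code uses the project's RNG, never this.
struct XorShift(u64);

impl XorShift {
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Map the top 53 bits to (0, 1), never exactly 0 or 1.
        (((self.0 >> 11) as f64) + 0.5) / (1u64 << 53) as f64
    }
}

/// Bounded log-normal sample: median at the geometric mean sqrt(floor*ceiling);
/// sigma places floor/ceiling near the 1st/99th percentiles, so clamping
/// affects only the thin tails and no single clamp spike forms at either bound.
fn sample_lognormal_percentile_bounded(floor: u64, ceiling: u64, rng: &mut XorShift) -> u64 {
    debug_assert!(floor > 0 && ceiling > floor);
    let (lf, lc) = ((floor as f64).ln(), (ceiling as f64).ln());
    let mu = (lf + lc) / 2.0;     // median = geometric mean
    let sigma = (lc - lf) / 4.65; // ~98% of mass inside [floor, ceiling]
    // Box-Muller: two uniforms -> one standard normal.
    let (u1, u2) = (rng.next_f64(), rng.next_f64());
    let z = (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos();
    let sample = (mu + sigma * z).exp();
    if !sample.is_finite() {
        return mu.exp() as u64; // deterministic fallback at the median
    }
    (sample as u64).clamp(floor, ceiling)
}
```

No `rand_distr` object, no `unwrap()` path — consistent with F25/F28.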
D6: IptController pre-construction to avoid unwrap()
Audit finding: `LogNormal::new().unwrap()` in `next_delay_us` won't compile under `deny(clippy::unwrap_used)`.
Decision: FIXED (redesigned above). `IptController` now uses inline Box-Muller sampling for the normal component and manual Pareto CDF inversion. No distribution objects are stored; no `unwrap()`/`expect()` calls are needed. The `random_bool(p)` API replaces manual u32 threshold computation for Markov transitions.
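Manual CDF inversion for the Pareto component needs no distribution object. This is a hedged sketch — `alpha = 1.5` and both function names are invented for illustration; only the 2 s scale and the clamp bounds come from the plan (see F10):

```rust
/// Inverse CDF of Pareto(scale, alpha): F(x) = 1 - (scale/x)^alpha for x >= scale,
/// hence x = scale / (1 - u)^(1/alpha) for uniform u in (0, 1). No unwrap() path.
fn pareto_inverse_cdf(scale: f64, alpha: f64, u: f64) -> f64 {
    debug_assert!(u > 0.0 && u < 1.0);
    scale / (1.0 - u).powf(1.0 / alpha)
}

/// Idle delay in microseconds. The lower clamp equals the scale (2s): per F10,
/// any smaller lower bound is dead code because the Pareto minimum IS the scale.
fn idle_delay_us(u: f64) -> u64 {
    let us = pareto_inverse_cdf(2_000_000.0, 1.5, u); // alpha is illustrative
    if !us.is_finite() {
        return 30_000_000; // NAN/INFINITY guard: fall back to the ceiling
    }
    (us as u64).clamp(2_000_000, 30_000_000)
}
```

The `is_finite()` check plus clamp is the same pattern the missing test `ipt_controller_idle_sample_extreme_f64_safe` exercises.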
D7: Adaptive Buffers — Stale Entry Leak
Audit finding: `seed_tier_for_user` returns Base for expired profiles but doesn't remove them; the cache fills with stale entries.
Decision: FIXED (already applied above). Two changes:
- `seed_tier_for_user` now uses `DashMap::remove_if` with a TTL predicate (atomic, avoiding the TOCTOU race where a concurrent `record_user_tier` inserts a fresh profile between `drop(entry)` and `remove(user)`).
- `record_user_tier` uses TTL-based `DashMap::retain()` for overflow eviction (a single O(n) pass that removes stale entries when the cache exceeds `MAX_USER_PROFILES_ENTRIES`). This replaces the originally proposed "oldest N by LRU" strategy, which would have required O(n log n) sorting plus double shard-locking.
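Both eviction paths share one TTL predicate. The real code uses `DashMap::remove_if`/`retain`; this std-only stand-in (a plain `HashMap`, hypothetical names and TTL) shows the predicate shape, including the `saturating_duration_since` guard from F27:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const PROFILE_TTL: Duration = Duration::from_secs(3600); // illustrative TTL

struct Profile { tier: u8, seen_at: Instant }

/// TTL predicate used by both eviction paths. saturating_duration_since
/// returns Duration::ZERO when seen_at is (concurrently) ahead of `now`,
/// so a "future" entry is treated as fresh instead of panicking (F27).
fn is_stale(p: &Profile, now: Instant) -> bool {
    now.saturating_duration_since(p.seen_at) > PROFILE_TTL
}

/// Stand-in for seed_tier_for_user: drop the entry only if it is stale,
/// then report the tier (0 = Base on miss or expiry). DashMap::remove_if
/// re-checks the predicate under the shard lock, closing the TOCTOU window.
fn seed_tier(map: &mut HashMap<&str, Profile>, user: &str, now: Instant) -> u8 {
    if let Some(p) = map.get(user) {
        if is_stale(p, now) {
            map.remove(user);
            return 0;
        }
        return p.tier;
    }
    0
}

/// Stand-in for record_user_tier's overflow eviction: one O(n) retain pass.
fn evict_stale(map: &mut HashMap<&str, Profile>, now: Instant) {
    map.retain(|_, p| !is_stale(p, now));
}
```

The single-threaded `HashMap` cannot reproduce the shard-lock atomicity itself; it only demonstrates the predicate and the saturation guard.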
D8: throughput_to_tier Metric — Average Not Peak
Audit finding: Function computes average throughput over entire session; bursty traffic is underestimated.
Decision: RENAMED + DOCUMENTED. The new name `average_throughput_to_tier` makes the limitation explicit. A comment documents: "Uses total-session average, not instantaneous peak. Consider peak-window measurement from watchdog snapshots as a future refinement." Users deploying in bursty-traffic environments should consider manual tier pinning via config until this limitation is addressed.
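To make the underestimate concrete, here is an illustrative sketch — the tier cutoffs below are invented for the example and are not the real tier table:

```rust
use std::time::Duration;

/// Session-average throughput in bytes/sec mapped to a buffer tier.
/// Caveat (per D8): a 30s burst at 100 Mbps followed by 9.5 min of idle
/// averages down to a low tier, even though the burst phase of the next
/// session would benefit from the largest buffers.
fn average_throughput_to_tier(total_bytes: u64, session: Duration) -> u8 {
    let secs = session.as_secs_f64().max(0.001); // guard tiny/zero sessions
    let bps = total_bytes as f64 / secs;
    match bps as u64 {
        0..=125_000 => 0,         // <= 1 Mbps  -> Base (cutoffs illustrative)
        125_001..=1_250_000 => 1, // <= 10 Mbps -> Mid
        _ => 2,                   // faster     -> Max
    }
}
```

The test below reproduces D8's streaming scenario: the same 375 MB burst lands one tier lower when averaged over the whole 10-minute session.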
Answers to Audit Open Questions
Q1: PR-B phasing — single atomic PR-B or split Phase 1 (shim) + Phase 2 (production threading)?
A1: Proceed with single atomic PR-B. The shim approach delays clean state and complicates review. High test coverage from PR-A mitigates risk. Use sequential sub-phases within the PR (ProxySharedState creation → accessors → test helpers) and require parallel test execution gates before merge.
Q2: Middle relay DRS — should PR-C also address ME→client path, or is that a follow-up?
A2: Follow-up (PR-C.1) at the same priority level as PR-C (High). Direct relay DRS is the initial deliverable; it's self-contained. However, middle relay is the default production mode for deployments with configured ME URLs, making PR-C.1 critical. Middle relay has a different architecture (explicit flush loop, not copy_bidirectional) and warrants separate implementation, but must begin immediately after PR-C merges. Annotate PR-C: "Coverage: Direct relay only. Middle relay DRS planned for next release."
Q3: PR-C → PR-B dependency reversal — are you OK with reversing the order to deliver DRS first?
A3: YES, change the dependency order to PR-C → PR-B. DRS is lower-risk, higher-value, and independent of DI. This improves parallelization and reduces the critical path. Update the plan's PR Dependency Graph accordingly.
Q4: `copy_bidirectional` replacement for IPT — is the team prepared to write a custom poll loop for PR-G Option A (direct relay)?
A4: Document as a risk item for the PR-E decision gate. If PR-E chooses Option A (direct relay IPT), a custom poll loop is mandatory — `copy_bidirectional` is not compatible. Estimate: a ~300-line custom relay loop plus a full test matrix. This is non-trivial. PR-E experiments should include a prototype of the custom loop to validate feasibility before committing. If the team is not prepared for ~2–3 weeks of dedicated work on the relay loop, choose Option B (middle relay only for in-session IPT).
IMPORTANT ADDENDUM: The IptController has been redesigned to be data-availability-aware (see F14). The original purely-stochastic Idle model would have broken active Telegram connections by injecting 2–30 second delays unconditionally. The redesigned controller only applies Burst delays when data is pending; idle timing is handled naturally by the caller's recv() blocking on the data channel. This simplifies the Option A custom loop (no need for tokio::time::sleep with variable durations — just a short fixed sleep in the poll loop when data is available).
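A self-contained sketch of that data-availability-aware contract (the 0.95 stay probability comes from the plan; the mu/sigma constants, clamp bounds, and embedded xorshift RNG are illustrative):

```rust
enum IptState { Burst, Idle }

/// Illustrative controller: the real IptController takes the project's RNG;
/// the embedded xorshift here only keeps the sketch self-contained.
struct IptController { state: IptState, rng: u64 }

impl IptController {
    fn next_f64(&mut self) -> f64 {
        self.rng ^= self.rng << 13;
        self.rng ^= self.rng >> 7;
        self.rng ^= self.rng << 17;
        (((self.rng >> 11) as f64) + 0.5) / (1u64 << 53) as f64
    }

    /// random_bool(p) avoids the u32 threshold truncation noted in F13.
    fn random_bool(&mut self, p: f64) -> bool { self.next_f64() < p }

    /// Data-availability-aware delay (F14): with pending data this ALWAYS
    /// returns a Burst-range delay in microseconds, regardless of Markov
    /// state; when idle it returns 0 and the caller blocks on recv().
    fn next_delay_us(&mut self, has_pending_data: bool) -> u64 {
        if !has_pending_data {
            return 0; // no artificial stalling while the relay is idle
        }
        // Markov step: stay in Burst with p = 0.95 (F13).
        self.state = if self.random_bool(0.95) { IptState::Burst } else { IptState::Idle };
        // Burst delay via inline Box-Muller log-normal (mu/sigma illustrative).
        let (u1, u2) = (self.next_f64(), self.next_f64());
        let z = (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos();
        let ms = (0.7 + 0.6 * z).exp().max(0.0); // defensive guard, F21 style
        // saturating_mul protects the extreme log-normal tail (F21);
        // the clamp bounds here are illustrative, not the plan's constants.
        (ms as u64).saturating_mul(1_000).clamp(100, 50_000)
    }
}
```

This mirrors the contract the missing tests `ipt_controller_pending_data_forces_burst_delay` and `ipt_controller_no_pending_data_returns_zero` lock down.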
Q5: Log-normal sigma — dynamic computation or fixed 0.5?
A5: Use dynamic computation (already fixed above). Parameterize so floor and ceiling sit at roughly the 1st/99th percentiles (~98% of samples inside [floor, ceiling]), with the median at the geometric mean sqrt(floor*ceiling). Function: `sample_lognormal_percentile_bounded(floor, ceiling, rng)`.
Out-of-Scope Boundaries
- No AES-NI changes: the `aes` crate performs runtime CPUID detection automatically.
- No sharding of the `USER_PROFILES` DashMap: no measured bottleneck exists.
- No monolithic PRs: each item has its own branch and review cycle.
- No relaxation of red test assertions without a proven code fix — tests are the ground truth.
Critical Review — Issues Found and Fixed
This section documents all issues found during critical review of the original plan, whether they were corrected inline (above) or require explicit acknowledgement.
Fixed Inline (code/plan corrections applied above)
| # | Issue | Severity | Fix |
|---|---|---|---|
| F1 | PR-B "Blocks PR-C" contradicts the D1 decision to swap ordering | Medium | PR-B header updated to "Blocks PR-D" only |
| F2 | Static line numbers wrong (`handshake.rs`: 71→52, 72→53, 74→55, 30→33, 32→39; `middle_relay.rs`: 63→62) | Low | Corrected to match actual source |
| F3 | `_for_testing` helper line numbers wrong across both files | Low | Corrected to match actual source |
| F4 | `handle_tls_handshake` line reference 638→690; `handle_mtproto_handshake` 840→854; `client.rs` call sites wrong | Low | Corrected |
| F5 | `DrsWriter.records_completed` overflow on 32-bit: wraps after ~4B records, restarting the DRS ramp (a detectable signature) | High | Capped via `.saturating_add(1).min(DRS_PHASE_FINAL + 1)` |
| F6 | DRS TLS overhead comment assumed real TLS 1.3 (22 bytes), but `FakeTlsWriter` only adds a 5-byte header (no AEAD, no content-type byte). Wire record = 1369 + 5 = 1374, NOT 1391 | High | Comment corrected to reflect FakeTLS overhead; constant 1369 retained as a conservative value with a 74-byte MSS margin |
| F7 | Log-normal median math error: mu = (ln(f) + ln(c))/2 → median = sqrt(f*c) (geometric mean), NOT (f+c)/2 (arithmetic mean) | Critical | Test assertions and comments rewritten to assert the geometric mean; function renamed to `sample_lognormal_percentile_bounded` |
| F8 | `seed_tier_for_user` TOCTOU race: `drop(entry)` then `remove(user)` can delete a fresh profile inserted between the two calls | High | Replaced with `DashMap::remove_if` with a TTL predicate (atomic) |
| F9 | `record_user_tier` eviction strategy: "evict oldest N" requires O(n log n) + double shard-locking; `retain()` cannot select by count | Medium | Replaced with TTL-based `retain()` — a single O(n) pass that removes stale entries |
| F10 | `IptController` Pareto idle clamp [500_000, 30_000_000]: the lower bound 0.5s is dead code (Pareto minimum = scale = 2s) | Low | Lower clamp corrected to 2_000_000 with an explanatory comment |
| F11 | D3 claim "Most deployments should use direct relay where possible" is misleading — middle relay is the default when ME URLs are configured | Medium | Rewritten to accurately describe both deployment modes |
| F12 | DRS scope: missing `LOGGED_UNKNOWN_DCS` and `BEOBACHTEN_*_WARNED` from the PR-B static inventory (`direct_relay.rs` line 24, `client.rs` lines 81, 88) | Medium | Added to the PR-B table as a lower-priority follow-up |
| F13 | `IptController` threshold approximation: P(stay) ≈ 0.95000000047 due to u32 truncation, not exactly 0.95 | Low | Comment added documenting the approximation |
| F14 | `IptController` Idle state injects 2–30s delays unconditionally, breaking active Telegram connections (Telegram client timeouts) | Critical | `IptController` redesigned to be data-availability-aware: `next_delay_us(rng, has_pending_data)`. When data is pending, it always returns a Burst-range delay. When idle, it returns 0 (the caller blocks on the data channel naturally). |
| F15 | Test file `proxy_shared_state_isolation_tests.rs` declared in TWO modules (`handshake.rs` AND `middle_relay.rs`) via `#[path]` — causes duplicate-symbol compilation errors | Critical | Changed to a single declaration in `src/proxy/mod.rs` only |
| F16 | PR-F (log-normal) had an artificial dependency on PR-B (DI) — zero code dependency exists; it modifies only two `rng.random_range()` call sites | High | Made PR-F independent; it can land after PR-A only |
| F17 | New config fields `drs_enabled`, `ipt_enabled`, `ipt_level` lacked `#[serde(default)]` annotations — existing config.toml files would fail to deserialize on upgrade | High | Added `#[serde(default = "...")]` annotations with helper functions |
| F18 | `ProxySharedState` mutex type unspecified (std::sync vs tokio::sync) — an incorrect choice causes async runtime issues | High | Explicitly specified `std::sync::Mutex` with rationale (short critical sections, no await points inside locks) |
| F19 | DRS architecture note showed `client_writer` as the "actual TLS/TCP socket" — it is actually `CryptoWriter<FakeTlsWriter<W>>` with internal buffering | High | Corrected the call-chain diagram to show the CryptoWriter + FakeTlsWriter layers with buffering-interaction documentation |
| F20 | `DRS_FULL_RECORD_PAYLOAD = 16_384` was documented as "becomes a no-op" but `FakeTlsWriter` uses `MAX_TLS_CIPHERTEXT_SIZE = 16_640` — DRS still shapes in steady state | Medium | Comment corrected; DRS at 16_384 intentionally mimics the RFC 8446 plaintext limit |
| F21 | `IptController` burst sample: `(sample as u64) * 1_000` can overflow for extreme LogNormal tail values | Medium | Changed to `(sample as u64).saturating_mul(1_000)` with a `.max(0.0)` guard for negative edge cases |
| F22 | PR-C.1 (middle relay DRS) was treated as a casual follow-up but middle relay is the DEFAULT production mode | High | Elevated PR-C.1 to the same priority as PR-C; must begin immediately after PR-C merges |
| F23 | `#![allow(dead_code)]` on `adaptive_buffers.rs` not planned for removal in PR-D | Medium | Added a prerequisite to PR-D: remove the attribute when call sites are added |
| F24 | PR-E experiment tests (`adaptive_option_a_*`, `adaptive_option_b_*`) are performance benchmarks that will be flaky on shared CI runners | Medium | Added an `#[ignore]` requirement; run only in isolated performance environments |
| F25 | `rand_distr = "0.5"` is incompatible with `rand = "0.10"` — rand_distr 0.5 depends on rand_core 0.9; the trait mismatch prevents compilation | Critical | Removed the `rand_distr` dependency; replaced with a manual log-normal via Box-Muller and manual Pareto CDF inversion. Zero new dependencies needed. |
| F26 | `sample_lognormal_percentile_bounded` with floor=0: `floor.max(1)` avoids ln(0) but silently shifts the distribution center from ceiling/2 (uniform) to sqrt(ceiling) (log-normal) — a massive semantic change | High | Documented explicitly: only path 3 (floor > 0 && ceiling > floor) uses log-normal. Path 2 (floor == 0) retains the uniform distribution. |
| F27 | `seed_tier_for_user` / `record_user_tier` use `duration_since`, which panics if `seen_at > now` (concurrent Instant reordering in the remove_if predicate) | High | Replaced all TTL predicates with `saturating_duration_since` — returns `Duration::ZERO` when `seen_at > now`, treating the entry as fresh (safe). |
| F28 | `IptController` used `rand_distr::{LogNormal, Pareto}` (incompatible with rand 0.10) and pre-stored distribution objects requiring `expect()` (denied by clippy) | Critical | Redesigned: inline Box-Muller sampling for the log-normal, manual CDF inversion for the Pareto. `random_bool(p)` for Markov transitions. No stored objects, no `expect()`. |
| F29 | `ipt_level: u8` config field violates Architecture.md §4 (enums over magic numbers) | Low | Should be `enum IptLevel { SingleDelay, MarkovChain }` with `#[serde(rename_all = "snake_case")]`. |
| F30 | PR-A `test_harness_common.rs` declared via `#[path]` in three modules → triple duplicate-symbol compilation failure | Critical | Declared once in `proxy/mod.rs`; imported via `use crate::proxy::test_harness_common::*` in consuming tests |
| F31 | PR-A `RecordingWriter` stored `Vec<Vec<u8>>` with ambiguous write-vs-flush boundaries; DRS tests (PR-C) need flush-boundary tracking | High | Dual-tracking design: `writes` (per poll_write) + `flushed` (per poll_flush boundary with an accumulator) |
| F32 | PR-A `SliceReader` required the `bytes` crate for no gain; `tokio::io::duplex()` is already used everywhere | High | Dropped from the test harness |
| F33 | PR-A `PendingWriter` only controlled poll_write pending; DRS flush-pending tests (`drs_pending_on_flush_propagates_pending_without_spurious_wake`) need separate flush control | Medium | Renamed to `PendingCountWriter` with separate `write_pending_remaining` and `flush_pending_remaining` counts |
| F34 | PR-A `relay_baseline_watchdog_delta_does_not_panic_on_u64_wrap` duplicates 7 existing tests in `relay_watchdog_delta_security_tests.rs` | Critical | Dropped — the existing test file already provides exhaustive coverage including wrap, overflow, and fuzz |
| F35 | PR-A `handshake_baseline_saturation_fires_at_configured_threshold` implies runtime config but `AUTH_PROBE_BACKOFF_START_FAILS` is a compile-time constant | Low | Renamed to `_compile_time_threshold` |
| F36 | PR-A middle_relay baseline tests directly poked global statics that PR-B removes | High | Rewritten to test through public functions (`mark_relay_idle_candidate`, `clear_relay_idle_candidate`) whose signatures survive PR-B |
| F37 | PR-A had zero masking baseline tests despite masking being the primary anti-DPI component and PR-F modifying it | High | Added `masking_baseline_invariant_tests.rs` with timing-budget, fallback-relay, consume-cap, and adversarial tests |
| F38 | PR-A had no error-path baseline tests — only happy paths were locked | High | Added simultaneous-close, broken-pipe, and many-small-writes relay baselines |
| F39 | PR-A `relay_baseline_empty_transfer_completes_without_error` was vague (no sharp assertions) | Medium | Replaced with `relay_baseline_zero_bytes_returns_ok_and_counters_zero` |
| F40 | PR-A `test_stats()` and `test_buffer_pool()` are trivial wrappers for one-liner constructors already inlined everywhere | Medium | Dropped from the test harness to avoid unnecessary indirection |
| F41 | PR-A `seeded_rng` limitation not documented: it cannot substitute for `SecureRandom` in production function calls | Medium | Documented as an explicit limitation in a code comment |
| F42 | PR-A had no documented test-isolation strategy for auth_probe global-state contention | Medium | Each handshake baseline test acquires `auth_probe_test_lock()` and calls `clear_auth_probe_state_for_testing()`. Documented as temporary coupling eliminated in PR-B |
| F43 | PR-A was not split into sub-phases; utility iteration could block baseline tests | High | Split into PR-A.1 (utilities, compile-only gate) and PR-A.2 (baseline tests, all-green gate) |
| F44 | `sample_lognormal_percentile_bounded` and 14 masking lognormal tests already exist in the codebase (masking.rs:258, `masking_lognormal_timing_security_tests.rs`). PR-F describes implementing what's already done. | High | PR-F's remaining scope: verify the handshake.rs integration (already wired at line 596). PR-F may already be complete — audit needed before starting. |
| F45 | PR-A `handshake_test_config()` was missing; `tls_only_config()` alone is insufficient for handshake baseline tests requiring user/secret/masking config | High | Added `handshake_test_config(secret_hex)` to the test harness |
| F46 | Previous external review C1 (DRS write-chain placement "fundamentally wrong") is INCORRECT — see R3/R6 in Acknowledged Risks. Each `DrsWriter.poll_write` passes ≤ target bytes to CryptoWriter in one call. CryptoWriter passes through to FakeTlsWriter in one call. FakeTlsWriter creates exactly one TLS record per poll_write. A flush at the record boundary ensures CryptoWriter's pending buffer is drained before the next record starts. The chain is correct. | Informational | No plan change needed; the external finding was wrong. |
| F47 | `BEOBACHTEN_*_WARNED` statics are process-scoped log-dedup guards. Moving them to `ProxySharedState` changes semantics: warnings fire per-instance instead of per-process. | Medium | Keep as process-global statics (correct for log dedup). Do NOT migrate to `ProxySharedState`. |
| F48 | `ProxySharedState` nested into `HandshakeSharedState` + `MiddleRelaySharedState` — unnecessary indirection. Functions access `shared.handshake.auth_probe` instead of `shared.auth_probe` | Low | Consider flattening to a single struct for simplicity (KISS principle, Architecture.md §1). Both sub-structs are always accessed together through the parent. |
Acknowledged Risks (not fixable in plan, require runtime attention)
| # | Risk | Mitigation |
|---|---|---|
| R1 | DRS per-record flush adds syscall overhead in steady state (16 KB records). `copy_bidirectional_with_sizes` also flushes independently → the double flush is idempotent but wastes cycles. | Benchmark in the PR-C red tests. If overhead exceeds a 2% throughput regression, coarsen the flush to every N records in the steady-state phase. |
| R2 | `copy_bidirectional_with_sizes` internal buffering: when `DrsWriter.poll_write` returns fewer bytes than offered (record boundary), the copy loop retries with the remaining buffer. This is correct but untested against the specific tokio implementation. | Add a specific integration test `drs_copy_bidirectional_partial_write_retry` that verifies total data integrity when DrsWriter limits write sizes. |
| R3 | DrsWriter flush inside the poll_write loop: DRS value depends on `FakeTlsWriter.poll_flush` actually draining its internal WriteBuffer to the socket and creating a TLS record boundary. Verified: `FakeTlsWriter.poll_flush` first calls `poll_flush_record_inner` (drains pending TLS record bytes), then `upstream.poll_flush` (drains the socket). This IS a real record boundary. However, CryptoWriter sits between DRS and FakeTLS and has its own pending buffer. DRS flush → `CryptoWriter.poll_flush` (drains pending ciphertext) → `FakeTlsWriter.poll_flush`. If CryptoWriter has accumulated bytes from multiple DRS writes before a flush (possible if an earlier write returned buffered-but-Ok), those bytes may be flushed as one chunk to FakeTLS, creating one larger record instead of separate DRS-sized records. | Add an integration test `drs_crypto_writer_buffering_chain_integrity` to verify the full chain produces individual records at DRS boundaries. |
| R4 | `average_throughput_to_tier` uses session-average throughput, not a peak window. Bursty traffic patterns (video streaming: a 30s burst at 100 Mbps, then 9.5 min idle) will underestimate the tier, resulting in sub-optimal buffer sizes for the burst phase of the next session. | Document the limitation. Monitor via the watchdog's 10s snapshots. Future PR: compute the peak from watchdog snapshots rather than the session average. |
| R5 | PR-C covers direct relay only; middle relay (often the default) has no DRS. This is a significant coverage gap for deployments using ME pools. | PR-C.1 follow-up for middle relay. Middle relay has natural flush-tick points that make DRS integration architecturally simpler. Prioritize PR-C.1 immediately after PR-C. |
| R6 | `CryptoWriter.poll_write` always returns `Ok(to_accept)` even when FakeTlsWriter returns Pending — it buffers internally. If DRS writes N bytes and CryptoWriter buffers them, then DRS flushes, CryptoWriter drains its buffer as ONE chunk to FakeTLS. FakeTLS receives the full N-byte chunk and creates one N+5-byte TLS record. This is correct behavior (one DRS record = one TLS record). BUT if CryptoWriter's `max_pending_write` (default 16 KB) were smaller than a DRS write (impossible: max DRS write = 16384 ≤ 16 KB), writes would be split. Verify `CryptoWriter.max_pending_write` ≥ `DRS_FULL_RECORD_PAYLOAD`. | Integration test `drs_crypto_writer_buffering_chain_integrity`. |
| R7 | The IptController redesign (data-availability-aware) removes Idle-state delay generation entirely. The Pareto distribution and `idle_dist` field are now dead code. Consider removing them to avoid confusion, or repurposing them for synthetic keep-alive timing in a future Level 3 enhancement. | Document in PR-G that `idle_dist` is retained for future Level 3 (trace-driven synthetic idle traffic). |
Missing Tests (should be added to existing PR test lists)
| Test | PR | Rationale |
|---|---|---|
| `drs_statsio_byte_count_matches_actual_written` | PR-C | Verify StatsIo counters remain accurate when DrsWriter limits write sizes. Without this, a bug where DrsWriter eats or duplicates bytes goes undetected. |
| `drs_copy_bidirectional_partial_write_retry` | PR-C | Verify `copy_bidirectional_with_sizes` correctly retries when DrsWriter returns fewer bytes than offered at record boundaries. |
| `drs_records_completed_counter_does_not_wrap` | PR-C | On 32-bit usize, verify the counter caps at `DRS_PHASE_FINAL + 1` and does not restart the DRS ramp. |
| `drs_flush_is_meaningful_for_faketls` | PR-C | Verify that `FakeTlsWriter.poll_flush` produces a TLS record boundary; otherwise DRS provides no anti-DPI value. |
| `adaptive_startup_remove_if_does_not_delete_fresh_concurrent_insert` | PR-D | Concurrency test: thread A reads a stale profile, thread B inserts a fresh profile, thread A calls remove_if → assert the fresh profile survives. |
| `ipt_controller_burst_stay_threshold_probability_accuracy` | PR-G | Verify the empirical Burst self-transition probability is within ±0.001 of 0.95 over 10M samples. |
| `proxy_shared_state_logged_unknown_dcs_isolation` | PR-B | Verify `LOGGED_UNKNOWN_DCS` does not leak between instances (if migrated). |
| `ipt_controller_pending_data_forces_burst_delay` | PR-G | Verify that `next_delay_us(rng, has_pending_data=true)` always returns a Burst-range delay even when the Markov state is Idle. Critical for connection liveness. |
| `ipt_controller_no_pending_data_returns_zero` | PR-G | Verify that `next_delay_us(rng, has_pending_data=false)` returns 0, ensuring no artificial stalling when the relay is idle. |
| `ipt_controller_burst_sample_overflow_safe` | PR-G | Verify extreme LogNormal tail samples don't overflow `saturating_mul(1_000)` and are properly clamped. |
| `ipt_controller_idle_sample_extreme_f64_safe` | PR-G | Verify Pareto samples of `f64::INFINITY` or `f64::NAN` are safely handled by the `as u64` cast plus clamp. |
| `drs_crypto_writer_buffering_chain_integrity` | PR-C | Verify that DRS → CryptoWriter (with its internal pending buffer) → FakeTlsWriter produces correct TLS record boundaries. CryptoWriter may buffer; flush must drain the entire chain. |
| `drs_config_serde_default_upgrade_compat` | PR-C | Verify that deserializing a config.toml WITHOUT the `drs_enabled` field produces `drs_enabled=true` (serde default). Tests upgrade compatibility. |