This commit introduces a comprehensive set of improvements to enhance
the security, reliability, and configurability of the proxy server,
specifically targeting adversarial resilience and high-load concurrency.
Security & Cryptography:
- Zeroize MTProto cryptographic key material (`dec_key`, `enc_key`)
immediately after use to prevent memory leakage on early returns.
- Move TLS handshake replay tracking after full policy/ALPN validation
to prevent cache poisoning by unauthenticated probes.
- Add `proxy_protocol_trusted_cidrs` configuration to restrict PROXY
protocol headers to trusted networks, rejecting spoofed IPs.
Adversarial Resilience & DoS Mitigation:
- Implement "Tiny Frame Debt" tracking in the middle-relay to prevent
CPU exhaustion from malicious 0-byte or 1-byte frame floods.
- Add `mask_relay_max_bytes` to strictly bound unauthenticated fallback
connections, preventing the proxy from being abused as an open relay.
- Add a 5ms prefetch window (`mask_classifier_prefetch_timeout_ms`) to
correctly assemble and classify fragmented HTTP/1.1 and HTTP/2 probes
(e.g., `PRI * HTTP/2.0`) before routing them to masking heuristics.
- Prevent recursive masking loops (FD exhaustion) by verifying the mask
target is not the proxy's own listener via local interface enumeration.
Concurrency & Reliability:
- Eliminate executor waker storms during quota lock contention by replacing
the spin-waker task with inline `Sleep` and exponential backoff.
- Roll back user quota reservations (`rollback_me2c_quota_reservation`)
if a network write fails, preventing Head-of-Line (HoL) blocking from
permanently burning data quotas.
- Recover gracefully from idle-registry `Mutex` poisoning instead of
panicking, ensuring isolated thread failures do not break the proxy.
- Fix `auth_probe_scan_start_offset` modulo logic to ensure bounds safety.
Testing:
- Add extensive adversarial, timing, fuzzing, and invariant test suites
for both the client and handshake modules.
- Introduced a new test module for middle relay idle policy security tests, covering various scenarios including soft mark, hard close, and grace periods.
- Implemented functions to create crypto readers and encrypt data for testing.
- Enhanced the Stats struct to include counters for relay idle soft marks, hard closes, pressure evictions, and protocol desync closes.
- Added corresponding increment and retrieval methods for the new stats fields.
- Adjusted QUOTA_USER_LOCKS_MAX based on test and non-test configurations to improve flexibility.
- Implemented logic to retain existing locks when the maximum quota is reached, ensuring efficient memory usage.
- Added comprehensive tests for quota user lock functionality, including cache reuse, saturation behavior, and race conditions.
- Enhanced StatsIo struct to manage wake scheduling for read and write operations, preventing unnecessary self-wakes.
- Introduced separate replay checker domains for handshake and TLS to ensure isolation and prevent cross-pollution of keys.
- Added security tests for replay checker to validate domain separation and window clamping behavior.
- Implemented a mechanism to log unknown datacenter indices with a distinct limit to avoid excessive logging.
- Introduced tests to ensure that logging is deduplicated per datacenter index and respects the distinct limit.
- Updated the fallback logic for datacenter resolution to prevent panics when only a single datacenter is available.
feat(proxy): add authentication probe throttling
- Added a pre-authentication probe throttling mechanism to limit the rate of invalid TLS and MTProto handshake attempts.
- Introduced a backoff strategy for repeated failures and ensured that successful handshakes reset the failure count.
- Implemented tests to validate the behavior of the authentication probe under various conditions.
fix(proxy): ensure proper flushing of masked writes
- Added a flush operation after writing initial data to the mask writer to ensure data integrity.
refactor(proxy): optimize desynchronization deduplication
- Replaced the Mutex-based deduplication structure with a DashMap for improved concurrency and performance.
- Implemented a bounded cache for deduplication to limit memory usage and prevent stale entries from persisting.
test(proxy): enhance security tests for middle relay and handshake
- Added comprehensive tests for the middle relay and handshake processes, including scenarios for deduplication and authentication probe behavior.
- Ensured that the tests cover edge cases and validate the expected behavior of the system under load.
- Add ProxyError import and fix Result type annotation in tls.rs
- Add Arc import in stats/mod.rs test module
- Add BodyExt import in metrics.rs test module
These imports were missing causing compilation failures in
cargo test --release with 10 errors.
- Remove unused imports across multiple modules
- Add #![allow(dead_code)] for public API items preserved for future use
- Add #![allow(deprecated)] for rand::Rng::gen_range usage
- Add #![allow(unused_assignments)] in main.rs
- Add #![allow(unreachable_code)] in network/stun.rs
- Prefix unused variables with underscore (_ip_tracker, _prefer_ipv6)
- Fix unused_must_use warning in tls_front/cache.rs
This ensures clean compilation without warnings while preserving
public API items that may be used in the future.
- Fix: LruCache::get type ambiguity in stats/mod.rs
- Changed `self.cache.get(&key.into())` to `self.cache.get(key)` (key is already &[u8], resolved via Box<[u8]>: Borrow<[u8]>)
- Changed `self.cache.peek(&key)` / `.pop(&key)` to `.peek(key.as_ref())` / `.pop(key.as_ref())` (explicit &[u8] instead of &Box<[u8]>)
- Startup DC ping with RTT display and improved health-check (all DCs, RTT tracking, EMA latency, 30s interval):
- Implemented `LatencyEma` – exponential moving average (α=0.3) for RTT
- `connect()` – measures RTT of each real connection and updates EMA
- `ping_all_dcs()` – pings all 5 DCs via each upstream, returns `Vec<StartupPingResult>` with RTT or error
- `run_health_checks(prefer_ipv6)` – accepts IPv6 preference parameter, rotates DC between cycles (DC1→DC2→...→DC5→DC1...), interval reduced to 30s from 60s, failed checks now mark upstream as unhealthy after 3 consecutive fails
- `DcPingResult` / `StartupPingResult` – public structures for display
- DC Ping at startup: calls `upstream_manager.ping_all_dcs()` before accept loop, outputs table via `println!` (always visible)
- Health checks with `prefer_ipv6`: `run_health_checks(prefer_ipv6)` receives the parameter
- Exported `StartupPingResult` and `DcPingResult`
- Summary: Startup DC ping with RTT, rotational health-check with EMA latency tracking, 30-second interval, correct unhealthy marking after 3 fails.
Co-Authored-By: brekotis <93345790+brekotis@users.noreply.github.com>