// Lone Unicode surrogate sanitization.
//
// Lone surrogates (\uD800-\uDFFF without a matching pair) can appear in
// JavaScript strings, but they are not well-formed UTF-16 and cannot be
// encoded as UTF-8, so JSON.stringify produces output that the Claude API
// rejects with HTTP 400 "no low surrogate in string". Page captures from
// real-world HTML hit this when content contains broken emoji bytes or is
// split in the middle of an emoji's surrogate pair.
//
// Two sanitizers are needed because both forms appear in browse responses:
//   - Raw UTF-16 surrogates in text/plain bodies (pre-stringify state).
//   - JSON \uXXXX escape sequences after JSON.stringify has already run.
// Both replace lone surrogates with U+FFFD (replacement character).

// A high surrogate that is not followed by a low surrogate.
const LONE_SURROGATE_HIGH = /[\uD800-\uDBFF](?![\uDC00-\uDFFF])/g;
const LONE_SURROGATE_LOW = /(?