The previous --urls mode was a plain grep for "https?://...", which on a
real APK produced thousands of lines, half of them junk strings extracted
from Kotlin stdlib's compression dictionary ("http://An Introduction to..."
fragments) and the other half SDK URLs (Google, Firebase, AppsFlyer,
Datadog, Sentry, ...) that the analyst is not looking for. The signal —
first-party backend hosts — was buried.
Two changes:
1. Strict URL regex: hostname must have at least one dot and end in a 2+
   letter TLD, with no whitespace / angle brackets / non-printables in the
   path. This eliminates the dictionary-fragment noise.
2. Bucket the surviving URLs into "likely first-party" vs "third-party"
   using references/third_party_hosts.txt, a curated denylist of
   ~80 patterns covering Google/Firebase/Apple/Microsoft/Adobe, attribution
   and observability vendors (AppsFlyer, Datadog, Sentry, Bugsnag, ...),
   payments (Stripe, PayU, Adyen, ...), support/chat SDKs, CAs, and
   standards namespaces (w3.org, etc.).
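
As an illustration of the two changes, a minimal Python sketch might look
like the following. The pattern, the two denylist entries, and the name
bucket_urls are assumptions made for this example, not the script's actual
regex or the real contents of references/third_party_hosts.txt.

```python
import re

# Hypothetical strict pattern, not the script's actual regex.
STRICT_URL = re.compile(
    r"https?://"
    # Dotted hostname ending in a 2+ letter TLD (captured as group 1).
    r"((?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,})"
    # Optional port.
    r"(?::\d{1,5})?"
    # Optional path/query/fragment: printable ASCII only, excluding
    # space, '<' and '>'.
    r"(?:[/?#][!-;=?-~]*)?"
)

# Example denylist entries (assumed shape: regexes matched against the host).
THIRD_PARTY = [re.compile(p) for p in (
    r"(^|\.)googleapis\.com$",
    r"(^|\.)sentry\.io$",
)]


def bucket_urls(text):
    """Split strictly matched URLs into (first_party, third_party) lists."""
    first, third = [], []
    for m in STRICT_URL.finditer(text):
        host = m.group(1).lower()
        is_third = any(p.search(host) for p in THIRD_PARTY)
        (third if is_third else first).append(m.group(0))
    return first, third
```

A fragment such as "http://An Introduction to..." never matches here,
because "An" is not followed by a dot and a TLD.
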
The new output starts with a frequency-sorted list of likely first-party
hosts (the artifact every reverse engineer wants on the first page),
followed by the collapsed third-party list and the full URL set for
first-party hosts only.
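
One way to assemble that ordering, sketched in Python. The function name
build_report and the dictionary keys are hypothetical, and "collapsed" is
interpreted here as one deduplicated entry per third-party host, which is
an assumption about the intended output.

```python
from collections import Counter
from urllib.parse import urlsplit


def build_report(first_party_urls, third_party_urls):
    """Return the three report sections in the order they are printed."""
    def host(url):
        return urlsplit(url).hostname

    counts = Counter(host(u) for u in first_party_urls)
    # Most frequent host first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "first_party_hosts": ranked,
        "third_party_hosts": sorted({host(u) for u in third_party_urls}),
        "first_party_urls": sorted(set(first_party_urls)),
    }
```
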
The denylist is a sidecar text file (one regex per line) so users can
extend or override it without editing the script.
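
Loading such a sidecar file could be as simple as the sketch below. The
name load_denylist is hypothetical, and skipping blank lines and "#"
comments as well as matching case-insensitively are assumptions of this
example; the only format stated above is one regex per line.

```python
import re
from pathlib import Path


def load_denylist(path):
    """Compile one regex per non-empty, non-comment line of the file."""
    patterns = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Assumed convention: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line, re.IGNORECASE))
    return patterns
```
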