android-reverse-engineering.../plugins/android-reverse-engineering
Michał Tajchert 2e6fc63453 feat: bucketed --urls output with strict regex and third-party denylist
The previous --urls mode was a plain grep for "https?://..." which on a
real APK produced thousands of lines, half of them junk strings extracted
from Kotlin stdlib's compression dictionary ("http://An Introduction to..."
fragments) and the other half SDK URLs (Google, Firebase, AppsFlyer,
Datadog, Sentry, ...) that the analyst is not looking for. The signal —
first-party backend hosts — was buried.

Two changes:

1. Strict URL regex: hostname must have at least one dot and end in a 2+
   letter TLD, with no whitespace / angle brackets / non-printables in the
   path. This eliminates the dictionary-fragment noise.

2. Bucket the surviving URLs into "likely first-party" vs "third-party"
   using references/third_party_hosts.txt — a curated denylist of
   ~80 patterns covering Google/Firebase/Apple/Microsoft/Adobe, attribution
   and observability vendors (AppsFlyer, Datadog, Sentry, Bugsnag, ...),
   payments (Stripe, PayU, Adyen, ...), support/chat SDKs, CAs, and
   standards namespaces (w3.org, etc.).

The new output starts with a frequency-sorted list of likely first-party
hosts — which is the artifact every reverse-engineer wants on the first
page — followed by the collapsed third-party list and the full URL set
for first-party hosts only.

The denylist is a sidecar text file (one regex per line) so users can
extend or override it without editing the script.
2026-04-29 01:23:56 +02:00
..
.claude-plugin chore: bump plugin version to 1.1.0 2026-04-27 22:58:48 +02:00
commands Feature/windows powershell support (#8) 2026-04-27 10:14:59 +02:00
skills/android-reverse-engineering feat: bucketed --urls output with strict regex and third-party denylist 2026-04-29 01:23:56 +02:00