Compare commits

12 Commits

Author SHA1 Message Date
Simone Avogadro e8dde9d058 chore: bump plugin version to 1.5.0 2026-06-10 11:49:02 +02:00
Simone Avogadro f68d9ce3be feat: post-filter --urls to drop dictionary noise while keeping IPs and apex hosts
The hardening patch widened STRICT_URL to recover IPv4 literals, apex
2-label domains and internal hosts that the PR's strict-only regex
discarded as collateral while killing Kotlin-stdlib dictionary noise.
Widening alone reopened a narrow noise class: 'word.word' fragments such
as "www.this" / "this.introduction" pass as apex domains.

Keep extraction permissive and add a small awk pass that decides per host:
- IPv4 literal: always keep (dict fragments are words, never dotted-quads)
- >=3 labels: always keep (any TLD; same tolerance as the original regex)
- any host with a :port or /path: always keep (structured = high signal)
- bare 2-label apex: keep only when the TLD is a real one, matched as a
  whole field (so "introduction" != "in" — the prefix-match bug a single
  mega-regex would have)

Trade-off documented inline: a first-party host referenced bare with an
uncommon TLD (e.g. https://foo.store with no path) is dropped; a path or
port keeps it. awk is POSIX (sub/split/~/print) — more portable than the
bash>=4 'declare -A' already used in the summary header.
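
The per-host rules above can be pictured with a minimal sketch. It is not the awk pass shipped in find-api-calls.sh: the function name `filter_urls` and the short TLD sample are hypothetical, and the real list is not shown in this message.

```shell
# Hypothetical sketch of the per-host keep/drop rules; reads one URL per line on stdin.
filter_urls() {
  # tlds: illustrative sample only, not the list used by find-api-calls.sh
  awk -v tlds="com net org io dev app" '
    BEGIN {
      n = split(tlds, t, " ")
      for (i = 1; i <= n; i++) real[t[i]] = 1
    }
    {
      rest = $0
      sub("^https?://", "", rest)        # drop the scheme
      host = rest
      sub("[/:].*$", "", host)           # host ends at the first ":" or "/"
      structured = (rest != host)        # something followed the host: :port or /path
      nlabels = split(host, label, ".")

      if (host ~ "^[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+$") { print; next }  # IPv4 literal
      if (nlabels >= 3) { print; next }                                  # 3 or more labels
      if (structured)   { print; next }                                  # has :port or /path
      # Bare 2-label apex: the TLD must equal a list entry exactly,
      # so "introduction" never matches "in".
      if (nlabels == 2 && (label[2] in real)) print
    }
  '
}
```

Piping `https://foo.store` through this sketch drops it, while `https://foo.store/v1` survives, which is the trade-off described above. The array lookup compares whole strings, so no prefix match is possible.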

Verified: dictionary noise dropped; IPs, apex, internal and subdomain
hosts kept; --all on a zero-match tree still exits 0; host list and
full-URL list stay consistent (no orphan hosts).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 11:06:30 +02:00
Simone Avogadro ed97b8508b docs: document PR #16 features in README (Kotlin name recovery, fingerprint, Ktor/Apollo/Koin)
The PR #16 additions were wired into SKILL.md and references/ but the
human-facing README was never updated. Surface them, with prominent
emphasis on first-class Kotlin support:

- Top blurb: callout for R8 Kotlin name recovery + Ktor/Apollo/Koin
- "What it does" table: Phase 0 fingerprint, Kotlin name recovery,
  modern Kotlin/KMP stacks (Ktor, Apollo, Koin, HMAC)
- Usage: fingerprint.sh example, --ktor/--apollo/--paths flags, and a
  dedicated "Kotlin name recovery (R8 deobfuscation)" subsection
- Repository Structure: add the three new scripts + two new references
- Acknowledgments: credit @tajchert (#16)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:33:53 +02:00
Simone Avogadro 2047f99d01 fix: harden find-api-calls.sh and recover-kotlin-names.sh from PR #16 review
- find-api-calls.sh: add missing '|| true' on the --paths inventory and
  --urls extraction pipelines; with set -euo pipefail a zero-match grep
  aborted the whole script (including the default --all run) with exit 1.
- find-api-calls.sh: widen STRICT_URL to also match IPv4 literals, apex
  2-label domains and bare single-label hosts followed by :port or /path
  (localhost, internal backends) while still rejecting dictionary-fragment
  noise from the Kotlin stdlib.
- recover-kotlin-names.sh: sanitize the by_package/ filename with
  os.path.basename; a crafted absolute path in untrusted @DebugMetadata
  package names could otherwise escape the output directory.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 10:22:16 +02:00
Michał Tajchert a2a0a97f23 docs: call out BuildConfig.java and adopt a two-tier endpoint doc template
Two small changes that together meaningfully reduce wasted effort:

1. Phase 3 now explicitly tells the agent to read every BuildConfig.java.
   These files are almost never obfuscated and routinely contain the
   single highest-signal constants in the APK — base URLs, flavor names,
   build types, third-party API keys, feature flags. They were not
   mentioned in the previous workflow despite being the cheapest possible
   high-value target. One grep finds them all.

2. The Phase 5 documentation template was a single per-endpoint block
   asking for path params, query params, request body, response type,
   and call chain. On apps with 100+ endpoints that easily becomes hours
   of work for output the consumer will not read.

   Replace it with two tiers:

     * Tier 1 — flat table covering every endpoint (host, method, path,
       auth required, source file). Always produced. Takes ~5 minutes
       from the --paths output.

     * Tier 2 — the existing detailed block, but explicitly reserved for
       high-value endpoints: the entire auth flow, payment/checkout, and
       anything the user specifically asked about. Default cap of ~10
       Tier-2 entries unless asked for more.

   This matches the natural shape of how analysts actually use this work
   (one inventory table to know the surface area, plus a deep dive on
   auth and a couple of flows) and prevents over-investment in detail
   for endpoints nobody will read about.
2026-04-29 01:40:50 +02:00
Michał Tajchert 627889a4c6 feat: add summary header to find-api-calls.sh
Without an overview the script dumps thousands of file:line: matches
across many sections, leaving the reader to figure out which framework
even applies. A short summary at the top makes the rest of the output
actionable.

The summary counts hits per framework / DI / auth-signal category in a
single grep pass over the source tree (8 separate greps would have
roughly octupled the runtime on a large decompile). Output is a 3-line
table:

  HTTP framework:   Retrofit=N OkHttp=N Ktor=N Apollo=N Volley=N
  DI framework:     Hilt/Dagger=N Koin=N
  Auth signals:     Bearer=N HMAC/Sign=N

A reader can immediately see which framework the app actually uses,
whether auth is bearer-token or signed, and whether to spend time on a
section or skip it. The summary is suppressed when a single section flag
(--retrofit, --ktor, --paths, ...) is given, so the existing single-section
workflows are unchanged.

A reminder of the available section flags is printed below the counts
so the agent does not have to consult --help.
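
A rough sketch of the single-pass counting idea, assuming package prefixes are good enough markers. The function name `summarize_http_stack` and the patterns are hypothetical stand-ins; the script's real patterns and its `declare -A` bookkeeping are not reproduced here.

```shell
# Hypothetical sketch: tally framework markers in one recursive grep pass.
# The package prefixes are illustrative anchors, not the script's real patterns.
summarize_http_stack() {
  grep -rhoE 'retrofit2|okhttp3|io\.ktor|com\.apollographql|com\.android\.volley' "$1" 2>/dev/null \
    | awk '
        { count[$0]++ }                     # grep -o emits one line per match
        END {
          printf "HTTP framework:   Retrofit=%d OkHttp=%d Ktor=%d Apollo=%d Volley=%d\n",
            count["retrofit2"], count["okhttp3"], count["io.ktor"],
            count["com.apollographql"], count["com.android.volley"]
        }'
}
```

Because awk prints from its `END` block, a tree with zero matches still yields a line of zeros instead of an empty summary.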
2026-04-29 01:39:55 +02:00
Michał Tajchert ec2b14c171 feat: detect Koin DI and HMAC request-signing schemes
Two gaps in the previous coverage:

1. Koin was not mentioned anywhere — Hilt/Dagger got a full section in
   call-flow-analysis.md but Koin (the dominant DI in KMP and a large
   share of Kotlin-only Android apps) had zero patterns. Add a Koin
   subsection with the runtime-DSL patterns (module {}, single<>,
   factory<>, viewModel<>, by inject, by viewModel) plus the practical
   trick for resolving an interface to its impl after R8 obfuscation:
   intersect "files that import org.koin.core.module" with "files that
   reference the interface name".

2. The --auth mode caught Bearer / API-key / OAuth header patterns but
   missed HMAC and other request-signing schemes. A hardcoded HMAC
   secret embedded in an APK is a security finding worth surfacing:
   whatever signing authority the app holds, a decompiler hands to
   anyone who has the APK. Add patterns for:

     * JCA primitives:  HmacSHA{1,256,512}, Mac.getInstance(...),
       SecretKeySpec(...), Signature.getInstance(...)
     * Header conventions: X-Signature, X-Hmac, X-Amz-Signature,
       X-Client-Authorization, AWS4-HMAC, signRequest(), signaturev2/3
     * Likely secret-bearing identifiers: app_secret, client_secret,
       signing_key, hmac_secret, consumer_secret, private_key
     * Ktor BearerTokens / loadTokens / refreshTokens DSL

These survive R8 because the JCA and Ktor APIs are public and not
shrunk. On a real-world app with a homegrown HMAC scheme they pinpoint
the signing class and its hardcoded key directly.
2026-04-29 01:26:40 +02:00
Michał Tajchert 2e6fc63453 feat: bucketed --urls output with strict regex and third-party denylist
The previous --urls mode was a plain grep for "https?://..." which on a
real APK produced thousands of lines, half of them junk strings extracted
from Kotlin stdlib's compression dictionary ("http://An Introduction to..."
fragments) and the other half SDK URLs (Google, Firebase, AppsFlyer,
Datadog, Sentry, ...) that the analyst is not looking for. The signal —
first-party backend hosts — was buried.

Two changes:

1. Strict URL regex: hostname must have at least one dot and end in a 2+
   letter TLD, with no whitespace / angle brackets / non-printables in the
   path. This eliminates the dictionary-fragment noise.

2. Bucket the surviving URLs into "likely first-party" vs "third-party"
   using references/third_party_hosts.txt — a curated denylist of
   ~80 patterns covering Google/Firebase/Apple/Microsoft/Adobe, attribution
   and observability vendors (AppsFlyer, Datadog, Sentry, Bugsnag, ...),
   payments (Stripe, PayU, Adyen, ...), support/chat SDKs, CAs, and
   standards namespaces (w3.org, etc.).

The new output starts with a frequency-sorted list of likely first-party
hosts — which is the artifact every reverse-engineer wants on the first
page — followed by the collapsed third-party list and the full URL set
for first-party hosts only.

The denylist is a sidecar text file (one regex per line) so users can
extend or override it without editing the script.
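
The bucketing step can be sketched as two small filters over a host list, assuming the denylist holds one extended regex per line with no blank lines (a blank line would match every host). The function names and file layout are hypothetical, not the script's own.

```shell
# Hypothetical sketch: split extracted hosts into first-party and third-party buckets.
#   $1 = file with one host per line (duplicates allowed)
#   $2 = denylist file, one extended regex per line, no blank lines
first_party_hosts() {
  # Hosts matching no denylist pattern, most frequent first, printed as "host count".
  grep -Ev -f "$2" "$1" | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
}

third_party_hosts() {
  # Hosts matching any denylist pattern, collapsed to a unique list.
  grep -E -f "$2" "$1" | sort -u
}
```

Keeping the patterns in a sidecar file means extending the denylist is a one-line edit with no change to either function.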
2026-04-29 01:23:56 +02:00
Michał Tajchert dbb19f0a22 feat: add --paths mode for obfuscation-resistant endpoint extraction
When R8 inlines call sites — client.get("/api/users") becomes
a.b(c, "/api/users") — the existing framework-specific patterns find
nothing, but the path string literal itself is never obfuscated. This
single observation is the most useful endpoint-extraction technique on
heavily shrunk apps; the existing --urls mode only catches full
"https://..." URLs, missing every relative path.

Add a --paths mode that greps for quoted strings matching either:

  * an absolute path with at least two slash-separated segments, or
  * a relative path beginning with a known API root keyword
    (api, v1/v2/v3, graphql, users, auth, profile, cart, order, ...)

with a {0,8}-segment cap and a small denylist for MIME types and system
paths (image/png, /proc/, /sys/, /dev/, etc.) which would otherwise pollute
results.

The output is a deduplicated inventory followed by the full call-site
list. On a real-world Kotlin/Ktor app this produced ~240 distinct API
paths in one shot — paths that the Retrofit/OkHttp/Ktor patterns missed
entirely because every call was inlined. This is the recommended first
extraction step on any obfuscated app.

Document the regex and rationale in references/api-extraction-patterns.md.
2026-04-29 01:21:25 +02:00
Michał Tajchert 371d3d4bed feat: add Ktor and Apollo (GraphQL) API-extraction patterns
The previous find-api-calls.sh covered only Retrofit, OkHttp, and Volley.
Modern Kotlin and KMP apps increasingly ship Ktor as their HTTP client
(used by ~25 % of new Kotlin apps as of 2025), and many product apps use
Apollo Kotlin for GraphQL. Both produced zero hits with the old patterns.

Add two new modes to find-api-calls.sh:

  --ktor    Ktor client calls (client.get/post/...), HttpRequestBuilder,
            defaultRequest blocks, and the Auth bearer DSL
            (BearerTokens / loadTokens / refreshTokens)

  --apollo  ApolloClient, .serverUrl(), HttpNetworkTransport, and
            .query/.mutation/.subscription operation calls

Document both in references/api-extraction-patterns.md with example
post-decompile snippets and a note on R8 obfuscation: Ktor call sites
get inlined to obfuscated method calls, but the path string literals
and Ktor library symbols (BearerTokens, URLProtocol, etc.) survive,
so library-internal patterns still work as anchors.
2026-04-29 01:16:43 +02:00
Michał Tajchert 5b63fcb418 feat: recover original Kotlin class names from R8-stripped binaries
R8 obfuscates JVM symbols but cannot strip the Kotlin metadata strings:
the Kotlin runtime needs them for reflection, coroutines, and
data-class features. The original FQNs leak through:

  * @DebugMetadata(c = "<real.fqn>")  emitted for every coroutine
    SuspendLambda (~ every suspend function in modern apps)
  * @Metadata(d2 = {"L<real/fqn>;"})  on every Kotlin class

Add scripts/recover-kotlin-names.sh that walks decompiled sources, mines
both annotations, and writes an obf -> real mapping (TSV + JSON + per-real-
package index). On a real-world Kotlin app this recovers ~100 % of
*Repository / *ViewModel / *UseCase / *Impl classes — exactly the classes
worth reading.

Add scripts/lookup-name.sh as a CLI over the mapping with four modes:
search by real-name substring, resolve obf -> real, list a real package,
and an annotated `--grep` that suffixes every hit with the owning real
class. This is a strict upgrade over plain grep against decompiled sources.

Replace the misleading 'use --deobf' tip in call-flow-analysis.md with a
pointer to this technique. --deobf only renames symbols with synthetic
placeholders; metadata recovery returns actual developer-written names.

Document the technique, expected recovery rates, and limitations in
references/kotlin-name-recovery.md, and reference it from SKILL.md as
optional Phase 3.5 (only when Phase 0 reports an obfuscated Kotlin app).
2026-04-29 01:12:31 +02:00
Michał Tajchert 213818fc27 feat: add Phase 0 fingerprint script for fast pre-decompile triage
Decompiling Java is wasted effort for Flutter, React Native, Cordova/
Capacitor, and Xamarin apps — their code lives in libapp.so, the JS bundle,
assets/www/, or .NET DLLs respectively. The previous workflow jumped
straight to Phase 1 (install deps) and Phase 2 (decompile), so the agent
had no way to know which path to take until after a full jadx run.

The new fingerprint.sh inspects an APK/XAPK in seconds and reports:

* Detected mobile framework with the file marker that triggered it
* HTTP stack hints (Retrofit, OkHttp, Ktor, Apollo, Volley) via DEX
  string scanning — survives R8 obfuscation
* DI and serialization libraries
* Obfuscation level estimate
* Notable third-party SDKs found in assets/ and DEX
* Consolidated native libraries across base + split APKs (split bundles
  often place .so files only in config.<abi>.apk)
* A framework-specific recommendation for the next step

SKILL.md documents this as Phase 0 and explicitly tells the agent to
stop and switch tooling if the app is non-native.

PowerShell port (fingerprint.ps1) intentionally not included — happy to
add if needed; behavior is straightforward to mirror.
2026-04-29 01:07:40 +02:00
12 changed files with 1176 additions and 26 deletions

@ -7,14 +7,14 @@
},
"metadata": {
"description": "Claude Code plugins for Android reverse engineering",
"version": "1.1.0"
"version": "1.5.0"
},
"plugins": [
{
"name": "android-reverse-engineering",
"source": "./plugins/android-reverse-engineering",
"description": "Decompile Android APK/JAR/AAR with jadx, trace call flows through libraries, and document extracted APIs.",
"version": "1.1.0",
"version": "1.5.0",
"author": {
"name": "Simone Avogadro"
},

@ -4,6 +4,8 @@
A Claude Code skill that decompiles Android APK/XAPK/JAR/AAR files and **extracts the HTTP APIs** used by the app — Retrofit endpoints, OkHttp calls, hardcoded URLs, authentication patterns — so you can document and reproduce them without the original source code.
> **First-class Kotlin support**: modern Android apps are Kotlin/KMP, heavily obfuscated with R8. This skill recovers the **original Kotlin class names** from metadata R8 cannot strip, and extracts APIs from **Ktor**, **Apollo (GraphQL)** and **Koin** — not just the classic Retrofit/OkHttp stack. See [Kotlin name recovery](#kotlin-name-recovery-r8-deobfuscation) below.
> **Windows / PowerShell support (experimental)**: The `*.ps1` scripts alongside the bash ones are a recent community contribution, still being stabilised. For any issues please open an issue on **this** repository (not on the contributors' upstream forks): the PowerShell scripts are maintained here by [@SimoneAvogadro](https://github.com/SimoneAvogadro).
## Table of Contents
@ -22,11 +24,13 @@ A Claude Code skill that decompiles Android APK/XAPK/JAR/AAR files and **extract
| Capability | Description |
|------------|-------------|
| **Fingerprint first (Phase 0)** | Triage an APK/XAPK in seconds — detect the framework (Flutter / React Native / Cordova / Xamarin / native-Kotlin), HTTP stack, obfuscation level and native libs *before* spending time on a full decompile |
| **Decompile** | APK, XAPK, JAR, and AAR files using jadx and Fernflower/Vineflower (single engine or side-by-side comparison) |
| **Extract APIs** | Retrofit endpoints, OkHttp calls, hardcoded URLs, auth headers and tokens |
| **Recover Kotlin names** | Rebuild original `*Repository` / `*ViewModel` / `*UseCase` class names from R8-obfuscated binaries using Kotlin metadata that R8 cannot strip |
| **Extract APIs** | Retrofit, OkHttp, Volley **and modern Kotlin/KMP stacks: Ktor, Apollo (GraphQL), Koin DI** — endpoints, hardcoded URLs, auth headers, tokens and HMAC request-signing schemes |
| **Trace call flows** | From Activities/Fragments through ViewModels and repositories down to HTTP calls |
| **Analyze structure** | Manifest, packages, architecture patterns |
| **Handle obfuscation** | Strategies for navigating ProGuard/R8 output |
| **Handle obfuscation** | R8-resistant path/URL extraction plus strategies for navigating ProGuard/R8 output |
## Requirements
@ -100,6 +104,10 @@ bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scri
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/install-dep.sh jadx
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/install-dep.sh vineflower
# Fingerprint an APK/XAPK BEFORE decompiling (Phase 0 triage):
# framework, HTTP stack, obfuscation level, native libs, notable SDKs
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/fingerprint.sh app.apk
# Decompile APK with jadx (default)
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/decompile.sh app.apk
@ -112,10 +120,38 @@ bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scri
# Run both engines and compare
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/decompile.sh --engine both --deobf app.apk
# Find API calls
# Find API calls — defaults to a full scan across every supported stack
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/find-api-calls.sh output/sources/
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/find-api-calls.sh output/sources/ --retrofit
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/find-api-calls.sh output/sources/ --urls
# Modern Kotlin/KMP stacks and obfuscation-resistant extraction
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/find-api-calls.sh output/sources/ --ktor # Ktor client
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/find-api-calls.sh output/sources/ --apollo # Apollo / GraphQL
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/find-api-calls.sh output/sources/ --paths # quoted path literals that survive R8 inlining
```
### Kotlin name recovery (R8 deobfuscation)
Most real-world Kotlin/KMP apps ship through R8, so the decompiled classes come
out as `a.b.c`. R8 renames the JVM symbols but **cannot strip the Kotlin
metadata strings** — the Kotlin runtime (reflection, coroutines) needs the
original fully-qualified names at runtime. This skill mines those
`@DebugMetadata` / `@Metadata` annotations to rebuild an `obfuscated → real`
class-name map. On a typical app it recovers ~100 % of the
`*Repository` / `*ViewModel` / `*UseCase` / `*Impl` classes you actually want to
read.
```bash
# 1. Build the mapping from the decompiled sources
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/recover-kotlin-names.sh output/sources/ output/names/
# → output/names/mapping.tsv, mapping.json, by_package/
# 2. Query it: resolve an obfuscated name, search by real name, or grep
# the sources with each hit annotated with its recovered class name
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/lookup-name.sh output/names/ LoginRepository
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/lookup-name.sh output/names/ -o a.b.c
bash plugins/android-reverse-engineering/skills/android-reverse-engineering/scripts/lookup-name.sh output/names/ --grep 'login' output/sources/
```
## Repository Structure
@ -130,12 +166,14 @@ android-reverse-engineering-skill/
│ │ └── plugin.json # Plugin manifest
│ ├── skills/
│ │ └── android-reverse-engineering/
│ │ ├── SKILL.md # Core workflow (5 phases)
│ │ ├── SKILL.md # Core workflow (Phase 0-5)
│ │ ├── references/
│ │ │ ├── setup-guide.md
│ │ │ ├── jadx-usage.md
│ │ │ ├── fernflower-usage.md
│ │ │ ├── api-extraction-patterns.md
│ │ │ ├── kotlin-name-recovery.md
│ │ │ ├── third_party_hosts.txt # denylist for first/third-party bucketing
│ │ │ └── call-flow-analysis.md
│ │ └── scripts/
│ │ ├── check-deps.sh # Bash
@ -144,6 +182,9 @@ android-reverse-engineering-skill/
│ │ ├── install-dep.ps1
│ │ ├── decompile.sh
│ │ ├── decompile.ps1
│ │ ├── fingerprint.sh # Phase 0 — pre-decompile triage
│ │ ├── recover-kotlin-names.sh # R8 → real Kotlin class names
│ │ ├── lookup-name.sh # query the recovered name map
│ │ ├── find-api-calls.sh
│ │ └── find-api-calls.ps1
│ └── commands/
@ -164,6 +205,7 @@ android-reverse-engineering-skill/
Thanks to the contributors who have shaped this skill:
- [@tajchert](https://github.com/tajchert) — Phase 0 fingerprinting, R8-resistant Kotlin name recovery (`recover-kotlin-names.sh`, `lookup-name.sh`), and Ktor / Apollo / Koin / HMAC extraction patterns (#16)
- [@philjn](https://github.com/philjn) — Native Windows / PowerShell support (`check-deps.ps1`, `install-dep.ps1`, `decompile.ps1`, `find-api-calls.ps1`) and split/bundled APK detection in `decompile.sh` (#8)
- [@txhno](https://github.com/txhno) — Migration to the maintained [`ThexXTURBOXx/dex2jar`](https://github.com/ThexXTURBOXx/dex2jar) fork (#12)
- [@muqiao215](https://github.com/muqiao215) — Decompile partial-success handling, Fernflower timeout safeguard, intermediate-artifact directory (#10)

@ -1,6 +1,6 @@
{
"name": "android-reverse-engineering",
"version": "1.1.0",
"version": "1.5.0",
"description": "Decompile Android APK/JAR/AAR with jadx, trace call flows through libraries, and document extracted APIs.",
"author": {
"name": "Simone Avogadro"

@ -24,6 +24,31 @@ If anything is missing, follow the installation instructions in `${CLAUDE_PLUGIN
## Workflow
### Phase 0: Fingerprint the App (recommended before anything else)
Before installing tools or decompiling, run a fast triage to determine what
kind of app you are looking at. **Decompiling Java is mostly useless for
Flutter, React Native, Cordova/Capacitor, and Xamarin apps** — the real code
lives elsewhere. The fingerprint script tells you which.
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/fingerprint.sh <file.apk|file.xapk>
```
It prints, in one screen:
- **Mobile framework** (Flutter / React Native / Cordova / Xamarin / Native Kotlin / etc.) with the file marker that triggered the verdict.
- **HTTP stack** (Retrofit, OkHttp, Ktor, Apollo, Volley) detected via DEX string scan — works even when class names are obfuscated.
- **DI / serialization** signals (Hilt, Dagger, Koin, kotlinx.serialization, Moshi, Gson, Jackson).
- **Obfuscation level** estimate based on root-level short-named packages.
- **Notable third-party SDKs** (AppsFlyer, Datadog, Sentry, Firebase, payment SDKs, support/chat SDKs, etc.).
- **Consolidated native libraries** across the base APK and all splits — XAPK split bundles often place `.so` files in `config.<abi>.apk`, not in `base.apk`.
- **Recommended next step**, which differs by framework (e.g. for Flutter the script suggests `blutter` / `strings libapp.so` rather than jadx).
If the fingerprint says the app is Flutter / RN / Cordova / Xamarin, **stop**
and switch to the framework-appropriate tooling. Phases 1-5 below assume a
native (Java/Kotlin) Android app.
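As a rough illustration of marker-based triage (not `fingerprint.sh` itself), the sketch below guesses the framework from archive entry names alone. The function name `detect_framework` is hypothetical, and the exact marker paths are assumptions based on the `libapp.so`, JS bundle, `assets/www/`, and .NET DLL locations named in this change's description.

```shell
# Hypothetical sketch, not fingerprint.sh: guess the framework from archive entry names.
# Feed it a listing on stdin, for example:  unzip -Z1 app.apk | detect_framework
detect_framework() {
  # The first marker encountered in the listing wins; later lines are ignored.
  awk '
    !fw && /libapp\.so$/              { fw = "Flutter";           marker = $0 }
    !fw && /index\.android\.bundle$/  { fw = "React Native";      marker = $0 }
    !fw && /^assets\/www\//           { fw = "Cordova/Capacitor"; marker = $0 }
    !fw && /\.dll$/                   { fw = "Xamarin";           marker = $0 }
    END {
      if (fw) { print fw " (marker: " marker ")" }
      else    { print "native Java/Kotlin (no cross-platform marker found)" }
    }
  '
}
```

Reading the listing from stdin keeps the sketch independent of how the APK or XAPK is unpacked; a split bundle would need the listings of all its APKs concatenated first.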
### Phase 1: Verify and Install Dependencies
Before decompiling, confirm that the required tools are available — and install any that are missing.
@ -123,12 +148,45 @@ Navigate the decompiled output to understand the app's architecture.
- Distinguish app code from third-party libraries
- Look for packages named `api`, `network`, `data`, `repository`, `service`, `retrofit`, `http` — these are where API calls live
3. **Identify the architecture pattern**:
3. **Read every `BuildConfig.java`** — these are almost never obfuscated and frequently leak the highest-signal constants in the entire APK (base URLs, flavor names, build type, third-party API keys, feature flags):
```bash
find <output>/sources -name BuildConfig.java -exec grep -H '=' {} \;
```
Each Gradle module emits its own `BuildConfig`, so expect 1-N hits. Read all of them.
4. **Identify the architecture pattern**:
- MVP: look for `Presenter` classes
- MVVM: look for `ViewModel` classes and `LiveData`/`StateFlow`
- Clean Architecture: look for `domain`, `data`, `presentation` packages
- This informs where to look for network calls in the next phases
### Phase 3.5: Recover Kotlin Class Names (only for obfuscated Kotlin apps)
If Phase 0 reported moderate / high obfuscation **and** the app is Kotlin
(Compose / kotlin_module markers detected), run the metadata recovery
script before tracing call flows. R8 obfuscates JVM symbols but cannot
strip Kotlin metadata strings, so original FQNs leak through
`@DebugMetadata` and `@Metadata.d2`.
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/recover-kotlin-names.sh \
<output>/sources <output>/mapping
```
Then use the lookup helper instead of plain grep — every hit comes
annotated with the owning class's real name:
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/lookup-name.sh \
<output>/mapping --grep '"/api/' <output>/sources
```
Typical recovery on a real-world Kotlin app: ~100% of `*Repository` /
`*ViewModel` / `*UseCase` / `*Impl` classes, ~80% of DTOs.
See `${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/references/kotlin-name-recovery.md`
for the full technique and limitations.
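To make the idea concrete, here is a much-reduced sketch of the mining step. It is not `recover-kotlin-names.sh`: it reads only `@DebugMetadata(c = "...")`, skips `@Metadata.d2`, and writes nothing but a TSV stream. The function name `map_debug_metadata` is hypothetical, and it assumes jadx-style output in which `sources/a/b/c.java` holds obfuscated class `a.b.c` and the annotation is printed with exactly this spacing.

```shell
# Hypothetical sketch: build an "obfuscated<TAB>real" map from @DebugMetadata only.
# Assumes jadx-style layout: <root>/a/b/c.java holds obfuscated class a.b.c.
map_debug_metadata() {
  root=${1%/}
  marker=':@DebugMetadata(c = "'
  grep -rHoE '@DebugMetadata\(c = "[^"]+"' "$root" \
    | awk -v root="$root/" -v marker="$marker" '
        {
          i = index($0, marker)
          path = substr($0, 1, i - 1)            # file that carries the annotation
          real = substr($0, i + length(marker))  # original name plus closing quote
          sub("\"$", "", real)                   # drop the closing quote
          sub("[$].*$", "", real)                # drop $method$1 lambda suffixes
          if (index(path, root) == 1) path = substr(path, length(root) + 1)
          sub("[.]java$", "", path)
          gsub("/", ".", path)                   # a/b/c -> a.b.c
          print path "\t" real
        }' \
    | sort -u
}
```

Called as `map_debug_metadata <output>/sources`, it prints one line per distinct pair, which is enough to spot-check a few classes before running the full script.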
### Phase 4: Trace Call Flows
Follow execution paths from user-facing entry points down to network calls.
@ -190,15 +248,32 @@ On Windows (PowerShell):
& "${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/find-api-calls.ps1" <output>/sources/ -Auth
```
Then, for each discovered endpoint, read the surrounding source code to extract:
- HTTP method and path
- Base URL
- Path parameters, query parameters, request body
- Headers (especially authentication)
- Response type
- Where it's called from (the call chain from Phase 4)
Document the endpoints in **two tiers** — going deep on every endpoint is
prohibitively expensive on apps with 100+ paths, and most of them do not
warrant it. Always produce Tier 1; expand Tier 2 only for the endpoints
that matter.
**Document each endpoint** using this format:
#### Tier 1 — flat inventory (always)
A single table covering every discovered endpoint. Aim for one line each;
if you cannot determine a column, write `?`.
| Host | Method | Path | Auth | Source file |
|------|--------|------|------|-------------|
| `api.example.com` | GET | `/v1/users/profile` | Bearer | `com/example/api/UserApi.java` |
| `api.example.com` | POST | `/v1/auth/login` | none | `com/example/api/AuthApi.java` |
This table answers "what does the backend look like" in one screen and
takes ~5 minutes to produce from the `--paths` output even on a large app.
#### Tier 2 — per-endpoint detail (only for high-value endpoints)
Reserve the detailed format for the few endpoints that actually need it:
- the entire authentication flow (login, refresh, logout, OTP/SMS, anonymous, registration)
- payment / checkout / order-creation endpoints
- anything the user explicitly asked about
- anything that looked unusual during the scan (custom signing, undocumented headers, etc.)
```markdown
### `METHOD /path`
@ -213,6 +288,10 @@ Then, for each discovered endpoint, read the surrounding source code to extract:
- **Called from**: `LoginActivity → LoginViewModel → UserRepository → ApiService`
```
As a default, do not produce Tier 2 entries for more than ~10 endpoints
unless the user explicitly asks for more — Tier 1 plus a Tier 2 deep dive
on auth + 1-2 key flows is what most consumers of this work actually want.
See `${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/references/api-extraction-patterns.md` for library-specific search patterns and the full documentation template.
## Output

@ -55,6 +55,65 @@ grep -rn 'Interceptor\|addInterceptor\|addNetworkInterceptor\|intercept(' source
grep -rn '\.execute()\|\.enqueue(' sources/
```
## Ktor (Kotlin)
Ktor is the dominant HTTP client in Kotlin Multiplatform and modern
Kotlin-only Android apps. Unlike Retrofit, Ktor does **not** use annotations
to declare endpoints — paths appear as plain string arguments to
`client.get(...)` / `client.post(...)`, often inside an extension function.
```bash
# Calls
grep -rn '\b\(client\|httpClient\|HttpClient\)\.\(get\|post\|put\|delete\|patch\|head\|request\)\s*[<(]' sources/
# Default request / base URL configuration
grep -rn 'HttpRequestBuilder\|defaultRequest\s*{\|\burl\s*(\s*"\|URLBuilder' sources/
# Auth plugin (bearer / refresh)
grep -rn '\bbearer\s*{\|BearerTokens\s*(\|loadTokens\s*{\|refreshTokens\s*{' sources/
```
Typical Ktor call (after decompile):
```java
client.get("api/v1/users/profile") {
parameter("locale", "en-US");
}
```
The base URL is usually applied via `defaultRequest { url { host = "..." } }`
in the client builder. Search for `host =` and `URLProtocol.HTTPS` references
to pin it down.
**Note on obfuscation:** in heavily R8-shrunk apps the call site
`client.get("path")` is inlined to something like `aVar.a(dVar, "path")`
and the `client.<verb>(` regex misses it. The path string itself is **not**
obfuscated, however — fall back to the generic path-literal search
(`--paths`) for the endpoint inventory in those cases. Ktor library
internals (`BearerTokens`, `loadTokens`, `refreshTokens`, `URLProtocol`)
remain searchable because Ktor keeps these on its public API.
Ktor's authentication plugin uses the
[`Auth { bearer { loadTokens { ... }; refreshTokens { ... } } }`](https://ktor.io/docs/auth.html)
DSL — bearer access tokens with automatic refresh. After R8, the DSL
lambdas appear as `Function2`/`Function3` impls referencing
`BearerTokens(...)` calls.
## Apollo Kotlin (GraphQL)
```bash
# Client setup
grep -rn 'ApolloClient\|\.serverUrl(\|HttpNetworkTransport' sources/
# Operations (queries / mutations / subscriptions)
grep -rn '\.query(\s*[A-Z]\|\.mutation(\s*[A-Z]\|\.subscription(\s*[A-Z]' sources/
```
Apollo generates one class per operation under a generated package; once you
find the GraphQL endpoint URL via `ApolloClient.serverUrl("...")`, use the
operation classes themselves as the API documentation — each carries its
GraphQL document text in `OPERATION_DOCUMENT`.
## Volley
```bash
@ -77,6 +136,25 @@ grep -rn 'loadUrl\|evaluateJavascript\|addJavascriptInterface\|WebViewClient\|sh
WebView-based apps may load API endpoints via JavaScript bridges. Look for `@JavascriptInterface` annotated methods.
## Endpoint-Shaped Path Literals (obfuscation-resistant)
When the HTTP client cannot be identified (custom abstraction, heavy
inlining, KMP shared module), or the call sites are obfuscated to
`a.b(c, "path")`, fall back to extracting the path string literals
themselves. R8 does not obfuscate string contents, so paths leak through.
```bash
# All quoted strings shaped like an API path, deduplicated
grep -rhoE '"(/[A-Za-z0-9_{}.\-]+(/[A-Za-z0-9_{}.\-]+)+/?|(api|v[0-9]+|graphql|users?|account|auth|sso|oauth|profile|cart|basket|order|product|inventory|search|category|address|location|delivery|payment|invoice|favo[u]?rites?)(/[A-Za-z0-9_{}.\-]+)+/?)"' sources/ \
| grep -Ev '^"(image|video|audio|text|application|content)/|^"/(proc|sys|dev|tmp|etc)/' \
| sort -u
```
The skill ships this as `find-api-calls.sh --paths`, which prints both a
deduplicated inventory and the full list of call sites. On real-world
Kotlin apps this single command typically produces 100 to 300 distinct
endpoint paths, which is the most useful first artifact for documentation.
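When the inventory is long, grouping it by leading segment gives a quick picture of the API surface. The filter below is a sketch only: `count_paths_by_root` is a hypothetical name, and it expects one quoted path literal per line on stdin, such as the output of the deduplicating `grep` pipeline above.

```bash
# Hypothetical helper: count path literals by their first segment.
# Input: one quoted literal per line, e.g. "/api/v1/users" or "api/v1/users".
count_paths_by_root() {
  tr -d '"' \
    | sed -E 's#^/##' \
    | awk -F/ 'NF { n[$1]++ } END { for (k in n) printf "%d %s\n", n[k], k }' \
    | sort -k1,1nr -k2,2
}
```

The output is `<count> <root>` per line, highest count first, so the dominant API roots show up at the top.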
## Hardcoded URLs and Secrets
```bash


@ -84,9 +84,9 @@ Look for:
- Firebase/analytics initialization
- Base URL configuration
## 5. Dependency Injection (Dagger / Hilt)
## 5. Dependency Injection
Modern Android apps use DI. Trace bindings to find implementations:
### Dagger / Hilt
```bash
# Hilt modules
@ -102,10 +102,43 @@ grep -rn '@Component\|@Subcomponent' sources/
grep -rn '@Inject' sources/
```
To trace a call flow through DI:
1. Find where an interface is used (e.g., `ApiService` injected into a repository)
2. Find the `@Provides` or `@Binds` method that creates the implementation
3. Follow the implementation to the actual HTTP call
### Koin
Koin is the dominant DI framework in Kotlin Multiplatform and a large
share of Kotlin-only Android apps. It uses a runtime DSL rather than
compile-time generated factories, so the search patterns are different:
```bash
# Confirm Koin is actually wired up
grep -rn 'org\.koin\.' sources/
# DI module declarations
grep -rn 'fun [A-Za-z]\+Module\|module\s*{\|module(' sources/
# Bindings inside a module DSL
grep -rn 'single\s*[<{(]\|factory\s*[<{(]\|viewModel\s*[<{(]\|scoped\s*[<{(]\|singleOf\|factoryOf' sources/
# Resolution call-sites (where a binding is consumed)
grep -rn '\bget\s*<\|\binject\s*<\|by\s\+inject\b\|by\s\+viewModel\b\|getKoin' sources/
```
After R8, every binding lambda becomes an anonymous
`Function2<Scope, ParametersHolder, T>` impl. To find the binding for an
interface `Foo`, look for files that contain both a Koin import / module
DSL marker and a reference to `Foo`:
```bash
grep -rln 'org\.koin\.core\.module' sources/ | xargs grep -l 'Foo'
```
### Trace through DI
1. Find where an interface is used (e.g. `ApiService` injected into a
repository).
2. Find the `@Provides` / `@Binds` method (Hilt) **or** the
`single { ... }` / `factory { ... }` block (Koin) that creates the
implementation.
3. Follow the implementation to the actual HTTP call.
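Step 2 of the list above can be sketched as a single helper that, given an interface name, labels each file mentioning it as a Hilt/Dagger or Koin binding candidate. The name `di_binding_files` is hypothetical, and the Koin check only looks for `org.koin.` references, which is an assumption about what survives in the decompiled output.

```bash
# Hypothetical helper: find files that mention an interface AND look like a
# DI binding site. Prints "<framework><TAB><file>" per candidate.
di_binding_files() {
  local iface="$1" src="${2:-sources}"
  grep -rlw --include='*.java' --include='*.kt' -- "$iface" "$src" 2>/dev/null \
    | while IFS= read -r f; do
        if grep -qE '@(Provides|Binds)' "$f"; then
          printf 'hilt\t%s\n' "$f"          # Dagger/Hilt module method
        elif grep -qE 'org\.koin\.' "$f"; then
          printf 'koin\t%s\n' "$f"          # Koin module / binding lambda
        fi
      done \
    | sort
}
```

For example, `di_binding_files ApiService <output>/sources` lists the candidate files to open for step 3; plain usage sites such as repositories are filtered out.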
## 6. Find Constants and Configuration
@ -145,8 +178,9 @@ When code is obfuscated (ProGuard/R8):
1. **Start from strings**: Search for URLs, error messages, and known constants
2. **Start from framework classes**: Activities and Fragments are named in the manifest
3. **Follow library calls**: Retrofit `@GET`/`@POST` annotations are readable even when the interface class name is obfuscated
4. **Use `--deobf`**: jadx can generate readable replacement names
4. **Recover original Kotlin names from metadata**: `@DebugMetadata` and `@Metadata.d2` strings preserve the original FQNs even after R8 obfuscation. Run `scripts/recover-kotlin-names.sh` to build an `obf -> real` map (typically recovers 30-50% of classes — and almost 100% of `*Repository` / `*ViewModel` / `*Impl`). See [`kotlin-name-recovery.md`](./kotlin-name-recovery.md). This is the single highest-leverage step on any Kotlin app.
5. **Cross-reference**: If `class a` calls `Retrofit.create(b.class)`, then `b` is a Retrofit service interface
6. **`--deobf` is rarely enough on its own**: jadx's `--deobf` renames obfuscated symbols with synthetic placeholders (`p001a`, `C0123Foo`) — useful for disambiguation but it does **not** recover original names. Pair it with the metadata recovery above.
## 8. Tracing a Complete Call Flow: Example


@ -0,0 +1,108 @@
# Recovering Original Class Names from Kotlin Metadata
When R8/ProGuard obfuscates a Kotlin app, JVM symbols are renamed, but the
**Kotlin metadata strings cannot be stripped**: the Kotlin runtime depends
on them for reflection, coroutines, and `data class` features.
Two annotations leak the original fully-qualified names:
## `@DebugMetadata`
Generated for nearly every Kotlin coroutine `SuspendLambda` (i.e. almost
every `suspend` function in a modern app):
```java
@DebugMetadata(
c = "com.example.feature.account.AccountRepositoryImpl$fetch$1",
f = "AccountRepositoryImpl.kt",
l = {42, 51},
m = "invokeSuspend"
)
public final class a extends SuspendLambda implements Function2<...> { ... }
```
The `c =` field carries the original outer class FQN (with a `$` suffix
for inner / lambda scopes — strip everything after the first `$` to get the
declaring class).
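The stripping rule can be sketched in a few lines of shell. This is an illustration, not the logic of `recover-kotlin-names.sh`: the function name `debugmetadata_classes` is hypothetical, and it assumes the `c = "..."` field appears on one line within a file that contains `@DebugMetadata`.

```bash
# Hypothetical helper: print the declaring classes named in @DebugMetadata
# "c =" fields, with everything from the first '$' onwards removed.
debugmetadata_classes() {
  local src="${1:-sources}"
  grep -rlE --include='*.java' '@DebugMetadata' "$src" 2>/dev/null \
    | while IFS= read -r f; do
        # Match c = "pkg.Class$inner$1" (at least one dot in the value).
        grep -hoE '(^|[^A-Za-z0-9_])c[[:space:]]*=[[:space:]]*"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_$][A-Za-z0-9_$]*)+"' "$f" || true
      done \
    | sed -E 's/^[^"]*"//; s/"$//; s/\$.*$//' \
    | sort -u
}
```

Because the match is purely textual, an unrelated `c = "a.b"` assignment in the same file would also be reported; review the output before trusting it.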
## `@Metadata.d2`
Every Kotlin class carries a top-level `@Metadata` annotation. The `d2`
array lists internal class refs in JVM type-descriptor format
(`Lcom/example/Foo;`):
```java
@Metadata(d1 = {"..."},
d2 = {"...","Lcom/example/feature/account/AccountRepositoryImpl;","..."})
public final class b implements ... { ... }
```
The first non-stdlib descriptor in `d2` is usually the file's primary
class.
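A small filter illustrates the "first non-stdlib descriptor" heuristic. It is a sketch under assumptions: `d2_descriptors_to_fqns` is a hypothetical name, the list of prefixes treated as stdlib/platform is a guess, and the input is whatever text contains the `d2` array (for example one decompiled file piped in).

```bash
# Hypothetical helper: read text on stdin, print non-stdlib class names found
# as quoted JVM descriptors ("Lcom/example/Foo;"), in order of appearance.
d2_descriptors_to_fqns() {
  grep -oE '"L[A-Za-z_][A-Za-z0-9_$]*(/[A-Za-z_$][A-Za-z0-9_$]*)+;"' \
    | sed -E 's/^"L//; s/;"$//; s#/#.#g' \
    | grep -vE '^(kotlin|kotlinx|java|javax|android|androidx)\.' \
    || true
}
```

Piping a file through `d2_descriptors_to_fqns | head -n 1` yields the candidate primary class for that file.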
## How to mine them
The skill ships two scripts:
```bash
# Build a mapping from a decompiled sources directory:
bash scripts/recover-kotlin-names.sh <output>/sources [mapping-dir]
# Outputs:
# <mapping-dir>/mapping.tsv obf_fqn real_fqn file
# <mapping-dir>/mapping.json same data, JSON
# <mapping-dir>/by_package/ per-real-package index files
# Query the mapping:
bash scripts/lookup-name.sh <mapping-dir> Repository # search
bash scripts/lookup-name.sh <mapping-dir> -o ab.cd # obf -> real
bash scripts/lookup-name.sh <mapping-dir> -p com.example.feature # list package
bash scripts/lookup-name.sh <mapping-dir> --grep '"api/' <output>/sources
# ^ greps decompiled code and appends '// real.fqn' to each hit
```
## What you typically recover
On a real-world obfuscated Kotlin app the script recovers **30-50 % of
classes** — but more importantly, **almost 100 % of the classes you
actually want to read**:
| Class kind | Recovery rate |
|---------------------------|---------------|
| `*Repository` / `*Impl` | ~100 % |
| `*ViewModel` | ~100 % |
| `*UseCase` / `*Interactor`| ~100 % |
| Plain `data class` DTOs | ~80 % |
| Pure-Java helper classes | low (no Kotlin metadata) |
| Anonymous inner classes | sometimes recovered as the parent FQN |
## Why `jadx --deobf` is not enough
`--deobf` renames obfuscated identifiers using internal heuristics, but the
output is still synthetic (`p001a`, `C0123Foo`). It does **not** recover
the *original* names. Kotlin metadata recovery is the only reliable way to
map back to the names the developer actually wrote, and it costs essentially
nothing — just a regex pass over the decompiled sources.
Run both: `--deobf` for fields/methods that have no metadata source, plus
the recovery script for class names.
## Limitations
- **Method names and field names** are not recovered. Kotlin metadata only
preserves class-level FQNs and a few signatures. For method names you
still need jadx-gui's interactive rename or pattern inference.
- **Pure-Java classes** carry no `@Metadata`, so they remain obfuscated.
- **Heavily inlined classes** (`@JvmInline value class`, top-level fun
files compiled into shared `*Kt.class` synthetic classes) sometimes show
up under the wrong filename — treat results as a strong hint, not gospel.
## Reading flow with the mapping
1. Run `recover-kotlin-names.sh` once after decompiling.
2. Use `lookup-name.sh --grep '<pattern>' <sources>` instead of plain `grep`
so every hit comes annotated with the real owning class.
3. When you hit an obfuscated FQN in code (e.g. `nq.e`), resolve it with
`lookup-name.sh <mapping-dir> -o nq.e` — you will often see siblings
(`nq.d`, `nq.f`, ...) that are the same class's split lambdas/inner
classes, which is useful context.
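Step 3 can also be approximated directly against `mapping.tsv` when a quick one-off lookup is enough. The sketch below assumes the file is tab-separated with the columns `obf_fqn`, `real_fqn`, `file` in that order and no header row; the function names are hypothetical and this is not how `lookup-name.sh` itself is implemented.

```bash
# Hypothetical helpers operating on mapping.tsv (assumed: TAB-separated,
# columns obf_fqn / real_fqn / file, no header).

# Print the real FQN for an obfuscated FQN; non-zero exit if unknown.
resolve_obf() {
  local tsv="$1" obf="$2"
  awk -F'\t' -v q="$obf" '$1 == q { print $2; found = 1 } END { exit found ? 0 : 1 }' "$tsv"
}

# List every mapped class in the same obfuscated package as the given FQN.
obf_siblings() {
  local tsv="$1" obf="$2" pkg
  pkg="${obf%.*}"
  awk -F'\t' -v p="$pkg." '
    index($1, p) == 1 && substr($1, length(p) + 1) !~ /\./ { print $1 "\t" $2 }
  ' "$tsv" | sort
}
```

For instance, `obf_siblings <mapping-dir>/mapping.tsv nq.e` prints `nq.d`, `nq.e`, and any other mapped members of the `nq` package alongside their real names.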


@ -0,0 +1,122 @@
# Third-party host denylist used by find-api-calls.sh --urls.
#
# Patterns are extended-regex hostname suffixes / fragments. A host is
# considered "third-party noise" if any pattern below matches anywhere
# in the hostname. Lines starting with '#' and blank lines are ignored.
#
# This list is intentionally conservative: when a pattern would hide a
# legitimate first-party host (e.g. an app may run its own *.s3.amazonaws.com
# bucket), keep the pattern but expect manual review of the bucketed output.
# Google / Firebase / Play / Crashlytics
\.googleapis\.com$
\.google\.com$
\.gstatic\.com$
\.googleusercontent\.com$
\.googletagmanager\.com$
\.googlesyndication\.com$
\.firebaseio\.com$
\.firebaseapp\.com$
\.firebaseinstallations\.googleapis\.com$
\.firebaseremoteconfig\.googleapis\.com$
\.crashlytics\.com$
\.app-measurement\.com$
# Apple / Microsoft / Adobe
\.apple\.com$
\.icloud\.com$
\.microsoft\.com$
\.live\.com$
\.office\.com$
\.adobe\.com$
ns\.adobe\.com
# Meta
\.facebook\.com$
\.fbcdn\.net$
\.instagram\.com$
\.whatsapp\.com$
# Other social / messaging / video
\.twitter\.com$
\.x\.com$
\.tiktok\.com$
\.youtube\.com$
\.youtu\.be$
\.linkedin\.com$
\.snapchat\.com$
\.pinterest\.com$
\.reddit\.com$
# Mobile attribution / analytics / observability
\.appsflyersdk\.com$
\.appsflyer\.com$
\.adjust\.com$
\.branch\.io$
\.amplitude\.com$
\.segment\.com$
\.mixpanel\.com$
\.hotjar\.com$
\.clarity\.ms$
\.datadoghq\.(com|eu|us)$
\.sentry\.io$
\.bugsnag\.com$
\.newrelic\.com$
\.instabug\.com$
\.embrace\.io$
\.rollout\.io$
\.launchdarkly\.com$
# Push / notifications
\.onesignal\.com$
\.urbanairship\.com$
\.airship\.com$
# Support / chat
\.zendesk\.com$
\.intercom\.io$
\.intercomcdn\.com$
\.helpshift\.com$
\.salesforce\.com$
\.freshchat\.com$
\.kustomerapp\.com$
# Payments
\.stripe\.com$
\.braintreepayments\.com$
\.braintreegateway\.com$
\.payu\.com$
\.payu\.in$
\.paypal\.com$
\.adyen\.com$
\.checkout\.com$
\.klarna\.com$
# Maps / location
\.mapbox\.com$
\.openstreetmap\.org$
# Storage / CDN (often third-party even when the bucket name is app-specific)
\.s3\.amazonaws\.com$
\.cloudfront\.net$
\.akamaihd\.net$
\.akamaized\.net$
\.fastly\.net$
\.cloudflare\.com$
\.azureedge\.net$
# DNS / well-known infra
\.localhost$
^localhost
^127\.
# Standards / RFCs / placeholders that show up as XML/XMP namespaces
\.w3\.org$
\.w3c\.org$
example\.(com|org|net)$
# Certificate authorities
\.sectigo\.com$
\.entrust\.com$
\.digicert\.com$
\.letsencrypt\.org$


@ -14,8 +14,12 @@ Arguments:
Options:
--retrofit Search only for Retrofit annotations
--okhttp Search only for OkHttp patterns
--ktor Search only for Ktor client patterns
--apollo Search only for Apollo (GraphQL) patterns
--volley Search only for Volley patterns
--urls Search only for hardcoded URLs
--paths Extract unique endpoint-shaped path string literals
(works on heavily obfuscated apps where call sites are inlined)
--auth Search only for auth-related patterns
--all Search all patterns (default)
-h, --help Show this help message
@ -29,8 +33,11 @@ EOF
SOURCE_DIR=""
SEARCH_RETROFIT=false
SEARCH_OKHTTP=false
SEARCH_KTOR=false
SEARCH_APOLLO=false
SEARCH_VOLLEY=false
SEARCH_URLS=false
SEARCH_PATHS=false
SEARCH_AUTH=false
SEARCH_ALL=true
@ -38,8 +45,11 @@ while [[ $# -gt 0 ]]; do
case "$1" in
--retrofit) SEARCH_RETROFIT=true; SEARCH_ALL=false; shift ;;
--okhttp) SEARCH_OKHTTP=true; SEARCH_ALL=false; shift ;;
--ktor) SEARCH_KTOR=true; SEARCH_ALL=false; shift ;;
--apollo) SEARCH_APOLLO=true; SEARCH_ALL=false; shift ;;
--volley) SEARCH_VOLLEY=true; SEARCH_ALL=false; shift ;;
--urls) SEARCH_URLS=true; SEARCH_ALL=false; shift ;;
--paths) SEARCH_PATHS=true; SEARCH_ALL=false; shift ;;
--auth) SEARCH_AUTH=true; SEARCH_ALL=false; shift ;;
--all) SEARCH_ALL=true; shift ;;
-h|--help) usage ;;
@ -72,6 +82,58 @@ run_grep() {
grep $GREP_OPTS -E "$pattern" "$SOURCE_DIR" 2>/dev/null || true
}
# Print a one-screen summary FIRST so a reader knows what to expect from
# the long output that follows. Skipped when a single section flag was
# requested (the user wants raw matches, not an overview). One pass over
# the tree, counts bucketed by tag — running 8 separate greps was too slow.
if [[ "$SEARCH_ALL" == true ]]; then
section "Summary (counted in a single pass)"
declare -A H=(
[retrofit]=0 [okhttp]=0 [ktor]=0 [apollo]=0 [volley]=0
[hilt]=0 [koin]=0 [bearer]=0 [hmac]=0
)
while IFS= read -r line; do
case "$line" in
*"@GET("*|*"@POST("*|*"@PUT("*|*"@DELETE("*|*"@PATCH("*|*"@HTTP("*) H[retrofit]=$((H[retrofit]+1));;
esac
case "$line" in
*"Request.Builder"*|*"HttpUrl"*|*".newCall("*) H[okhttp]=$((H[okhttp]+1));;
esac
case "$line" in
*"BearerTokens"*|*"defaultRequest {"*|*"client.get("*|*"client.post("*|*"httpClient.get("*|*"httpClient.post("*|*"HttpClient.get("*) H[ktor]=$((H[ktor]+1));;
esac
case "$line" in
*"ApolloClient"*|*".serverUrl("*) H[apollo]=$((H[apollo]+1));;
esac
case "$line" in
*"StringRequest"*|*"JsonObjectRequest"*|*"RequestQueue"*) H[volley]=$((H[volley]+1));;
esac
case "$line" in
*"@HiltAndroidApp"*|*"@AndroidEntryPoint"*|*"@HiltViewModel"*|*"@Provides"*|*"@Binds"*) H[hilt]=$((H[hilt]+1));;
esac
case "$line" in
*"org.koin."*|*"module {"*|*"single<"*|*"factory<"*|*"singleOf("*|*"factoryOf("*) H[koin]=$((H[koin]+1));;
esac
case "$line" in
*'"Bearer '*|*'"bearer '*|*"BearerTokens"*) H[bearer]=$((H[bearer]+1));;
esac
case "$line" in
*"HmacSHA"*|*'Mac.getInstance("Hmac'*) H[hmac]=$((H[hmac]+1));;
esac
done < <(grep -rEh --include='*.java' --include='*.kt' \
'@(GET|POST|PUT|DELETE|PATCH|HTTP)\(|Request\.Builder|HttpUrl|\.newCall\(|BearerTokens|defaultRequest \{|client\.(get|post)\(|httpClient\.(get|post)\(|ApolloClient|\.serverUrl\(|StringRequest|JsonObjectRequest|RequestQueue|@HiltAndroidApp|@AndroidEntryPoint|@HiltViewModel|@Provides|@Binds|org\.koin\.|module \{|single<|factory<|"[Bb]earer |HmacSHA|Mac\.getInstance' \
"$SOURCE_DIR" 2>/dev/null || true)
printf ' HTTP framework: Retrofit=%-5s OkHttp=%-5s Ktor=%-5s Apollo=%-5s Volley=%-5s\n' \
"${H[retrofit]}" "${H[okhttp]}" "${H[ktor]}" "${H[apollo]}" "${H[volley]}"
printf ' DI framework: Hilt/Dagger=%-5s Koin=%-5s\n' \
"${H[hilt]}" "${H[koin]}"
printf ' Auth signals: Bearer=%-5s HMAC/Sign=%-5s\n' \
"${H[bearer]}" "${H[hmac]}"
echo
echo " Run with one of --retrofit / --okhttp / --ktor / --apollo / --volley /"
echo " --paths / --urls / --auth to inspect a single section."
fi
# --- Retrofit ---
if [[ "$SEARCH_ALL" == true || "$SEARCH_RETROFIT" == true ]]; then
section "Retrofit Annotations"
@ -90,16 +152,157 @@ if [[ "$SEARCH_ALL" == true || "$SEARCH_OKHTTP" == true ]]; then
run_grep '(\.url\s*\(|\.addQueryParameter|\.addPathSegment|\.scheme\s*\(|\.host\s*\()'
fi
# --- Ktor (Kotlin) ---
# Ktor doesn't use annotations. Endpoints appear as string args to
# client.get/post/etc., or are built via HttpRequestBuilder.url(...). Auth
# is configured via the bearer { loadTokens / refreshTokens } DSL.
if [[ "$SEARCH_ALL" == true || "$SEARCH_KTOR" == true ]]; then
section "Ktor — Client Calls"
run_grep '\b(client|httpClient|HttpClient)\.(get|post|put|delete|patch|head|request)\s*[<(]'
section "Ktor — Request Building / Default Request"
run_grep '(HttpRequestBuilder|defaultRequest\s*\{|\burl\s*\(\s*"|URLBuilder|URLProtocol)'
section "Ktor — Auth Plugin (Bearer / Refresh)"
run_grep '(\bbearer\s*\{|BearerTokens\s*\(|loadTokens\s*\{|refreshTokens\s*\{|\bAuth\s*\)\s*\{)'
fi
# --- Apollo (GraphQL) ---
if [[ "$SEARCH_ALL" == true || "$SEARCH_APOLLO" == true ]]; then
section "Apollo — GraphQL Client"
run_grep '(ApolloClient|\.serverUrl\s*\(|\.subscriptionNetworkTransport|HttpNetworkTransport)'
section "Apollo — Operations"
run_grep '(\.query\s*\(\s*[A-Z]|\.mutation\s*\(\s*[A-Z]|\.subscription\s*\(\s*[A-Z])'
fi
# --- Volley ---
if [[ "$SEARCH_ALL" == true || "$SEARCH_VOLLEY" == true ]]; then
section "Volley Requests"
run_grep '(StringRequest|JsonObjectRequest|JsonArrayRequest|ImageRequest|RequestQueue|Volley\.newRequestQueue)'
fi
# --- Endpoint-shaped path literals ---
# Survives R8 obfuscation: even when call sites are inlined to a.b(c, "path"),
# the path strings themselves are not obfuscated. This produces a deduplicated
# inventory of likely API endpoints that other modes miss.
if [[ "$SEARCH_ALL" == true || "$SEARCH_PATHS" == true ]]; then
section "Endpoint-Shaped Path Literals (deduplicated)"
# Quoted strings that begin with /<segment> or <segment>/ where the leading
# segment is a typical API root word. Cap segment count and length to keep
# the regex grounded.
# An endpoint-shaped string is one of:
# "/seg/seg..." — absolute path with >= 2 segments
# "api-root/seg/seg..." — relative path starting with a known
# API root keyword and containing >= 1
# '/' followed by another segment
# Segments are URL-safe chars plus {} for path-template placeholders.
SEG='[A-Za-z0-9_{}.\-]+'
ROOT='(api|v[0-9]+|graphql|rest|mobile|auth|oauth|sso|users?|account|session|token|register|signup|signin|logout|password|verify|otp|sms|profile|customer|cart|basket|order|checkout|payment|invoice|product|catalog|inventory|search|category|favo[u]?rites?|wishlist|address|location|delivery|shipping|review|feedback|notification|push|message|chat|track|event|stat[a-z]*|metric|config|settings?|feature|flag|banner|content|media|upload|download|file|image|video|live|stream|webhook|callback)'
PATHS_REGEX="\"(/${SEG}(/${SEG})+/?|${ROOT}(/${SEG})+/?)\""
# Filter out frequent false positives (MIME types, /proc, /sys, /dev).
EXCLUDE='^"(image|video|audio|text|application|content|font|model|multipart|message)/|^"/(proc|sys|dev|tmp|etc|usr|var|opt)/'
# Print a flat unique list rather than file:line — this is the inventory.
grep -rhoE --include='*.java' --include='*.kt' "$PATHS_REGEX" "$SOURCE_DIR" 2>/dev/null \
| grep -Ev "$EXCLUDE" \
| sort -u || true
echo
section "Endpoint-Shaped Path Literals — call sites"
grep $GREP_OPTS -E "$PATHS_REGEX" "$SOURCE_DIR" 2>/dev/null \
| grep -Ev ":[0-9]+:.*(${EXCLUDE//^/})" || true
fi
# --- Hardcoded URLs ---
# A loose grep for http(s)://... drowns in compression-dictionary garbage and
# in third-party SDK URLs (Google, Firebase, AppsFlyer, Datadog, ...). The
# strict regex requires a syntactically valid hostname and rejects strings
# containing whitespace, angle brackets, or non-printable bytes. Hosts are
# then bucketed into "first-party candidates" vs "third-party (denylist)".
if [[ "$SEARCH_ALL" == true || "$SEARCH_URLS" == true ]]; then
section "Hardcoded URLs (http:// and https://)"
run_grep '"https?://[^"]+'
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
DENYLIST="$HERE/../references/third_party_hosts.txt"
# Accept three host shapes, all rejecting whitespace / angle brackets /
# non-printables in the path:
# * IPv4 literal (dev/staging endpoints, high signal) 192.168.0.1
# * dotted host: >=2 labels ending in a 2+ letter TLD (incl apex) example.com
# * bare single-label host, BUT only when followed by ':port' or localhost:3000
# '/path' — keeps internal hosts (localhost, internal-backend) svc/health
# while still dropping Kotlin-stdlib dictionary fragments like
# "http://An Introduction..." (bare word, no port/path follows).
STRICT_URL='https?://(([0-9]{1,3}(\.[0-9]{1,3}){3}|[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,})(:[0-9]{1,5})?(/[^"<>[:space:]]*)?|[A-Za-z0-9-]+(:[0-9]{1,5}(/[^"<>[:space:]]*)?|/[^"<>[:space:]]*))'
TMP="$(mktemp)"
trap 'rm -f "$TMP"' EXIT
# Extraction (STRICT_URL) is deliberately permissive; this awk pass drops the
# residual Kotlin-stdlib dictionary noise WITHOUT losing the high-signal
# shapes a strict-only regex discards (IPs, apex domains, internal hosts).
# Decision table, top-down, on the host (authority before any :port / /path):
# * IPv4 literal -> keep (dict fragments are words,
# never dotted-quads)
# * >=3 labels (sub.domain.tld) -> keep (any TLD; same tolerance the
# original strict regex had)
# * any host WITH a :port or /path -> keep (structured = high signal:
# localhost:3000, svc/health)
# * bare 2-label apex, no port/path -> keep ONLY if the TLD is a real one,
# compared as a whole field (kills
# "www.this" / "this.introduction",
# keeps "mytrackera-api.com")
# Trade-off: a first-party host referenced bare with an uncommon TLD (e.g.
# https://foo.store with no path) is dropped — give it a path/port, or add the
# TLD to the list below, if you hit that case.
{ grep -rhoE --include='*.java' --include='*.kt' "$STRICT_URL" "$SOURCE_DIR" 2>/dev/null || true; } \
| sort -u \
| awk '
{ rest=$0; sub(/^https?:\/\//,"",rest)
host=rest; sub(/[/:].*/,"",host)
haspathport = (rest ~ /[/:]/)
if (host ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) { print; next } # IPv4
n = split(host, a, ".")
if (n >= 3) { print; next } # sub.domain.tld
if (haspathport) { print; next } # has :port or /path
if (n == 2 && a[2] ~ /^(com|net|org|io|co|app|dev|me|ai|xyz|info|biz|gov|edu|mil|int|tech|cloud|uk|de|fr|it|es|nl|in|us|ca|au|jp|cn|br|ru|eu|ch|se|no|fi|dk|pl|pt|gr|ie|be|at|cz|sg|hk|kr|tw|mx|ar|cl|za|nz)$/) print # real apex TLD
}' > "$TMP"
# Extract host: strip scheme, take part up to first ':' or '/'.
HOSTS_TMP="$(mktemp)"
sed -E 's#^https?://##; s#[/:].*$##' "$TMP" | sort -u > "$HOSTS_TMP"
if [[ -f "$DENYLIST" ]]; then
# Build a single combined regex from the denylist (one line each).
DENY_REGEX="$(grep -vE '^\s*(#|$)' "$DENYLIST" | tr '\n' '|' | sed 's/|$//')"
THIRD_HOSTS=$(grep -E "$DENY_REGEX" "$HOSTS_TMP" || true)
FIRST_HOSTS=$(grep -vE "$DENY_REGEX" "$HOSTS_TMP" || true)
else
THIRD_HOSTS=""
FIRST_HOSTS=$(cat "$HOSTS_TMP")
fi
section "Likely First-Party Hosts (frequency-sorted)"
if [[ -n "$FIRST_HOSTS" ]]; then
while IFS= read -r h; do
[[ -z "$h" ]] && continue
n=$(grep -cE "://${h//./\\.}([/:\"]|$)" "$TMP" || true)
printf ' %5d %s\n' "$n" "$h"
done <<< "$FIRST_HOSTS" | sort -rn -k1
else
echo " (none — every URL matched the third-party denylist)"
fi
section "Third-Party Hosts (denylist matches, collapsed)"
if [[ -n "$THIRD_HOSTS" ]]; then
echo "$THIRD_HOSTS" | sed 's/^/ /'
else
echo " (none)"
fi
section "All First-Party URLs (full strings)"
if [[ -n "$FIRST_HOSTS" ]]; then
while IFS= read -r h; do
[[ -z "$h" ]] && continue
grep -E "://${h//./\\.}([/:\"]|$)" "$TMP" | sed 's/^/ /'
done <<< "$FIRST_HOSTS"
fi
rm -f "$HOSTS_TMP" "$TMP"
trap - EXIT
section "HttpURLConnection"
run_grep '(openConnection|setRequestMethod|HttpURLConnection|HttpsURLConnection)'
section "WebView URLs"
@ -109,9 +312,27 @@ fi
# --- Auth patterns ---
if [[ "$SEARCH_ALL" == true || "$SEARCH_AUTH" == true ]]; then
section "Authentication & API Keys"
run_grep -i '(api[_-]?key|auth[_-]?token|bearer|authorization|x-api-key|client[_-]?secret|access[_-]?token)'
run_grep -i '(api[_-]?key|auth[_-]?token|bearer|authorization|x-api-key|client[_-]?secret|access[_-]?token|refresh[_-]?token)'
# Request-signing schemes: a hardcoded HMAC / RSA secret in an APK is a
# security finding worth surfacing prominently. These patterns catch the
# common shapes of homegrown / SDK-issued request signers.
section "Request Signing (HMAC / signature schemes)"
run_grep '(HmacSHA(1|256|512)|Mac\.getInstance\("Hmac|SecretKeySpec\(|Signature\.getInstance\()'
run_grep -i '(x-signature|x-client-authorization|x-amz-signature|x-hmac|aws4-hmac|signRequest|signatureFor|computeSignature|signaturev[0-9])'
# Hardcoded high-entropy strings adjacent to "secret"/"key" assignments
# are the canonical leaked-credential pattern.
section "Possible Hardcoded Secrets / Keys"
run_grep -i '(app[_-]?secret|client[_-]?secret|signing[_-]?key|hmac[_-]?secret|consumer[_-]?secret|private[_-]?key)'
section "Base URLs and Constants"
run_grep -i '(BASE_URL|API_URL|SERVER_URL|ENDPOINT|API_BASE|HOST_NAME)'
# Ktor BearerTokens / refresh DSL — common on Kotlin apps and lives on
# Ktor's public API, so it survives R8 unchanged.
section "Ktor Auth (Bearer + Refresh)"
run_grep '(BearerTokens|loadTokens\s*\{|refreshTokens\s*\{|\bbearer\s*\{)'
fi
echo


@ -0,0 +1,241 @@
#!/usr/bin/env bash
# fingerprint.sh — Triage an APK/XAPK before decompiling.
#
# Detects mobile framework (Flutter, React Native, Cordova/Capacitor,
# Xamarin, KMP/native), HTTP-stack hints, obfuscation level, native libs,
# and notable third-party SDKs.
#
# Decompiling Java is mostly useless for Flutter / RN / Xamarin / Cordova
# apps — different tools are needed. Run this BEFORE Phase 2 to choose
# the right path.
set -euo pipefail
usage() {
cat <<EOF
Usage: fingerprint.sh <file.apk|file.xapk>
Prints a one-screen summary:
* mobile framework (with rationale)
* HTTP / DI / serialization stack hints
* obfuscation indicator
* native libraries (consolidated across split APKs)
* notable third-party SDKs found in assets/
EOF
exit 0
}
[[ $# -lt 1 || "$1" == "-h" || "$1" == "--help" ]] && usage
INPUT="$1"
[[ ! -f "$INPUT" ]] && { echo "File not found: $INPUT" >&2; exit 1; }
TMP="$(mktemp -d -t apkfp.XXXXXX)"
trap 'rm -rf "$TMP"' EXIT
# Resolve to a list of APKs (handle XAPK = ZIP of APKs)
APKS=()
case "${INPUT,,}" in
*.xapk|*.apks|*.apkm)
unzip -q -o "$INPUT" -d "$TMP/xapk"
while IFS= read -r p; do APKS+=("$p"); done < <(find "$TMP/xapk" -maxdepth 2 -type f -name '*.apk')
;;
*.apk)
APKS=("$INPUT")
;;
*)
echo "Unsupported input: $INPUT" >&2; exit 1 ;;
esac
# Aggregate ZIP listings from every APK in the bundle (split-aware view)
LISTING="$TMP/listing.txt"
: > "$LISTING"
for apk in "${APKS[@]}"; do
unzip -l -- "$apk" 2>/dev/null | awk '{print $NF}' >> "$LISTING"
done
# Most class-level libs live inside classes*.dex, not as visible zip paths.
# Extract the type-name strings out of each dex with `strings` and collect them
# in a separate file that `has()` also searches, so it can match e.g.
# 'io/ktor/' or 'org/koin/'.
DEX_STRINGS="$TMP/dex_strings.txt"
: > "$DEX_STRINGS"
for apk in "${APKS[@]}"; do
for dex in $(unzip -Z1 -- "$apk" 2>/dev/null | grep -E '^classes[0-9]*\.dex$' || true); do
# DEX type descriptors look like "Lcom/foo/Bar;". Extract the inner
# slash-separated FQN so callers can match e.g. 'io/ktor/' directly.
unzip -p -- "$apk" "$dex" 2>/dev/null \
| strings -n 8 \
| grep -oE 'L[a-z][a-zA-Z0-9_]*(/[a-zA-Z0-9_$]+)+;' \
| sed -E 's/^L//; s/;$//' \
>> "$DEX_STRINGS" || true
done
done
sort -u "$DEX_STRINGS" -o "$DEX_STRINGS"
has() { grep -qE "$1" "$LISTING" || grep -qE "$1" "$DEX_STRINGS"; }
# ----------------------------------------------------------------------
# Framework detection (priority order — first match wins)
# ----------------------------------------------------------------------
FRAMEWORK="unknown"
RATIONALE=""
if has '^lib/[^/]+/libflutter\.so$'; then
FRAMEWORK="Flutter"
RATIONALE="lib/<abi>/libflutter.so present"
has '^lib/[^/]+/libapp\.so$' && RATIONALE+="; libapp.so contains AOT-compiled Dart"
elif has '^lib/[^/]+/libhermes\.so$' || has '^assets/index\.android\.bundle$' || has '^lib/[^/]+/libreactnativejni\.so$'; then
FRAMEWORK="React Native"
reasons=()
has '^lib/[^/]+/libhermes\.so$' && reasons+=("libhermes.so")
has '^lib/[^/]+/libreactnativejni\.so$' && reasons+=("libreactnativejni.so")
has '^assets/index\.android\.bundle$' && reasons+=("assets/index.android.bundle")
RATIONALE="${reasons[*]}"
elif has '^assets/www/index\.html$' || has '^assets/www/cordova\.js$' || has '^assets/public/index\.html$'; then
FRAMEWORK="Cordova / Capacitor (WebView hybrid)"
RATIONALE="assets/www/ or assets/public/ shell present"
elif has '^lib/[^/]+/libmonodroid\.so$' || has '^assemblies/'; then
FRAMEWORK="Xamarin / .NET MAUI"
RATIONALE="libmonodroid.so or assemblies/ present — code is in .NET DLLs"
elif has '^lib/[^/]+/libmaui\.so$'; then
FRAMEWORK=".NET MAUI"
RATIONALE="libmaui.so present"
elif has '^assets/flutter_assets/' && ! has '^lib/[^/]+/libflutter\.so$'; then
FRAMEWORK="Flutter (code-only split?)"
RATIONALE="flutter_assets/ but no libflutter.so in this APK — check splits"
else
# Native: distinguish Compose vs classic Android by androidx.compose presence
if has 'androidx\.compose'; then
FRAMEWORK="Native Android (Kotlin + Jetpack Compose)"
RATIONALE="androidx.compose.* libraries detected"
elif has '^META-INF/.*\.kotlin_module$'; then
FRAMEWORK="Native Android (Kotlin)"
RATIONALE="kotlin_module metadata present, no Compose markers"
else
FRAMEWORK="Native Android (Java/Kotlin)"
RATIONALE="no cross-platform framework markers found"
fi
fi
# ----------------------------------------------------------------------
# HTTP / DI / serialization stack hints
# ----------------------------------------------------------------------
http=()
has 'retrofit2' && http+=("Retrofit")
has 'okhttp3' && http+=("OkHttp")
has 'io/ktor/' && http+=("Ktor")
has 'com/apollographql/' && http+=("Apollo (GraphQL)")
has 'com/android/volley' && http+=("Volley")
di=()
has 'dagger/hilt/' && di+=("Hilt")
has '^META-INF/.*dagger.*' && di+=("Dagger")
has 'org/koin/' && di+=("Koin")
has 'javax/inject/' && [[ ${#di[@]} -eq 0 ]] && di+=("javax.inject")
ser=()
has 'kotlinx/serialization/' && ser+=("kotlinx.serialization")
has 'com/google/gson/' && ser+=("Gson")
has 'com/squareup/moshi/' && ser+=("Moshi")
has 'com/fasterxml/jackson/' && ser+=("Jackson")
# ----------------------------------------------------------------------
# Obfuscation indicator (R8/ProGuard) — count single-letter dex packages
# ----------------------------------------------------------------------
# Note: pipefail is on, so guard greps that may legitimately return 0 matches.
short_dirs=$( { grep -oE '^[a-z]{1,2}/' "$LISTING" || true; } | sort -u | wc -l | tr -d ' ')
if [[ "$short_dirs" -gt 30 ]]; then
OBFUSCATION="HIGH ($short_dirs single/double-letter dirs at root)"
elif [[ "$short_dirs" -gt 10 ]]; then
OBFUSCATION="MODERATE ($short_dirs short root dirs)"
else
OBFUSCATION="LOW (no significant short-name namespace pollution)"
fi
# ----------------------------------------------------------------------
# Native libraries (consolidated)
# ----------------------------------------------------------------------
NATIVE=$(grep -E '^lib/[^/]+/[^/]+\.so$' "$LISTING" | sort -u || true)
# ----------------------------------------------------------------------
# Notable third-party SDKs (assets-based markers)
# ----------------------------------------------------------------------
sdks=()
has '^assets/com/appsflyer/' && sdks+=("AppsFlyer")
has 'datadog\.buildId|com/datadog/' && sdks+=("Datadog")
has 'io/sentry/' && sdks+=("Sentry")
has 'com/google/firebase/' && sdks+=("Firebase")
has 'com/google/android/gms/' && sdks+=("Google Play Services")
has 'com/facebook/' && sdks+=("Facebook SDK")
has 'com/payu/' && sdks+=("PayU")
has 'com/stripe/' && sdks+=("Stripe")
has 'com/braintreepayments/' && sdks+=("Braintree")
has 'com/storyteller/' && sdks+=("Storyteller")
has 'zendesk/' && sdks+=("Zendesk")
has 'com/intercom/' && sdks+=("Intercom")
has 'com/segment/analytics' && sdks+=("Segment")
has 'com/amplitude/' && sdks+=("Amplitude")
has 'com/mixpanel/' && sdks+=("Mixpanel")
has 'com/onesignal/' && sdks+=("OneSignal")
has 'com/microsoft/clarity' && sdks+=("Microsoft Clarity")
has 'com/hotjar/' && sdks+=("Hotjar")
has 'com/instabug/' && sdks+=("Instabug")
# BuildConfig.java is almost never obfuscated and often holds base URLs / flavor.
if has '(^|/)BuildConfig(\.class)?$'; then
BUILDCONFIG="present (grep BuildConfig.java after decompile for base URLs / flavor)"
else
BUILDCONFIG="not detected in zip listing or dex strings (still worth grepping after decompile)"
fi
# ----------------------------------------------------------------------
# Summary
# ----------------------------------------------------------------------
echo "=== APK Fingerprint: $(basename "$INPUT") ==="
echo
echo "Framework: $FRAMEWORK"
echo " Rationale: $RATIONALE"
echo "Obfuscation: $OBFUSCATION"
echo
echo "HTTP stack: ${http[*]:-none detected}"
echo "DI: ${di[*]:-none detected}"
echo "Serialization: ${ser[*]:-none detected}"
echo "BuildConfig: $BUILDCONFIG"
echo
echo "Third-party SDKs: ${sdks[*]:-none detected}"
echo
echo "Native libraries (consolidated across splits):"
if [[ -n "$NATIVE" ]]; then
echo "$NATIVE" | sed 's/^/ /'
else
echo " (none)"
fi
echo
# ----------------------------------------------------------------------
# Recommendation
# ----------------------------------------------------------------------
echo "Recommended next step:"
case "$FRAMEWORK" in
  Flutter*)
    echo " Java decompilation will yield ~no app code. The Dart logic lives in"
    echo " libapp.so (AOT). Use tools designed for Flutter:"
    echo " - reFlutter / Doldrums / blutter (extract Dart class structure)"
    echo " - strings/rabin2 on libapp.so for endpoints & string constants"
    ;;
  React*)
    echo " Java code is just the RN host. Real app logic is in JS/Hermes:"
    echo " - if Hermes: hbctool disasm assets/index.android.bundle <out-dir>"
    echo " - if JSC: js-beautify the bundle and grep for 'fetch('/'axios'"
    ;;
  Cordova*)
    echo " All app code is in assets/www/ (or assets/public/). Just unzip and"
    echo " inspect the HTML/JS; no Java decompile needed."
    ;;
  Xamarin*|.NET*)
    echo " App logic is in .NET DLLs (assemblies/). Use ILSpy or dotPeek;"
    echo " jadx will only show the Mono host."
    ;;
  *)
    echo " Proceed with Phase 2: bash scripts/decompile.sh <file>"
    ;;
esac


@@ -0,0 +1,85 @@
#!/usr/bin/env bash
# lookup-name.sh — Query the mapping produced by recover-kotlin-names.sh.
#
# Modes:
#   lookup-name.sh <mapping-dir> <substring>     search by real-FQN substring
#   lookup-name.sh <mapping-dir> -o <obf>        resolve obf -> real
#   lookup-name.sh <mapping-dir> -p <pkg>        list a real package
#   lookup-name.sh <mapping-dir> --grep <regex> <sources-dir>
#       grep decompiled sources and annotate each hit with the real class name
set -euo pipefail
usage() {
  cat <<EOF
Usage: lookup-name.sh <mapping-dir> <query>
       lookup-name.sh <mapping-dir> -o <obf-fqn>
       lookup-name.sh <mapping-dir> -p <real-package-substring>
       lookup-name.sh <mapping-dir> --grep <regex> <sources-dir>

  <mapping-dir> is the directory produced by recover-kotlin-names.sh
  (must contain mapping.json).
EOF
  exit 0
}
[[ $# -lt 2 ]] && usage
DIR="$1"; shift
[[ ! -f "$DIR/mapping.json" ]] && { echo "no mapping.json in $DIR" >&2; exit 1; }
python3 - "$DIR" "$@" <<'PY'
import json, os, subprocess, sys

DIR = sys.argv[1]
args = sys.argv[2:]

# MAP: obfuscated FQN -> real FQN. REV: real FQN -> list of obfuscated FQNs.
with open(os.path.join(DIR, "mapping.json")) as fh:
    MAP = json.load(fh)
REV = {}
for o, r in MAP.items():
    REV.setdefault(r, []).append(o)

def search(q):
    """Case-insensitive substring search over real FQNs."""
    ql = q.lower()
    for r in sorted(REV):
        if ql in r.lower():
            print(r)
            for o in sorted(REV[r]):
                print(f" {o}")

def by_obf(o):
    """Resolve one obfuscated FQN and list the others sharing its real name."""
    if o not in MAP:
        print(f"no mapping for {o}", file=sys.stderr); sys.exit(1)
    print(f"{o} -> {MAP[o]}")
    sibs = [s for s in REV[MAP[o]] if s != o]
    for s in sorted(sibs):
        print(f" sibling: {s}")

def by_pkg(p):
    """List real classes whose package part contains the substring p."""
    pl = p.lower()
    for r in sorted(REV):
        if pl in r.rsplit(".", 1)[0].lower():
            print(r)
            for o in sorted(REV[r]):
                print(f" {o}")

def grep_annot(pattern, sources):
    """grep the decompiled sources and append the real class name to each hit."""
    # "-e" keeps a pattern that starts with "-" from being parsed as an option.
    res = subprocess.run(
        ["grep", "-rEn", "--include=*.java", "-e", pattern, sources],
        capture_output=True, text=True)
    for line in res.stdout.splitlines():
        try:
            path, lineno, content = line.split(":", 2)
        except ValueError:
            continue
        rel = os.path.relpath(path, sources)
        obf = rel.replace(os.sep, ".")[:-5]  # drop the ".java" suffix
        suffix = f" // {MAP[obf]}" if obf in MAP else ""
        print(f"{rel}:{lineno}:{content}{suffix}")

if args[0] == "-o" and len(args) == 2:
    by_obf(args[1])
elif args[0] == "-p" and len(args) == 2:
    by_pkg(args[1])
elif args[0] == "--grep" and len(args) == 3:
    grep_annot(args[1], args[2])
else:
    search(" ".join(args))
PY
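
The lookup modes in lookup-name.sh all rest on one reverse index (real name -> obfuscated names). A minimal standalone sketch of that idea follows. The mapping content and the helper names build_reverse, search and siblings are hypothetical, chosen only for illustration; they are not part of the script.

```python
import json

# Hypothetical mapping.json content: obfuscated FQN -> real FQN.
# Several obfuscated classes can share one real name, because the
# recovery step collapses inner classes and lambdas onto the outer class.
MAPPING_JSON = """{
  "a.b": "com.example.shop.CartRepository",
  "a.c": "com.example.shop.CartRepository",
  "d.e": "com.example.auth.LoginViewModel"
}"""


def build_reverse(mapping):
    """Invert obf -> real into real -> [obf, ...]."""
    rev = {}
    for obf, real in mapping.items():
        rev.setdefault(real, []).append(obf)
    return rev


def search(rev, query):
    """Case-insensitive substring match on real names."""
    q = query.lower()
    return [(real, sorted(rev[real])) for real in sorted(rev) if q in real.lower()]


def siblings(mapping, rev, obf):
    """Other obfuscated classes that resolve to the same real name as obf."""
    return sorted(s for s in rev[mapping[obf]] if s != obf)


mapping = json.loads(MAPPING_JSON)
rev = build_reverse(mapping)

for real, obfs in search(rev, "cart"):
    print(real, obfs)
print(siblings(mapping, rev, "a.b"))
```

Building the index once up front keeps each query a plain dictionary walk, which is likely why the script does the same before dispatching on its arguments.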


@@ -0,0 +1,140 @@
#!/usr/bin/env bash
# recover-kotlin-names.sh — Rebuild a (obfuscated -> real) class-name map
# from Kotlin metadata strings left in decompiled sources.
#
# R8 obfuscates JVM symbols but usually leaves the Kotlin metadata strings
# in place, because the Kotlin runtime (reflection, coroutines) reads them
# at runtime. Two annotations carry the original FQN:
#
#   * @DebugMetadata(c = "<full.qualified.Name>", f = "<File.kt>", ...)
#     emitted for almost every `suspend` function (every coroutine
#     SuspendLambda).
#
#   * @Metadata(... d2 = {"...L<pkg/Class>;..."} ...) listing internal
#     class refs of the file.
#
# Typical recovery on a real-world app: 30-50 % of classes regain their real
# names — usually 100 % of the *Repository / *ViewModel / *UseCase / *Impl
# classes you actually want to read.
set -euo pipefail
usage() {
  cat <<EOF
Usage: recover-kotlin-names.sh <decompiled-sources-dir> [output-dir]

Walks every *.java under <decompiled-sources-dir>, mines @DebugMetadata
and @Metadata annotations (plus jadx "renamed from" comments), and writes:

  <output-dir>/mapping.tsv    tab-separated obf_fqn <TAB> real_fqn <TAB> file
  <output-dir>/mapping.json   same data as JSON { obf_fqn: real_fqn, ... }
  <output-dir>/by_package/    one file per real package, listing
                              real_fqn <TAB> obf_fqn <TAB> file

If [output-dir] is omitted, files are written to a mapping/ directory
next to the sources dir.
EOF
  exit 0
}
[[ $# -lt 1 || "$1" == "-h" || "$1" == "--help" ]] && usage
SRC="$1"
OUT="${2:-$(dirname "$SRC")/mapping}"
[[ ! -d "$SRC" ]] && { echo "not a directory: $SRC" >&2; exit 1; }
mkdir -p "$OUT/by_package"
python3 - "$SRC" "$OUT" <<'PY'
import os, re, sys, json
from collections import defaultdict

SRC, OUT = sys.argv[1], sys.argv[2]

# @DebugMetadata(c = "com.foo.Bar$Inner$1", ...)
RE_DEBUG = re.compile(r'@DebugMetadata\([^)]*?c\s*=\s*"([^"]+)"', re.S)
# @Metadata(... d2 = { "...Lcom/foo/Bar;..." ...} )
RE_DTWO = re.compile(r'@Metadata\([^)]*?d2\s*=\s*\{([^}]*)\}', re.S)
RE_LCLASS = re.compile(r'L([A-Za-z][\w/$]+);')
# jadx sometimes emits this comment for renamed classes
RE_RENAMED = re.compile(r'/\*\s*renamed from:\s*([\w.$]+)\s*\*/')

# Skip third-party / framework trees; their names are already real.
SKIP_PREFIXES = (
    "kotlin.", "kotlinx.", "androidx.", "android.", "java.", "javax.",
    "com.google.", "com.facebook.", "com.appsflyer.", "com.datadog.",
    "io.ktor.", "io.sentry.", "io.realm.", "okhttp3.", "okio.",
    "com.squareup.", "com.bumptech.", "com.airbnb.", "com.payu.",
    "com.storyteller.", "zendesk.", "io.intercom.", "com.microsoft.",
    "com.tinder.", "com.hotjar.", "com.amplitude.", "com.segment.",
    "com.mixpanel.", "com.onesignal.", "com.stripe.", "com.braintreepayments.",
    "retrofit2.", "dagger.", "javax.inject.", "org.jetbrains.",
)

mapping = {}     # obfuscated FQN -> real FQN
file_real = {}   # obfuscated FQN -> source file path
counts = defaultdict(int)

for dp, _, files in os.walk(SRC):
    for f in files:
        if not f.endswith(".java"):
            continue
        path = os.path.join(dp, f)
        rel = os.path.relpath(path, SRC)
        obf = rel[:-5].replace(os.sep, ".")  # drop ".java", path -> FQN
        if obf.startswith(SKIP_PREFIXES):
            continue
        try:
            with open(path, "r", errors="replace") as fh:
                text = fh.read()
        except OSError:
            continue
        real = None
        # 1) @DebugMetadata: most reliable; keep only the outer class name.
        m = RE_DEBUG.search(text)
        if m:
            real = m.group(1).split("$", 1)[0]
            counts["debug_meta"] += 1
        # 2) @Metadata d2: first non-framework class reference.
        if not real:
            m = RE_DTWO.search(text)
            if m:
                for lm in RE_LCLASS.finditer(m.group(1)):
                    cand = lm.group(1).replace("/", ".").split("$", 1)[0]
                    if "." in cand and not cand.startswith(("kotlin.", "java.", "android")):
                        real = cand
                        counts["d2"] += 1
                        break
        # 3) jadx "renamed from" comment as a last resort.
        if not real:
            m = RE_RENAMED.search(text)
            if m:
                real = m.group(1)
                counts["renamed"] += 1
        if real:
            mapping[obf] = real
            file_real[obf] = path

with open(os.path.join(OUT, "mapping.tsv"), "w") as f:
    f.write("obf_fqn\treal_fqn\tfile\n")
    for k in sorted(mapping):
        f.write(f"{k}\t{mapping[k]}\t{file_real[k]}\n")

with open(os.path.join(OUT, "mapping.json"), "w") as f:
    json.dump(mapping, f, indent=2, sort_keys=True)

by_pkg = defaultdict(list)
for obf, real in mapping.items():
    pkg = real.rsplit(".", 1)[0] if "." in real else "(default)"
    by_pkg[pkg].append((real, obf, file_real[obf]))

for pkg, rows in by_pkg.items():
    safe = os.path.basename(pkg).replace(".", "_") or "default"
    with open(os.path.join(OUT, "by_package", f"{safe}.txt"), "w") as f:
        for real, obf, p in sorted(rows):
            f.write(f"{real}\t{obf}\t{p}\n")

print(f"Recovered {len(mapping)} class names")
for k, v in counts.items():
    print(f" via {k}: {v}")
print(f"Real packages: {len(by_pkg)}")
print(f"Wrote {OUT}/mapping.tsv, mapping.json, by_package/")
PY
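
To make the annotation mining in recover-kotlin-names.sh concrete, here is a minimal sketch that runs the script's @DebugMetadata and @Metadata patterns over two hand-written snippets. The snippets, the com.example.shop class names and the helper recover_name are hypothetical, written for this example only. Real jadx output may lay the annotations out differently, so treat it as an illustration of the matching order rather than a test of real-world coverage.

```python
import re

# Patterns copied from recover-kotlin-names.sh.
RE_DEBUG = re.compile(r'@DebugMetadata\([^)]*?c\s*=\s*"([^"]+)"', re.S)
RE_DTWO = re.compile(r'@Metadata\([^)]*?d2\s*=\s*\{([^}]*)\}', re.S)
RE_LCLASS = re.compile(r'L([A-Za-z][\w/$]+);')


def recover_name(text):
    """Return the outer-class FQN found in text, or None (hypothetical helper)."""
    m = RE_DEBUG.search(text)
    if m:
        # "com.foo.Bar$baz$1" -> "com.foo.Bar"
        return m.group(1).split("$", 1)[0]
    m = RE_DTWO.search(text)
    if m:
        for lm in RE_LCLASS.finditer(m.group(1)):
            cand = lm.group(1).replace("/", ".").split("$", 1)[0]
            if "." in cand and not cand.startswith(("kotlin.", "java.", "android")):
                return cand
    return None


# Made-up stand-ins for decompiled sources of two obfuscated classes.
SAMPLE_DEBUG = '''
@DebugMetadata(c = "com.example.shop.CartRepository$refresh$1", f = "CartRepository.kt", l = {42}, m = "invokeSuspend")
final class a$b extends SuspendLambda {}
'''

SAMPLE_D2 = '''
@Metadata(d1 = {"..."}, d2 = {"Lcom/example/shop/CartViewModel;", "Landroidx/lifecycle/ViewModel;", "<init>", "()V"}, k = 1, mv = {1, 9, 0})
public final class c {}
'''

print(recover_name(SAMPLE_DEBUG))
print(recover_name(SAMPLE_D2))
print(recover_name("public final class d {}"))
```

In the second sample the androidx reference is never reached, because the loop stops at the first class reference that passes the filter. That is also the heuristic's weak spot: the first such reference in d2 is assumed to be the class itself.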