feat: recover original Kotlin class names from R8-stripped binaries

R8 obfuscates JVM symbols but cannot strip the Kotlin metadata strings —
the Kotlin runtime needs them at runtime for reflection, coroutines, and
data-class features. The original FQNs leak through:

  * @DebugMetadata(c = "<real.fqn>")  emitted for every coroutine
    SuspendLambda (~ every suspend function in modern apps)
  * @Metadata(d2 = {"L<real/fqn>;"})  on every Kotlin class

Add scripts/recover-kotlin-names.sh that walks decompiled sources, mines
both annotations, and writes an obf -> real mapping (TSV + JSON + per-real-
package index). On a real-world Kotlin app this recovers ~100 % of
*Repository / *ViewModel / *UseCase / *Impl classes — exactly the classes
worth reading.

Add scripts/lookup-name.sh as a CLI over the mapping with four modes:
search by real-name substring, resolve obf -> real, list a real package,
and an annotated `--grep` that suffixes every hit with the owning real
class. This is a strict upgrade over plain grep against decompiled sources.

Replace the misleading 'use --deobf' tip in call-flow-analysis.md with a
pointer to this technique. --deobf only renames symbols with synthetic
placeholders; metadata recovery returns actual developer-written names.

Document the technique, expected recovery rates, and limitations in
references/kotlin-name-recovery.md, and reference it from SKILL.md as
optional Phase 3.5 (only when Phase 0 reports an obfuscated Kotlin app).
This commit is contained in:
Michał Tajchert 2026-04-29 01:12:31 +02:00
parent 213818fc27
commit 5b63fcb418
5 changed files with 362 additions and 1 deletions

View File

@ -154,6 +154,33 @@ Navigate the decompiled output to understand the app's architecture.
- Clean Architecture: look for `domain`, `data`, `presentation` packages
- This informs where to look for network calls in the next phases
### Phase 3.5: Recover Kotlin Class Names (only for obfuscated Kotlin apps)
If Phase 0 reported moderate / high obfuscation **and** the app is Kotlin
(Compose / kotlin_module markers detected), run the metadata recovery
script before tracing call flows. R8 obfuscates JVM symbols but cannot
strip Kotlin metadata strings, so original FQNs leak through
`@DebugMetadata` and `@Metadata.d2`.
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/recover-kotlin-names.sh \
<output>/sources <output>/mapping
```
Then use the lookup helper instead of plain grep — every hit comes
annotated with the owning class's real name:
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/lookup-name.sh \
<output>/mapping --grep '"/api/' <output>/sources
```
Typical recovery on a real-world Kotlin app: ~100% of `*Repository` /
`*ViewModel` / `*UseCase` / `*Impl` classes, ~80% of DTOs.
See `${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/references/kotlin-name-recovery.md`
for the full technique and limitations.
### Phase 4: Trace Call Flows
Follow execution paths from user-facing entry points down to network calls.

View File

@ -145,8 +145,9 @@ When code is obfuscated (ProGuard/R8):
1. **Start from strings**: Search for URLs, error messages, and known constants
2. **Start from framework classes**: Activities and Fragments are named in the manifest
3. **Follow library calls**: Retrofit `@GET`/`@POST` annotations are readable even when the interface class name is obfuscated
4. **Use `--deobf`**: jadx can generate readable replacement names
4. **Recover original Kotlin names from metadata**: `@DebugMetadata` and `@Metadata.d2` strings preserve the original FQNs even after R8 obfuscation. Run `scripts/recover-kotlin-names.sh` to build an `obf -> real` map (typically recovers 30-50% of classes — and almost 100% of `*Repository` / `*ViewModel` / `*Impl`). See [`kotlin-name-recovery.md`](./kotlin-name-recovery.md). This is the single highest-leverage step on any Kotlin app.
5. **Cross-reference**: If `class a` calls `Retrofit.create(b.class)`, then `b` is a Retrofit service interface
6. **`--deobf` is rarely enough on its own**: jadx's `--deobf` renames obfuscated symbols with synthetic placeholders (`p001a`, `C0123Foo`) — useful for disambiguation but it does **not** recover original names. Pair it with the metadata recovery above.
## 8. Tracing a Complete Call Flow: Example

View File

@ -0,0 +1,108 @@
# Recovering Original Class Names from Kotlin Metadata
When R8/ProGuard obfuscates a Kotlin app, JVM symbols are renamed but the
**Kotlin metadata strings cannot be stripped** — the Kotlin runtime depends
on them at runtime for reflection, coroutines, and `data class` features.
Two annotations leak the original fully-qualified names:
## `@DebugMetadata`
Generated for nearly every Kotlin coroutine `SuspendLambda` (i.e. almost
every `suspend` function in a modern app):
```java
@DebugMetadata(
c = "com.example.feature.account.AccountRepositoryImpl$fetch$1",
f = "AccountRepositoryImpl.kt",
l = {42, 51},
m = "invokeSuspend"
)
public final class a extends SuspendLambda implements Function2<...> { ... }
```
The `c =` field carries the original outer class FQN (with a `$` suffix
for inner / lambda scopes — strip everything after the first `$` to get the
declaring class).
## `@Metadata.d2`
Every Kotlin class carries a top-level `@Metadata` annotation. The `d2`
array lists internal class refs in JVM type-descriptor format
(`Lcom/example/Foo;`):
```java
@Metadata(d1 = {"..."},
d2 = {"...","Lcom/example/feature/account/AccountRepositoryImpl;","..."})
public final class b implements ... { ... }
```
The first non-stdlib descriptor in `d2` is usually the file's primary
class.
## How to mine them
The skill ships two scripts:
```bash
# Build a mapping from a decompiled sources directory:
bash scripts/recover-kotlin-names.sh <output>/sources [mapping-dir]
# Outputs:
# <mapping-dir>/mapping.tsv obf_fqn real_fqn file
# <mapping-dir>/mapping.json same data, JSON
# <mapping-dir>/by_package/ per-real-package index files
# Query the mapping:
bash scripts/lookup-name.sh <mapping-dir> Repository # search
bash scripts/lookup-name.sh <mapping-dir> -o ab.cd # obf -> real
bash scripts/lookup-name.sh <mapping-dir> -p com.example.feature # list package
bash scripts/lookup-name.sh <mapping-dir> --grep '"api/' <output>/sources
# ^ greps decompiled code and appends '// real.fqn' to each hit
```
## What you typically recover
On a real-world obfuscated Kotlin app the script recovers **30 50 % of
classes** — but more importantly, **almost 100 % of the classes you
actually want to read**:
| Class kind | Recovery rate |
|---------------------------|---------------|
| `*Repository` / `*Impl` | ~100 % |
| `*ViewModel` | ~100 % |
| `*UseCase` / `*Interactor`| ~100 % |
| Plain `data class` DTOs | ~80 % |
| Pure-Java helper classes | low (no Kotlin metadata) |
| Anonymous inner classes | sometimes recovered as the parent FQN |
## Why `jadx --deobf` is not enough
`--deobf` renames obfuscated identifiers using internal heuristics, but the
output is still synthetic (`p001a`, `C0123Foo`). It does **not** recover
the *original* names. Kotlin metadata recovery is the only reliable way to
map back to the names the developer actually wrote, and it costs essentially
nothing — just a regex pass over the decompiled sources.
Run both: `--deobf` for fields/methods that have no metadata source, plus
the recovery script for class names.
## Limitations
- **Method names and field names** are not recovered. Kotlin metadata only
preserves class-level FQNs and a few signatures. For method names you
still need jadx-gui's interactive rename or pattern inference.
- **Pure-Java classes** carry no `@Metadata`, so they remain obfuscated.
- **Heavily inlined classes** (`@JvmInline value class`, top-level fun
files compiled into shared `*Kt.class` synthetic classes) sometimes show
up under the wrong filename — treat results as a strong hint, not gospel.
## Reading flow with the mapping
1. Run `recover-kotlin-names.sh` once after decompiling.
2. Use `lookup-name.sh --grep '<pattern>' <sources>` instead of plain `grep`
so every hit comes annotated with the real owning class.
3. When you hit an obfuscated FQN in code (e.g. `nq.e`), resolve it with
`lookup-name.sh <mapping-dir> -o nq.e` — you will often see siblings
(`nq.d`, `nq.f`, ...) that are the same class's split lambdas/inner
classes, which is useful context.

View File

@ -0,0 +1,85 @@
#!/usr/bin/env bash
# lookup-name.sh — Query the mapping produced by recover-kotlin-names.sh.
#
# Modes:
# lookup-name.sh <mapping-dir> <substring> search by real-FQN substring
# lookup-name.sh <mapping-dir> -o <obf> resolve obf -> real
# lookup-name.sh <mapping-dir> -p <pkg> list a real package
# lookup-name.sh <mapping-dir> --grep <regex> <sources-dir>
# grep decompiled sources and annotate each hit with the real class name
set -euo pipefail
usage() {
cat <<EOF
Usage: lookup-name.sh <mapping-dir> <query>
lookup-name.sh <mapping-dir> -o <obf-fqn>
lookup-name.sh <mapping-dir> -p <real-package-substring>
lookup-name.sh <mapping-dir> --grep <regex> <sources-dir>
<mapping-dir> is the directory produced by recover-kotlin-names.sh
(must contain mapping.json).
EOF
exit 0
}
[[ $# -lt 2 ]] && usage
DIR="$1"; shift
[[ ! -f "$DIR/mapping.json" ]] && { echo "no mapping.json in $DIR" >&2; exit 1; }
python3 - "$DIR" "$@" <<'PY'
import json, os, re, sys, subprocess
DIR = sys.argv[1]
args = sys.argv[2:]
MAP = json.load(open(os.path.join(DIR, "mapping.json")))
REV = {}
for o, r in MAP.items():
REV.setdefault(r, []).append(o)
def search(q):
ql = q.lower()
for r in sorted(REV):
if ql in r.lower():
print(r)
for o in sorted(REV[r]):
print(f" {o}")
def by_obf(o):
if o not in MAP:
print(f"no mapping for {o}", file=sys.stderr); sys.exit(1)
print(f"{o} -> {MAP[o]}")
sibs = [s for s in REV[MAP[o]] if s != o]
for s in sorted(sibs):
print(f" sibling: {s}")
def by_pkg(p):
pl = p.lower()
for r in sorted(REV):
if pl in r.rsplit(".", 1)[0].lower():
print(r)
for o in sorted(REV[r]):
print(f" {o}")
def grep_annot(pattern, sources):
res = subprocess.run(
["grep", "-rEn", "--include=*.java", pattern, sources],
capture_output=True, text=True)
for line in res.stdout.splitlines():
try:
path, lineno, content = line.split(":", 2)
except ValueError:
continue
rel = os.path.relpath(path, sources)
obf = rel.replace(os.sep, ".")[:-5]
suffix = f" // {MAP[obf]}" if obf in MAP else ""
print(f"{rel}:{lineno}:{content}{suffix}")
if args[0] == "-o" and len(args) == 2:
by_obf(args[1])
elif args[0] == "-p" and len(args) == 2:
by_pkg(args[1])
elif args[0] == "--grep" and len(args) == 3:
grep_annot(args[1], args[2])
else:
search(" ".join(args))
PY

View File

@ -0,0 +1,140 @@
#!/usr/bin/env bash
# recover-kotlin-names.sh — Rebuild a (obfuscated -> real) class-name map
# from Kotlin metadata strings left in decompiled sources.
#
# R8 obfuscates JVM symbols but cannot strip the Kotlin metadata strings —
# the Kotlin runtime (reflection, coroutines) needs them at runtime. Two
# annotations carry the original FQN:
#
# * @DebugMetadata(c = "<full.qualified.Name>", f = "<File.kt>", ...)
# emitted for almost every `suspend` function (every coroutine
# SuspendLambda).
#
# * @Metadata(... d2 = {"...L<pkg/Class>;..."} ...) listing internal
# class refs of the file.
#
# Typical recovery on a real-world app: 30-50 % of classes regain their real
# names — usually 100 % of the *Repository / *ViewModel / *UseCase / *Impl
# classes you actually want to read.
set -euo pipefail
usage() {
cat <<EOF
Usage: recover-kotlin-names.sh <decompiled-sources-dir> [output-dir]
Walks every *.java under <decompiled-sources-dir>, mines @DebugMetadata
and @Metadata annotations, and writes:
<output-dir>/mapping.tsv tab-separated obf_fqn <TAB> real_fqn <TAB> file
<output-dir>/mapping.json same data as JSON { obf_fqn: real_fqn, ... }
<output-dir>/by_package/ one file per real package, listing
real_fqn <TAB> obf_fqn <TAB> file
If [output-dir] is omitted, files are written next to the sources dir.
EOF
exit 0
}
[[ $# -lt 1 || "$1" == "-h" || "$1" == "--help" ]] && usage
SRC="$1"
OUT="${2:-$(dirname "$SRC")/mapping}"
[[ ! -d "$SRC" ]] && { echo "not a directory: $SRC" >&2; exit 1; }
mkdir -p "$OUT/by_package"
python3 - "$SRC" "$OUT" <<'PY'
import os, re, sys, json
from collections import defaultdict
SRC, OUT = sys.argv[1], sys.argv[2]
# @DebugMetadata(c = "com.foo.Bar$Inner$1", ...)
RE_DEBUG = re.compile(r'@DebugMetadata\([^)]*?c\s*=\s*"([^"]+)"', re.S)
# @Metadata(... d2 = { "...Lcom/foo/Bar;..." ...} )
RE_DTWO = re.compile(r'@Metadata\([^)]*?d2\s*=\s*\{([^}]*)\}', re.S)
RE_LCLASS = re.compile(r'L([A-Za-z][\w/$]+);')
# jadx sometimes emits this comment for renamed classes
RE_RENAMED = re.compile(r'/\*\s*renamed from:\s*([\w.$]+)\s*\*/')
# Skip third-party / framework trees — their names are already real.
SKIP_PREFIXES = (
"kotlin.", "kotlinx.", "androidx.", "android.", "java.", "javax.",
"com.google.", "com.facebook.", "com.appsflyer.", "com.datadog.",
"io.ktor.", "io.sentry.", "io.realm.", "okhttp3.", "okio.",
"com.squareup.", "com.bumptech.", "com.airbnb.", "com.payu.",
"com.storyteller.", "zendesk.", "io.intercom.", "com.microsoft.",
"com.tinder.", "com.hotjar.", "com.amplitude.", "com.segment.",
"com.mixpanel.", "com.onesignal.", "com.stripe.", "com.braintreepayments.",
"retrofit2.", "dagger.", "javax.inject.", "org.jetbrains.",
)
mapping = {}
file_real = {}
counts = defaultdict(int)
for dp, _, files in os.walk(SRC):
for f in files:
if not f.endswith(".java"):
continue
path = os.path.join(dp, f)
rel = os.path.relpath(path, SRC)
obf = rel[:-5].replace(os.sep, ".")
if obf.startswith(SKIP_PREFIXES):
continue
try:
text = open(path, "r", errors="replace").read()
except OSError:
continue
real = None
m = RE_DEBUG.search(text)
if m:
real = m.group(1).split("$", 1)[0]
counts["debug_meta"] += 1
if not real:
m = RE_DTWO.search(text)
if m:
for lm in RE_LCLASS.finditer(m.group(1)):
cand = lm.group(1).replace("/", ".").split("$", 1)[0]
if "." in cand and not cand.startswith(("kotlin.", "java.", "android")):
real = cand
counts["d2"] += 1
break
if not real:
m = RE_RENAMED.search(text)
if m:
real = m.group(1)
counts["renamed"] += 1
if real:
mapping[obf] = real
file_real[obf] = path
with open(os.path.join(OUT, "mapping.tsv"), "w") as f:
f.write("obf_fqn\treal_fqn\tfile\n")
for k in sorted(mapping):
f.write(f"{k}\t{mapping[k]}\t{file_real[k]}\n")
with open(os.path.join(OUT, "mapping.json"), "w") as f:
json.dump(mapping, f, indent=2, sort_keys=True)
by_pkg = defaultdict(list)
for obf, real in mapping.items():
pkg = real.rsplit(".", 1)[0] if "." in real else "(default)"
by_pkg[pkg].append((real, obf, file_real[obf]))
for pkg, rows in by_pkg.items():
safe = pkg.replace(".", "_") or "default"
with open(os.path.join(OUT, "by_package", f"{safe}.txt"), "w") as f:
for real, obf, p in sorted(rows):
f.write(f"{real}\t{obf}\t{p}\n")
print(f"Recovered {len(mapping)} class names")
for k, v in counts.items():
print(f" via {k}: {v}")
print(f"Real packages: {len(by_pkg)}")
print(f"Wrote {OUT}/mapping.tsv, mapping.json, by_package/")
PY