feat: recover original Kotlin class names from R8-stripped binaries
R8 obfuscates JVM symbols but cannot strip the Kotlin metadata strings —
the Kotlin runtime needs them at runtime for reflection, coroutines, and
data-class features. The original FQNs leak through:
* @DebugMetadata(c = "<real.fqn>") emitted for every coroutine
SuspendLambda (~ every suspend function in modern apps)
* @Metadata(d2 = {"L<real/fqn>;"}) on every Kotlin class
Add scripts/recover-kotlin-names.sh that walks decompiled sources, mines
both annotations, and writes an obf -> real mapping (TSV + JSON + per-real-
package index). On a real-world Kotlin app this recovers ~100 % of
*Repository / *ViewModel / *UseCase / *Impl classes — exactly the classes
worth reading.
Add scripts/lookup-name.sh as a CLI over the mapping with four modes:
search by real-name substring, resolve obf -> real, list a real package,
and an annotated `--grep` that suffixes every hit with the owning real
class. This is a strict upgrade over plain grep against decompiled sources.
Replace the misleading 'use --deobf' tip in call-flow-analysis.md with a
pointer to this technique. --deobf only renames symbols with synthetic
placeholders; metadata recovery returns actual developer-written names.
Document the technique, expected recovery rates, and limitations in
references/kotlin-name-recovery.md, and reference it from SKILL.md as
optional Phase 3.5 (only when Phase 0 reports an obfuscated Kotlin app).
This commit is contained in:
parent
213818fc27
commit
5b63fcb418
|
|
@ -154,6 +154,33 @@ Navigate the decompiled output to understand the app's architecture.
|
||||||
- Clean Architecture: look for `domain`, `data`, `presentation` packages
|
- Clean Architecture: look for `domain`, `data`, `presentation` packages
|
||||||
- This informs where to look for network calls in the next phases
|
- This informs where to look for network calls in the next phases
|
||||||
|
|
||||||
|
### Phase 3.5: Recover Kotlin Class Names (only for obfuscated Kotlin apps)
|
||||||
|
|
||||||
|
If Phase 0 reported moderate / high obfuscation **and** the app is Kotlin
|
||||||
|
(Compose / kotlin_module markers detected), run the metadata recovery
|
||||||
|
script before tracing call flows. R8 obfuscates JVM symbols but cannot
|
||||||
|
strip Kotlin metadata strings, so original FQNs leak through
|
||||||
|
`@DebugMetadata` and `@Metadata.d2`.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/recover-kotlin-names.sh \
|
||||||
|
<output>/sources <output>/mapping
|
||||||
|
```
|
||||||
|
|
||||||
|
Then use the lookup helper instead of plain grep — every hit comes
|
||||||
|
annotated with the owning class's real name:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/lookup-name.sh \
|
||||||
|
<output>/mapping --grep '"/api/' <output>/sources
|
||||||
|
```
|
||||||
|
|
||||||
|
Typical recovery on a real-world Kotlin app: ~100% of `*Repository` /
|
||||||
|
`*ViewModel` / `*UseCase` / `*Impl` classes, ~80% of DTOs.
|
||||||
|
|
||||||
|
See `${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/references/kotlin-name-recovery.md`
|
||||||
|
for the full technique and limitations.
|
||||||
|
|
||||||
### Phase 4: Trace Call Flows
|
### Phase 4: Trace Call Flows
|
||||||
|
|
||||||
Follow execution paths from user-facing entry points down to network calls.
|
Follow execution paths from user-facing entry points down to network calls.
|
||||||
|
|
|
||||||
|
|
@ -145,8 +145,9 @@ When code is obfuscated (ProGuard/R8):
|
||||||
1. **Start from strings**: Search for URLs, error messages, and known constants
|
1. **Start from strings**: Search for URLs, error messages, and known constants
|
||||||
2. **Start from framework classes**: Activities and Fragments are named in the manifest
|
2. **Start from framework classes**: Activities and Fragments are named in the manifest
|
||||||
3. **Follow library calls**: Retrofit `@GET`/`@POST` annotations are readable even when the interface class name is obfuscated
|
3. **Follow library calls**: Retrofit `@GET`/`@POST` annotations are readable even when the interface class name is obfuscated
|
||||||
4. **Use `--deobf`**: jadx can generate readable replacement names
|
4. **Recover original Kotlin names from metadata**: `@DebugMetadata` and `@Metadata.d2` strings preserve the original FQNs even after R8 obfuscation. Run `scripts/recover-kotlin-names.sh` to build an `obf -> real` map (typically recovers 30-50% of classes — and almost 100% of `*Repository` / `*ViewModel` / `*Impl`). See [`kotlin-name-recovery.md`](./kotlin-name-recovery.md). This is the single highest-leverage step on any Kotlin app.
|
||||||
5. **Cross-reference**: If `class a` calls `Retrofit.create(b.class)`, then `b` is a Retrofit service interface
|
5. **Cross-reference**: If `class a` calls `Retrofit.create(b.class)`, then `b` is a Retrofit service interface
|
||||||
|
6. **`--deobf` is rarely enough on its own**: jadx's `--deobf` renames obfuscated symbols with synthetic placeholders (`p001a`, `C0123Foo`) — useful for disambiguation but it does **not** recover original names. Pair it with the metadata recovery above.
|
||||||
|
|
||||||
## 8. Tracing a Complete Call Flow: Example
|
## 8. Tracing a Complete Call Flow: Example
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,108 @@
|
||||||
|
# Recovering Original Class Names from Kotlin Metadata
|
||||||
|
|
||||||
|
When R8/ProGuard obfuscates a Kotlin app, JVM symbols are renamed but the
|
||||||
|
**Kotlin metadata strings cannot be stripped** — the Kotlin runtime depends
|
||||||
|
on them at runtime for reflection, coroutines, and `data class` features.
|
||||||
|
|
||||||
|
Two annotations leak the original fully-qualified names:
|
||||||
|
|
||||||
|
## `@DebugMetadata`
|
||||||
|
|
||||||
|
Generated for nearly every Kotlin coroutine `SuspendLambda` (i.e. almost
|
||||||
|
every `suspend` function in a modern app):
|
||||||
|
|
||||||
|
```java
|
||||||
|
@DebugMetadata(
|
||||||
|
c = "com.example.feature.account.AccountRepositoryImpl$fetch$1",
|
||||||
|
f = "AccountRepositoryImpl.kt",
|
||||||
|
l = {42, 51},
|
||||||
|
m = "invokeSuspend"
|
||||||
|
)
|
||||||
|
public final class a extends SuspendLambda implements Function2<...> { ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
The `c =` field carries the original outer class FQN (with a `$` suffix
|
||||||
|
for inner / lambda scopes — strip everything after the first `$` to get the
|
||||||
|
declaring class).
|
||||||
|
|
||||||
|
## `@Metadata.d2`
|
||||||
|
|
||||||
|
Every Kotlin class carries a top-level `@Metadata` annotation. The `d2`
|
||||||
|
array lists internal class refs in JVM type-descriptor format
|
||||||
|
(`Lcom/example/Foo;`):
|
||||||
|
|
||||||
|
```java
|
||||||
|
@Metadata(d1 = {"..."},
|
||||||
|
d2 = {"...","Lcom/example/feature/account/AccountRepositoryImpl;","..."})
|
||||||
|
public final class b implements ... { ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
The first non-stdlib descriptor in `d2` is usually the file's primary
|
||||||
|
class.
|
||||||
|
|
||||||
|
## How to mine them
|
||||||
|
|
||||||
|
The skill ships two scripts:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Build a mapping from a decompiled sources directory:
|
||||||
|
bash scripts/recover-kotlin-names.sh <output>/sources [mapping-dir]
|
||||||
|
|
||||||
|
# Outputs:
|
||||||
|
# <mapping-dir>/mapping.tsv obf_fqn real_fqn file
|
||||||
|
# <mapping-dir>/mapping.json same data, JSON
|
||||||
|
# <mapping-dir>/by_package/ per-real-package index files
|
||||||
|
|
||||||
|
# Query the mapping:
|
||||||
|
bash scripts/lookup-name.sh <mapping-dir> Repository # search
|
||||||
|
bash scripts/lookup-name.sh <mapping-dir> -o ab.cd # obf -> real
|
||||||
|
bash scripts/lookup-name.sh <mapping-dir> -p com.example.feature # list package
|
||||||
|
bash scripts/lookup-name.sh <mapping-dir> --grep '"api/' <output>/sources
|
||||||
|
# ^ greps decompiled code and appends '// real.fqn' to each hit
|
||||||
|
```
|
||||||
|
|
||||||
|
## What you typically recover
|
||||||
|
|
||||||
|
On a real-world obfuscated Kotlin app the script recovers **30 – 50 % of
|
||||||
|
classes** — but more importantly, **almost 100 % of the classes you
|
||||||
|
actually want to read**:
|
||||||
|
|
||||||
|
| Class kind | Recovery rate |
|
||||||
|
|---------------------------|---------------|
|
||||||
|
| `*Repository` / `*Impl` | ~100 % |
|
||||||
|
| `*ViewModel` | ~100 % |
|
||||||
|
| `*UseCase` / `*Interactor`| ~100 % |
|
||||||
|
| Plain `data class` DTOs | ~80 % |
|
||||||
|
| Pure-Java helper classes | low (no Kotlin metadata) |
|
||||||
|
| Anonymous inner classes | sometimes recovered as the parent FQN |
|
||||||
|
|
||||||
|
## Why `jadx --deobf` is not enough
|
||||||
|
|
||||||
|
`--deobf` renames obfuscated identifiers using internal heuristics, but the
|
||||||
|
output is still synthetic (`p001a`, `C0123Foo`). It does **not** recover
|
||||||
|
the *original* names. Kotlin metadata recovery is the only reliable way to
|
||||||
|
map back to the names the developer actually wrote, and it costs essentially
|
||||||
|
nothing — just a regex pass over the decompiled sources.
|
||||||
|
|
||||||
|
Run both: `--deobf` for fields/methods that have no metadata source, plus
|
||||||
|
the recovery script for class names.
|
||||||
|
|
||||||
|
## Limitations
|
||||||
|
|
||||||
|
- **Method names and field names** are not recovered. Kotlin metadata only
|
||||||
|
preserves class-level FQNs and a few signatures. For method names you
|
||||||
|
still need jadx-gui's interactive rename or pattern inference.
|
||||||
|
- **Pure-Java classes** carry no `@Metadata`, so they remain obfuscated.
|
||||||
|
- **Heavily inlined classes** (`@JvmInline value class`, top-level fun
|
||||||
|
files compiled into shared `*Kt.class` synthetic classes) sometimes show
|
||||||
|
up under the wrong filename — treat results as a strong hint, not gospel.
|
||||||
|
|
||||||
|
## Reading flow with the mapping
|
||||||
|
|
||||||
|
1. Run `recover-kotlin-names.sh` once after decompiling.
|
||||||
|
2. Use `lookup-name.sh --grep '<pattern>' <sources>` instead of plain `grep`
|
||||||
|
so every hit comes annotated with the real owning class.
|
||||||
|
3. When you hit an obfuscated FQN in code (e.g. `nq.e`), resolve it with
|
||||||
|
`lookup-name.sh <mapping-dir> -o nq.e` — you will often see siblings
|
||||||
|
(`nq.d`, `nq.f`, ...) that are the same class's split lambdas/inner
|
||||||
|
classes, which is useful context.
|
||||||
|
|
@ -0,0 +1,85 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
# lookup-name.sh — Query the mapping produced by recover-kotlin-names.sh.
|
||||||
|
#
|
||||||
|
# Modes:
|
||||||
|
# lookup-name.sh <mapping-dir> <substring> search by real-FQN substring
|
||||||
|
# lookup-name.sh <mapping-dir> -o <obf> resolve obf -> real
|
||||||
|
# lookup-name.sh <mapping-dir> -p <pkg> list a real package
|
||||||
|
# lookup-name.sh <mapping-dir> --grep <regex> <sources-dir>
|
||||||
|
# grep decompiled sources and annotate each hit with the real class name
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
usage() {
|
||||||
|
cat <<EOF
|
||||||
|
Usage: lookup-name.sh <mapping-dir> <query>
|
||||||
|
lookup-name.sh <mapping-dir> -o <obf-fqn>
|
||||||
|
lookup-name.sh <mapping-dir> -p <real-package-substring>
|
||||||
|
lookup-name.sh <mapping-dir> --grep <regex> <sources-dir>
|
||||||
|
|
||||||
|
<mapping-dir> is the directory produced by recover-kotlin-names.sh
|
||||||
|
(must contain mapping.json).
|
||||||
|
EOF
|
||||||
|
exit 0
|
||||||
|
}
|
||||||
|
|
||||||
|
[[ $# -lt 2 ]] && usage
|
||||||
|
DIR="$1"; shift
|
||||||
|
[[ ! -f "$DIR/mapping.json" ]] && { echo "no mapping.json in $DIR" >&2; exit 1; }
|
||||||
|
|
||||||
|
python3 - "$DIR" "$@" <<'PY'
|
||||||
|
import json, os, re, sys, subprocess
|
||||||
|
DIR = sys.argv[1]
|
||||||
|
args = sys.argv[2:]
|
||||||
|
MAP = json.load(open(os.path.join(DIR, "mapping.json")))
|
||||||
|
REV = {}
|
||||||
|
for o, r in MAP.items():
|
||||||
|
REV.setdefault(r, []).append(o)
|
||||||
|
|
||||||
|
def search(q):
|
||||||
|
ql = q.lower()
|
||||||
|
for r in sorted(REV):
|
||||||
|
if ql in r.lower():
|
||||||
|
print(r)
|
||||||
|
for o in sorted(REV[r]):
|
||||||
|
print(f" {o}")
|
||||||
|
|
||||||
|
def by_obf(o):
|
||||||
|
if o not in MAP:
|
||||||
|
print(f"no mapping for {o}", file=sys.stderr); sys.exit(1)
|
||||||
|
print(f"{o} -> {MAP[o]}")
|
||||||
|
sibs = [s for s in REV[MAP[o]] if s != o]
|
||||||
|
for s in sorted(sibs):
|
||||||
|
print(f" sibling: {s}")
|
||||||
|
|
||||||
|
def by_pkg(p):
|
||||||
|
pl = p.lower()
|
||||||
|
for r in sorted(REV):
|
||||||
|
if pl in r.rsplit(".", 1)[0].lower():
|
||||||
|
print(r)
|
||||||
|
for o in sorted(REV[r]):
|
||||||
|
print(f" {o}")
|
||||||
|
|
||||||
|
def grep_annot(pattern, sources):
|
||||||
|
res = subprocess.run(
|
||||||
|
["grep", "-rEn", "--include=*.java", pattern, sources],
|
||||||
|
capture_output=True, text=True)
|
||||||
|
for line in res.stdout.splitlines():
|
||||||
|
try:
|
||||||
|
path, lineno, content = line.split(":", 2)
|
||||||
|
except ValueError:
|
||||||
|
continue
|
||||||
|
rel = os.path.relpath(path, sources)
|
||||||
|
obf = rel.replace(os.sep, ".")[:-5]
|
||||||
|
suffix = f" // {MAP[obf]}" if obf in MAP else ""
|
||||||
|
print(f"{rel}:{lineno}:{content}{suffix}")
|
||||||
|
|
||||||
|
if args[0] == "-o" and len(args) == 2:
|
||||||
|
by_obf(args[1])
|
||||||
|
elif args[0] == "-p" and len(args) == 2:
|
||||||
|
by_pkg(args[1])
|
||||||
|
elif args[0] == "--grep" and len(args) == 3:
|
||||||
|
grep_annot(args[1], args[2])
|
||||||
|
else:
|
||||||
|
search(" ".join(args))
|
||||||
|
PY
|
||||||
|
|
@ -0,0 +1,140 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
# recover-kotlin-names.sh — Rebuild a (obfuscated -> real) class-name map
|
||||||
|
# from Kotlin metadata strings left in decompiled sources.
|
||||||
|
#
|
||||||
|
# R8 obfuscates JVM symbols but cannot strip the Kotlin metadata strings —
|
||||||
|
# the Kotlin runtime (reflection, coroutines) needs them at runtime. Two
|
||||||
|
# annotations carry the original FQN:
|
||||||
|
#
|
||||||
|
# * @DebugMetadata(c = "<full.qualified.Name>", f = "<File.kt>", ...)
|
||||||
|
# emitted for almost every `suspend` function (every coroutine
|
||||||
|
# SuspendLambda).
|
||||||
|
#
|
||||||
|
# * @Metadata(... d2 = {"...L<pkg/Class>;..."} ...) listing internal
|
||||||
|
# class refs of the file.
|
||||||
|
#
|
||||||
|
# Typical recovery on a real-world app: 30-50 % of classes regain their real
|
||||||
|
# names — usually 100 % of the *Repository / *ViewModel / *UseCase / *Impl
|
||||||
|
# classes you actually want to read.
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
usage() {
|
||||||
|
cat <<EOF
|
||||||
|
Usage: recover-kotlin-names.sh <decompiled-sources-dir> [output-dir]
|
||||||
|
|
||||||
|
Walks every *.java under <decompiled-sources-dir>, mines @DebugMetadata
|
||||||
|
and @Metadata annotations, and writes:
|
||||||
|
|
||||||
|
<output-dir>/mapping.tsv tab-separated obf_fqn <TAB> real_fqn <TAB> file
|
||||||
|
<output-dir>/mapping.json same data as JSON { obf_fqn: real_fqn, ... }
|
||||||
|
<output-dir>/by_package/ one file per real package, listing
|
||||||
|
real_fqn <TAB> obf_fqn <TAB> file
|
||||||
|
|
||||||
|
If [output-dir] is omitted, files are written next to the sources dir.
|
||||||
|
EOF
|
||||||
|
exit 0
|
||||||
|
}
|
||||||
|
|
||||||
|
[[ $# -lt 1 || "$1" == "-h" || "$1" == "--help" ]] && usage
|
||||||
|
SRC="$1"
|
||||||
|
OUT="${2:-$(dirname "$SRC")/mapping}"
|
||||||
|
[[ ! -d "$SRC" ]] && { echo "not a directory: $SRC" >&2; exit 1; }
|
||||||
|
|
||||||
|
mkdir -p "$OUT/by_package"
|
||||||
|
|
||||||
|
python3 - "$SRC" "$OUT" <<'PY'
|
||||||
|
import os, re, sys, json
|
||||||
|
from collections import defaultdict
|
||||||
|
|
||||||
|
SRC, OUT = sys.argv[1], sys.argv[2]
|
||||||
|
|
||||||
|
# @DebugMetadata(c = "com.foo.Bar$Inner$1", ...)
|
||||||
|
RE_DEBUG = re.compile(r'@DebugMetadata\([^)]*?c\s*=\s*"([^"]+)"', re.S)
|
||||||
|
# @Metadata(... d2 = { "...Lcom/foo/Bar;..." ...} )
|
||||||
|
RE_DTWO = re.compile(r'@Metadata\([^)]*?d2\s*=\s*\{([^}]*)\}', re.S)
|
||||||
|
RE_LCLASS = re.compile(r'L([A-Za-z][\w/$]+);')
|
||||||
|
# jadx sometimes emits this comment for renamed classes
|
||||||
|
RE_RENAMED = re.compile(r'/\*\s*renamed from:\s*([\w.$]+)\s*\*/')
|
||||||
|
|
||||||
|
# Skip third-party / framework trees — their names are already real.
|
||||||
|
SKIP_PREFIXES = (
|
||||||
|
"kotlin.", "kotlinx.", "androidx.", "android.", "java.", "javax.",
|
||||||
|
"com.google.", "com.facebook.", "com.appsflyer.", "com.datadog.",
|
||||||
|
"io.ktor.", "io.sentry.", "io.realm.", "okhttp3.", "okio.",
|
||||||
|
"com.squareup.", "com.bumptech.", "com.airbnb.", "com.payu.",
|
||||||
|
"com.storyteller.", "zendesk.", "io.intercom.", "com.microsoft.",
|
||||||
|
"com.tinder.", "com.hotjar.", "com.amplitude.", "com.segment.",
|
||||||
|
"com.mixpanel.", "com.onesignal.", "com.stripe.", "com.braintreepayments.",
|
||||||
|
"retrofit2.", "dagger.", "javax.inject.", "org.jetbrains.",
|
||||||
|
)
|
||||||
|
|
||||||
|
mapping = {}
|
||||||
|
file_real = {}
|
||||||
|
counts = defaultdict(int)
|
||||||
|
|
||||||
|
for dp, _, files in os.walk(SRC):
|
||||||
|
for f in files:
|
||||||
|
if not f.endswith(".java"):
|
||||||
|
continue
|
||||||
|
path = os.path.join(dp, f)
|
||||||
|
rel = os.path.relpath(path, SRC)
|
||||||
|
obf = rel[:-5].replace(os.sep, ".")
|
||||||
|
if obf.startswith(SKIP_PREFIXES):
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
text = open(path, "r", errors="replace").read()
|
||||||
|
except OSError:
|
||||||
|
continue
|
||||||
|
real = None
|
||||||
|
|
||||||
|
m = RE_DEBUG.search(text)
|
||||||
|
if m:
|
||||||
|
real = m.group(1).split("$", 1)[0]
|
||||||
|
counts["debug_meta"] += 1
|
||||||
|
|
||||||
|
if not real:
|
||||||
|
m = RE_DTWO.search(text)
|
||||||
|
if m:
|
||||||
|
for lm in RE_LCLASS.finditer(m.group(1)):
|
||||||
|
cand = lm.group(1).replace("/", ".").split("$", 1)[0]
|
||||||
|
if "." in cand and not cand.startswith(("kotlin.", "java.", "android")):
|
||||||
|
real = cand
|
||||||
|
counts["d2"] += 1
|
||||||
|
break
|
||||||
|
|
||||||
|
if not real:
|
||||||
|
m = RE_RENAMED.search(text)
|
||||||
|
if m:
|
||||||
|
real = m.group(1)
|
||||||
|
counts["renamed"] += 1
|
||||||
|
|
||||||
|
if real:
|
||||||
|
mapping[obf] = real
|
||||||
|
file_real[obf] = path
|
||||||
|
|
||||||
|
with open(os.path.join(OUT, "mapping.tsv"), "w") as f:
|
||||||
|
f.write("obf_fqn\treal_fqn\tfile\n")
|
||||||
|
for k in sorted(mapping):
|
||||||
|
f.write(f"{k}\t{mapping[k]}\t{file_real[k]}\n")
|
||||||
|
|
||||||
|
with open(os.path.join(OUT, "mapping.json"), "w") as f:
|
||||||
|
json.dump(mapping, f, indent=2, sort_keys=True)
|
||||||
|
|
||||||
|
by_pkg = defaultdict(list)
|
||||||
|
for obf, real in mapping.items():
|
||||||
|
pkg = real.rsplit(".", 1)[0] if "." in real else "(default)"
|
||||||
|
by_pkg[pkg].append((real, obf, file_real[obf]))
|
||||||
|
|
||||||
|
for pkg, rows in by_pkg.items():
|
||||||
|
safe = pkg.replace(".", "_") or "default"
|
||||||
|
with open(os.path.join(OUT, "by_package", f"{safe}.txt"), "w") as f:
|
||||||
|
for real, obf, p in sorted(rows):
|
||||||
|
f.write(f"{real}\t{obf}\t{p}\n")
|
||||||
|
|
||||||
|
print(f"Recovered {len(mapping)} class names")
|
||||||
|
for k, v in counts.items():
|
||||||
|
print(f" via {k}: {v}")
|
||||||
|
print(f"Real packages: {len(by_pkg)}")
|
||||||
|
print(f"Wrote {OUT}/mapping.tsv, mapping.json, by_package/")
|
||||||
|
PY
|
||||||
Loading…
Reference in New Issue