Michal Tajchert 2026-04-29 13:48:58 +02:00 committed by GitHub
commit 615c33aab8
9 changed files with 1093 additions and 19 deletions


@ -24,6 +24,31 @@ If anything is missing, follow the installation instructions in `${CLAUDE_PLUGIN
## Workflow
### Phase 0: Fingerprint the App (recommended before anything else)
Before installing tools or decompiling, run a fast triage to determine what
kind of app you are looking at. **Decompiling Java is mostly useless for
Flutter, React Native, Cordova/Capacitor, and Xamarin apps** — the real code
lives elsewhere. The fingerprint script tells you which.
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/fingerprint.sh <file.apk|file.xapk>
```
It prints, in one screen:
- **Mobile framework** (Flutter / React Native / Cordova / Xamarin / Native Kotlin / etc.) with the file marker that triggered the verdict.
- **HTTP stack** (Retrofit, OkHttp, Ktor, Apollo, Volley) detected via DEX string scan — works even when class names are obfuscated.
- **DI / serialization** signals (Hilt, Dagger, Koin, kotlinx.serialization, Moshi, Gson, Jackson).
- **Obfuscation level** estimate based on root-level short-named packages.
- **Notable third-party SDKs** (AppsFlyer, Datadog, Sentry, Firebase, payment SDKs, support/chat SDKs, etc.).
- **Consolidated native libraries** across the base APK and all splits — XAPK split bundles often place `.so` files in `config.<abi>.apk`, not in `base.apk`.
- **Recommended next step**, which differs by framework (e.g. for Flutter the script suggests `blutter` / `strings libapp.so` rather than jadx).
If the fingerprint says the app is Flutter / RN / Cordova / Xamarin, **stop**
and switch to the framework-appropriate tooling. Phases 1-5 below assume a
native (Java/Kotlin) Android app.
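
If the script is not at hand, a rough manual triage can be sketched from the ZIP listing alone. The helper below is a hypothetical illustration, not the logic of `fingerprint.sh`: only the `libflutter.so` marker comes from this skill, while the React Native, Cordova/Capacitor, and Xamarin markers are assumptions based on common packaging conventions and may miss newer builds.

```bash
# classify_listing (hypothetical helper): read a ZIP entry listing, one path
# per line, on stdin and print a best-guess framework label.
# Only the Flutter marker is taken from this skill; the others are assumed.
classify_listing() {
    listing=$(cat)
    if printf '%s\n' "$listing" | grep -qE '^lib/[^/]+/libflutter\.so$'; then
        echo "Flutter"
    elif printf '%s\n' "$listing" | grep -qE '^assets/index\.android\.bundle$|^lib/[^/]+/libreactnativejni\.so$'; then
        echo "React Native"
    elif printf '%s\n' "$listing" | grep -qE '^assets/www/|^assets/public/'; then
        echo "Cordova/Capacitor"
    elif printf '%s\n' "$listing" | grep -qE '^lib/[^/]+/libmonodroid\.so$|^assemblies/'; then
        echo "Xamarin"
    else
        echo "native-or-unknown"
    fi
}

# Usage (unzip -Z1 prints one entry name per line):
#   unzip -Z1 app.apk | classify_listing
```

For XAPK bundles the listing of every split has to be concatenated first, since the `.so` markers may live in `config.<abi>.apk` rather than `base.apk`.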
### Phase 1: Verify and Install Dependencies
Before decompiling, confirm that the required tools are available — and install any that are missing.
@ -123,12 +148,45 @@ Navigate the decompiled output to understand the app's architecture.
- Distinguish app code from third-party libraries
- Look for packages named `api`, `network`, `data`, `repository`, `service`, `retrofit`, `http` — these are where API calls live
3. **Identify the architecture pattern**:
3. **Read every `BuildConfig.java`** — these are almost never obfuscated and frequently leak the highest-signal constants in the entire APK (base URLs, flavor names, build type, third-party API keys, feature flags):
```bash
find <output>/sources -name BuildConfig.java -exec grep -H '=' {} \;
```
Each Gradle module emits its own `BuildConfig`, so expect 1 to N hits. Read all of them.
4. **Identify the architecture pattern**:
- MVP: look for `Presenter` classes
- MVVM: look for `ViewModel` classes and `LiveData`/`StateFlow`
- Clean Architecture: look for `domain`, `data`, `presentation` packages
- This informs where to look for network calls in the next phases
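
For the `BuildConfig` step, the raw `grep -H '='` output also includes booleans and integers. A narrower pass that keeps only string constants could look like the sketch below. The function name is hypothetical, and it assumes the decompiler renders each constant on one line as `NAME = "value"`.

```bash
# buildconfig_strings (hypothetical helper): print file:NAME = "value" for
# every string constant in any BuildConfig.java under the given directory.
# Assumes one constant per line, rendered as NAME = "value".
buildconfig_strings() {
    find "$1" -type f -name BuildConfig.java -exec \
        grep -HoE '[A-Z][A-Za-z0-9_]* = "[^"]*"' {} +
}

# Usage:
#   buildconfig_strings <output>/sources
```

Piping the result through `grep -E 'https?://'` narrows it further to base-URL candidates.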
### Phase 3.5: Recover Kotlin Class Names (only for obfuscated Kotlin apps)
If Phase 0 reported moderate / high obfuscation **and** the app is Kotlin
(Compose / kotlin_module markers detected), run the metadata recovery
script before tracing call flows. R8 obfuscates JVM symbols but cannot
strip Kotlin metadata strings, so original FQNs leak through
`@DebugMetadata` and `@Metadata.d2`.
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/recover-kotlin-names.sh \
<output>/sources <output>/mapping
```
Then use the lookup helper instead of plain grep — every hit comes
annotated with the owning class's real name:
```bash
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/lookup-name.sh \
<output>/mapping --grep '"/api/' <output>/sources
```
Typical recovery on a real-world Kotlin app: ~100% of `*Repository` /
`*ViewModel` / `*UseCase` / `*Impl` classes, ~80% of DTOs.
See `${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/references/kotlin-name-recovery.md`
for the full technique and limitations.
### Phase 4: Trace Call Flows
Follow execution paths from user-facing entry points down to network calls.
@ -190,15 +248,32 @@ On Windows (PowerShell):
& "${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/find-api-calls.ps1" <output>/sources/ -Auth
```
Then, for each discovered endpoint, read the surrounding source code to extract:
- HTTP method and path
- Base URL
- Path parameters, query parameters, request body
- Headers (especially authentication)
- Response type
- Where it's called from (the call chain from Phase 4)
Document the endpoints in **two tiers** — going deep on every endpoint is
prohibitively expensive on apps with 100+ paths, and most of them do not
warrant it. Always produce Tier 1; expand Tier 2 only for the endpoints
that matter.
**Document each endpoint** using this format:
#### Tier 1 — flat inventory (always)
A single table covering every discovered endpoint. Aim for one line each;
if you cannot determine a column, write `?`.
| Host | Method | Path | Auth | Source file |
|------|--------|------|------|-------------|
| `api.example.com` | GET | `/v1/users/profile` | Bearer | `com/example/api/UserApi.java` |
| `api.example.com` | POST | `/v1/auth/login` | none | `com/example/api/AuthApi.java` |
This table answers "what does the backend look like" in one screen and
takes ~5 minutes to produce from the `--paths` output even on a large app.
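
One way to bootstrap the Tier 1 table is to turn a list of paths into rows with every unknown column pre-filled as `?`, then fill in hosts, methods, and auth by hand. The helper below is a sketch under that assumption. Its name is hypothetical and it expects one path per line, optionally still wrapped in the double quotes that `grep -o` emits.

```bash
# tier1_skeleton (hypothetical helper): read one endpoint path per line on
# stdin and print a Markdown table with undetermined columns set to `?`.
tier1_skeleton() {
    printf '| Host | Method | Path | Auth | Source file |\n'
    printf '|------|--------|------|------|-------------|\n'
    sed -e 's/^"//' -e 's/"$//' | sort -u | while IFS= read -r path; do
        [ -n "$path" ] || continue
        printf '| ? | ? | `%s` | ? | ? |\n' "$path"
    done
}

# Usage (paths.txt is an assumed file holding only the deduplicated list):
#   tier1_skeleton < paths.txt > tier1.md
```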
#### Tier 2 — per-endpoint detail (only for high-value endpoints)
Reserve the detailed format for the few endpoints that actually need it:
- the entire authentication flow (login, refresh, logout, OTP/SMS, anonymous, registration)
- payment / checkout / order-creation endpoints
- anything the user explicitly asked about
- anything that looked unusual during the scan (custom signing, undocumented headers, etc.)
```markdown
### `METHOD /path`
@ -213,6 +288,10 @@ Then, for each discovered endpoint, read the surrounding source code to extract:
- **Called from**: `LoginActivity → LoginViewModel → UserRepository → ApiService`
```
As a default, do not produce Tier 2 entries for more than ~10 endpoints
unless the user explicitly asks for more — Tier 1 plus a Tier 2 deep dive
on auth + 1-2 key flows is what most consumers of this work actually want.
See `${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/references/api-extraction-patterns.md` for library-specific search patterns and the full documentation template.
## Output


@ -55,6 +55,65 @@ grep -rn 'Interceptor\|addInterceptor\|addNetworkInterceptor\|intercept(' source
grep -rn '\.execute()\|\.enqueue(' sources/
```
## Ktor (Kotlin)
Ktor is the dominant HTTP client in Kotlin Multiplatform and modern
Kotlin-only Android apps. Unlike Retrofit, Ktor does **not** use annotations
to declare endpoints — paths appear as plain string arguments to
`client.get(...)` / `client.post(...)`, often inside an extension function.
```bash
# Calls
grep -rn '\b\(client\|httpClient\|HttpClient\)\.\(get\|post\|put\|delete\|patch\|head\|request\)\s*[<(]' sources/
# Default request / base URL configuration
grep -rn 'HttpRequestBuilder\|defaultRequest\s*{\|\burl\s*(\s*"\|URLBuilder' sources/
# Auth plugin (bearer / refresh)
grep -rn '\bbearer\s*{\|BearerTokens\s*(\|loadTokens\s*{\|refreshTokens\s*{' sources/
```
Typical Ktor call as written in Kotlin source (decompiled Java is more verbose, but the path literal survives unchanged):
```kotlin
client.get("api/v1/users/profile") {
    parameter("locale", "en-US")
}
```
The base URL is usually applied via `defaultRequest { url { host = "..." } }`
in the client builder. Search for `host =` and `URLProtocol.HTTPS` references
to pin it down.
**Note on obfuscation:** in heavily R8-shrunk apps the call site
`client.get("path")` is inlined to something like `aVar.a(dVar, "path")`
and the `client.<verb>(` regex misses it. The path string itself is **not**
obfuscated, however — fall back to the generic path-literal search
(`--paths`) for the endpoint inventory in those cases. Ktor library
internals (`BearerTokens`, `loadTokens`, `refreshTokens`, `URLProtocol`)
remain searchable because Ktor keeps these on its public API.
Ktor's authentication plugin uses the
[`Auth { bearer { loadTokens { ... }; refreshTokens { ... } } }`](https://ktor.io/docs/auth.html)
DSL — bearer access tokens with automatic refresh. After R8, the DSL
lambdas appear as `Function2`/`Function3` impls referencing
`BearerTokens(...)` calls.
## Apollo Kotlin (GraphQL)
```bash
# Client setup
grep -rn 'ApolloClient\|\.serverUrl(\|HttpNetworkTransport' sources/
# Operations (queries / mutations / subscriptions)
grep -rn '\.query(\s*[A-Z]\|\.mutation(\s*[A-Z]\|\.subscription(\s*[A-Z]' sources/
```
Apollo generates one class per operation under a generated package; once you
find the GraphQL endpoint URL via `ApolloClient.serverUrl("...")`, use the
operation classes themselves as the API documentation — each carries its
GraphQL document text in `OPERATION_DOCUMENT`.
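
A quick inventory of operations can be sketched by scanning for that constant. The function below is a hypothetical helper and assumes the decompiler renders the constant on a single line as `OPERATION_DOCUMENT = "query Name ..."`. Anonymous operations, or documents split across concatenated strings, would be missed.

```bash
# list_apollo_operations (hypothetical helper): print "<type> <name>" for
# each named GraphQL operation found in an OPERATION_DOCUMENT constant.
# Assumes the constant and the start of its value sit on the same line.
list_apollo_operations() {
    grep -rhoE --include='*.java' --include='*.kt' \
        'OPERATION_DOCUMENT = "(query|mutation|subscription) [A-Za-z_][A-Za-z0-9_]*' "$1" 2>/dev/null \
        | sed -E 's/^OPERATION_DOCUMENT = "//' \
        | sort -u
}

# Usage:
#   list_apollo_operations sources/
```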
## Volley
```bash
@ -77,6 +136,25 @@ grep -rn 'loadUrl\|evaluateJavascript\|addJavascriptInterface\|WebViewClient\|sh
WebView-based apps may load API endpoints via JavaScript bridges. Look for `@JavascriptInterface` annotated methods.
## Endpoint-Shaped Path Literals (obfuscation-resistant)
When the HTTP client cannot be identified (custom abstraction, heavy
inlining, KMP shared module), or the call sites are obfuscated to
`a.b(c, "path")`, fall back to extracting the path string literals
themselves. R8 does not obfuscate string contents, so paths leak through.
```bash
# All quoted strings shaped like an API path, deduplicated
grep -rhoE '"(/[A-Za-z0-9_{}.\-]+(/[A-Za-z0-9_{}.\-]+)+/?|(api|v[0-9]+|graphql|users?|account|auth|sso|oauth|profile|cart|basket|order|product|inventory|search|category|address|location|delivery|payment|invoice|favo[u]?rites?)(/[A-Za-z0-9_{}.\-]+)+/?)"' sources/ \
| grep -Ev '^"(image|video|audio|text|application|content)/|^"/(proc|sys|dev|tmp|etc)/' \
| sort -u
```
The skill ships this as `find-api-calls.sh --paths`, which prints both a
deduplicated inventory and the full list of call sites. On real-world
Kotlin apps this single command typically produces 100-300 distinct
endpoint paths, which is the most useful first artifact for documentation.
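
With a few hundred paths, a count per leading segment gives a first impression of how the backend is partitioned. The sketch below is a hypothetical helper, not part of `find-api-calls.sh`. It reads the deduplicated paths on stdin, quoted or unquoted.

```bash
# group_paths (hypothetical helper): read one path per line on stdin and
# print "count first-segment", most frequent first.
group_paths() {
    sed -e 's/^"//' -e 's/"$//' -e 's#^/##' \
        | awk -F/ 'NF && $1 != "" { n[$1]++ } END { for (k in n) printf "%d %s\n", n[k], k }' \
        | sort -k1,1nr -k2,2
}

# Usage (paths.txt is an assumed file holding the deduplicated list):
#   group_paths < paths.txt
```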
## Hardcoded URLs and Secrets
```bash


@ -84,9 +84,9 @@ Look for:
- Firebase/analytics initialization
- Base URL configuration
## 5. Dependency Injection (Dagger / Hilt)
## 5. Dependency Injection
Modern Android apps use DI. Trace bindings to find implementations:
### Dagger / Hilt
```bash
# Hilt modules
@ -102,10 +102,43 @@ grep -rn '@Component\|@Subcomponent' sources/
grep -rn '@Inject' sources/
```
To trace a call flow through DI:
1. Find where an interface is used (e.g., `ApiService` injected into a repository)
2. Find the `@Provides` or `@Binds` method that creates the implementation
3. Follow the implementation to the actual HTTP call
### Koin
Koin is the dominant DI framework in Kotlin Multiplatform and a large
share of Kotlin-only Android apps. It uses a runtime DSL rather than
compile-time generated factories, so the search patterns are different:
```bash
# Confirm Koin is actually wired up
grep -rn 'org\.koin\.' sources/
# DI module declarations
grep -rn 'fun [A-Za-z]\+Module\|module\s*{\|module(' sources/
# Bindings inside a module DSL
grep -rn 'single\s*[<{(]\|factory\s*[<{(]\|viewModel\s*[<{(]\|scoped\s*[<{(]\|singleOf\|factoryOf' sources/
# Resolution call-sites (where a binding is consumed)
grep -rn '\bget\s*<\|\binject\s*<\|by\s\+inject\b\|by\s\+viewModel\b\|getKoin' sources/
```
After R8, every binding lambda becomes an anonymous
`Function2<Scope, ParametersHolder, T>` impl. To find the binding for an
interface `Foo`, look for files that contain both a Koin import / module
DSL marker and a reference to `Foo`:
```bash
grep -rln 'org\.koin\.core\.module' sources/ | xargs grep -l 'Foo'
```
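
The plain `xargs` pipeline above splits on whitespace, so it can misbehave on paths containing spaces. A more defensive variant, sketched here with a hypothetical function name, passes NUL-separated file names and matches the interface name as a whole word. It assumes a `grep` that supports `--null` and an `xargs` that supports `-0`.

```bash
# find_koin_bindings (hypothetical helper): list files that reference both
# the Koin module package and the given interface name.
# Assumes grep supports --null and xargs supports -0.
find_koin_bindings() {
    iface="$1"
    root="$2"
    grep -rl --null 'org\.koin\.core\.module' "$root" 2>/dev/null \
        | xargs -0 grep -lwF -- "$iface" 2>/dev/null
}

# Usage:
#   find_koin_bindings Foo sources/
```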
### Trace through DI
1. Find where an interface is used (e.g. `ApiService` injected into a
repository).
2. Find the `@Provides` / `@Binds` method (Hilt) **or** the
`single { ... }` / `factory { ... }` block (Koin) that creates the
implementation.
3. Follow the implementation to the actual HTTP call.
## 6. Find Constants and Configuration
@ -145,8 +178,9 @@ When code is obfuscated (ProGuard/R8):
1. **Start from strings**: Search for URLs, error messages, and known constants
2. **Start from framework classes**: Activities and Fragments are named in the manifest
3. **Follow library calls**: Retrofit `@GET`/`@POST` annotations are readable even when the interface class name is obfuscated
4. **Use `--deobf`**: jadx can generate readable replacement names
4. **Recover original Kotlin names from metadata**: `@DebugMetadata` and `@Metadata.d2` strings preserve the original FQNs even after R8 obfuscation. Run `scripts/recover-kotlin-names.sh` to build an `obf -> real` map (typically recovers 30-50% of classes — and almost 100% of `*Repository` / `*ViewModel` / `*Impl`). See [`kotlin-name-recovery.md`](./kotlin-name-recovery.md). This is the single highest-leverage step on any Kotlin app.
5. **Cross-reference**: If `class a` calls `Retrofit.create(b.class)`, then `b` is a Retrofit service interface
6. **`--deobf` is rarely enough on its own**: jadx's `--deobf` renames obfuscated symbols with synthetic placeholders (`p001a`, `C0123Foo`) — useful for disambiguation but it does **not** recover original names. Pair it with the metadata recovery above.
## 8. Tracing a Complete Call Flow: Example


@ -0,0 +1,108 @@
# Recovering Original Class Names from Kotlin Metadata
When R8/ProGuard obfuscates a Kotlin app, JVM symbols are renamed but the
**Kotlin metadata strings cannot be stripped**: the Kotlin runtime depends
on them for reflection, coroutines, and `data class` features.
Two annotations leak the original fully-qualified names:
## `@DebugMetadata`
Generated for nearly every Kotlin coroutine `SuspendLambda` (i.e. almost
every `suspend` function in a modern app):
```java
@DebugMetadata(
c = "com.example.feature.account.AccountRepositoryImpl$fetch$1",
f = "AccountRepositoryImpl.kt",
l = {42, 51},
m = "invokeSuspend"
)
public final class a extends SuspendLambda implements Function2<...> { ... }
```
The `c =` field carries the original outer class FQN (with a `$` suffix
for inner / lambda scopes — strip everything after the first `$` to get the
declaring class).
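
The stripping rule can be sketched as a single grep and sed pass. This is an illustration of the idea, not the code of `recover-kotlin-names.sh`. The function name is hypothetical, and it assumes the decompiler renders the field as `c = "..."` on one line.

```bash
# declaring_classes (hypothetical helper): list the declaring-class FQNs
# named by the c = "..." field of @DebugMetadata annotations.
# Assumes the field is rendered on one line as c = "...".
declaring_classes() {
    grep -rhoE --include='*.java' '(^|[^A-Za-z0-9_])c = "[^"]+"' "$1" 2>/dev/null \
        | sed -E -e 's/^.*c = "//' -e 's/"$//' -e 's/\$.*$//' \
        | sort -u
}

# Usage:
#   declaring_classes <output>/sources
```

Because the pattern is not tied to the annotation itself, an unrelated assignment to a variable named `c` would also match, so treat the output as candidates.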
## `@Metadata.d2`
Every Kotlin class carries a top-level `@Metadata` annotation. The `d2`
array lists internal class refs in JVM type-descriptor format
(`Lcom/example/Foo;`):
```java
@Metadata(d1 = {"..."},
d2 = {"...","Lcom/example/feature/account/AccountRepositoryImpl;","..."})
public final class b implements ... { ... }
```
The first non-stdlib descriptor in `d2` is usually the file's primary
class.
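
Converting a type descriptor to a dotted FQN is a mechanical rewrite: drop the leading `L`, drop the trailing `;`, and turn slashes into dots. A minimal sketch, with a hypothetical function name, that processes one descriptor per line on stdin:

```bash
# descriptors_to_fqns (hypothetical helper): convert JVM type descriptors
# such as Lcom/example/Foo; into dotted names such as com.example.Foo.
descriptors_to_fqns() {
    sed -E -e 's/^L//' -e 's/;$//' -e 's#/#.#g'
}

# Usage:
#   printf 'Lcom/example/Foo;\n' | descriptors_to_fqns
```

Inner-class markers (`$`) are left in place, so the result can still be cut at the first `$` when only the outer class is wanted.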
## How to mine them
The skill ships two scripts:
```bash
# Build a mapping from a decompiled sources directory:
bash scripts/recover-kotlin-names.sh <output>/sources [mapping-dir]
# Outputs:
# <mapping-dir>/mapping.tsv obf_fqn real_fqn file
# <mapping-dir>/mapping.json same data, JSON
# <mapping-dir>/by_package/ per-real-package index files
# Query the mapping:
bash scripts/lookup-name.sh <mapping-dir> Repository # search
bash scripts/lookup-name.sh <mapping-dir> -o ab.cd # obf -> real
bash scripts/lookup-name.sh <mapping-dir> -p com.example.feature # list package
bash scripts/lookup-name.sh <mapping-dir> --grep '"api/' <output>/sources
# ^ greps decompiled code and appends '// real.fqn' to each hit
```
## What you typically recover
On a real-world obfuscated Kotlin app the script recovers **30-50 % of
classes** — but more importantly, **almost 100 % of the classes you
actually want to read**:
| Class kind | Recovery rate |
|---------------------------|---------------|
| `*Repository` / `*Impl` | ~100 % |
| `*ViewModel` | ~100 % |
| `*UseCase` / `*Interactor`| ~100 % |
| Plain `data class` DTOs | ~80 % |
| Pure-Java helper classes | low (no Kotlin metadata) |
| Anonymous inner classes | sometimes recovered as the parent FQN |
## Why `jadx --deobf` is not enough
`--deobf` renames obfuscated identifiers using internal heuristics, but the
output is still synthetic (`p001a`, `C0123Foo`). It does **not** recover
the *original* names. Kotlin metadata recovery is the only reliable way to
map back to the names the developer actually wrote, and it costs essentially
nothing — just a regex pass over the decompiled sources.
Run both: `--deobf` for fields/methods that have no metadata source, plus
the recovery script for class names.
## Limitations
- **Method names and field names** are not recovered. Kotlin metadata only
preserves class-level FQNs and a few signatures. For method names you
still need jadx-gui's interactive rename or pattern inference.
- **Pure-Java classes** carry no `@Metadata`, so they remain obfuscated.
- **Heavily inlined classes** (`@JvmInline value class`, top-level fun
files compiled into shared `*Kt.class` synthetic classes) sometimes show
up under the wrong filename — treat results as a strong hint, not gospel.
## Reading flow with the mapping
1. Run `recover-kotlin-names.sh` once after decompiling.
2. Use `lookup-name.sh --grep '<pattern>' <sources>` instead of plain `grep`
so every hit comes annotated with the real owning class.
3. When you hit an obfuscated FQN in code (e.g. `nq.e`), resolve it with
`lookup-name.sh <mapping-dir> -o nq.e` — you will often see siblings
(`nq.d`, `nq.f`, ...) that are the same class's split lambdas/inner
classes, which is useful context.
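
When `lookup-name.sh` is not available, the same obf-to-real resolution can be approximated directly against `mapping.tsv`. The sketch below uses a hypothetical function name and assumes the file is tab-separated with the columns listed earlier (`obf_fqn`, `real_fqn`, `file`).

```bash
# resolve_obf (hypothetical helper): print the real FQN for an obfuscated
# FQN; exits non-zero when there is no entry.
# Assumes mapping.tsv is tab-separated: obf_fqn, real_fqn, file.
resolve_obf() {
    awk -F'\t' -v obf="$1" '$1 == obf { print $2; found = 1 } END { exit !found }' "$2"
}

# Usage:
#   resolve_obf nq.e <mapping-dir>/mapping.tsv
```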


@ -0,0 +1,122 @@
# Third-party host denylist used by find-api-calls.sh --urls.
#
# Patterns are extended-regex hostname suffixes / fragments. A host is
# considered "third-party noise" if any pattern below matches anywhere
# in the hostname. Lines starting with '#' and blank lines are ignored.
#
# This list is intentionally conservative: when a pattern would hide a
# legitimate first-party host (e.g. an app may run its own *.s3.amazonaws.com
# bucket), keep the pattern but expect manual review of the bucketed output.
# Google / Firebase / Play / Crashlytics
\.googleapis\.com$
\.google\.com$
\.gstatic\.com$
\.googleusercontent\.com$
\.googletagmanager\.com$
\.googlesyndication\.com$
\.firebaseio\.com$
\.firebaseapp\.com$
\.firebaseinstallations\.googleapis\.com$
\.firebaseremoteconfig\.googleapis\.com$
\.crashlytics\.com$
\.app-measurement\.com$
# Apple / Microsoft / Adobe
\.apple\.com$
\.icloud\.com$
\.microsoft\.com$
\.live\.com$
\.office\.com$
\.adobe\.com$
ns\.adobe\.com
# Meta
\.facebook\.com$
\.fbcdn\.net$
\.instagram\.com$
\.whatsapp\.com$
# Other social / messaging / video
\.twitter\.com$
\.x\.com$
\.tiktok\.com$
\.youtube\.com$
\.youtu\.be$
\.linkedin\.com$
\.snapchat\.com$
\.pinterest\.com$
\.reddit\.com$
# Mobile attribution / analytics / observability
\.appsflyersdk\.com$
\.appsflyer\.com$
\.adjust\.com$
\.branch\.io$
\.amplitude\.com$
\.segment\.com$
\.mixpanel\.com$
\.hotjar\.com$
\.clarity\.ms$
\.datadoghq\.(com|eu|us)$
\.sentry\.io$
\.bugsnag\.com$
\.newrelic\.com$
\.instabug\.com$
\.embrace\.io$
\.rollout\.io$
\.launchdarkly\.com$
# Push / notifications
\.onesignal\.com$
\.urbanairship\.com$
\.airship\.com$
# Support / chat
\.zendesk\.com$
\.intercom\.io$
\.intercomcdn\.com$
\.helpshift\.com$
\.salesforce\.com$
\.freshchat\.com$
\.kustomerapp\.com$
# Payments
\.stripe\.com$
\.braintreepayments\.com$
\.braintreegateway\.com$
\.payu\.com$
\.payu\.in$
\.paypal\.com$
\.adyen\.com$
\.checkout\.com$
\.klarna\.com$
# Maps / location
\.mapbox\.com$
\.openstreetmap\.org$
# Storage / CDN (often third-party even when the bucket name is app-specific)
\.s3\.amazonaws\.com$
\.cloudfront\.net$
\.akamaihd\.net$
\.akamaized\.net$
\.fastly\.net$
\.cloudflare\.com$
\.azureedge\.net$
# DNS / well-known infra
\.localhost$
^localhost
^127\.
# Standards / RFCs / placeholders that show up as XML/XMP namespaces
\.w3\.org$
\.w3c\.org$
example\.(com|org|net)$
# Certificate authorities
\.sectigo\.com$
\.entrust\.com$
\.digicert\.com$
\.letsencrypt\.org$


@ -14,8 +14,12 @@ Arguments:
Options:
--retrofit Search only for Retrofit annotations
--okhttp Search only for OkHttp patterns
--ktor Search only for Ktor client patterns
--apollo Search only for Apollo (GraphQL) patterns
--volley Search only for Volley patterns
--urls Search only for hardcoded URLs
--paths Extract unique endpoint-shaped path string literals
(works on heavily obfuscated apps where call sites are inlined)
--auth Search only for auth-related patterns
--all Search all patterns (default)
-h, --help Show this help message
@ -29,8 +33,11 @@ EOF
SOURCE_DIR=""
SEARCH_RETROFIT=false
SEARCH_OKHTTP=false
SEARCH_KTOR=false
SEARCH_APOLLO=false
SEARCH_VOLLEY=false
SEARCH_URLS=false
SEARCH_PATHS=false
SEARCH_AUTH=false
SEARCH_ALL=true
@ -38,8 +45,11 @@ while [[ $# -gt 0 ]]; do
case "$1" in
--retrofit) SEARCH_RETROFIT=true; SEARCH_ALL=false; shift ;;
--okhttp) SEARCH_OKHTTP=true; SEARCH_ALL=false; shift ;;
--ktor) SEARCH_KTOR=true; SEARCH_ALL=false; shift ;;
--apollo) SEARCH_APOLLO=true; SEARCH_ALL=false; shift ;;
--volley) SEARCH_VOLLEY=true; SEARCH_ALL=false; shift ;;
--urls) SEARCH_URLS=true; SEARCH_ALL=false; shift ;;
--paths) SEARCH_PATHS=true; SEARCH_ALL=false; shift ;;
--auth) SEARCH_AUTH=true; SEARCH_ALL=false; shift ;;
--all) SEARCH_ALL=true; shift ;;
-h|--help) usage ;;
@ -72,6 +82,58 @@ run_grep() {
grep $GREP_OPTS -E "$pattern" "$SOURCE_DIR" 2>/dev/null || true
}
# Print a one-screen summary FIRST so a reader knows what to expect from
# the long output that follows. Skipped when a single section flag was
# requested (the user wants raw matches, not an overview). One pass over
# the tree, counts bucketed by tag — running 8 separate greps was too slow.
if [[ "$SEARCH_ALL" == true ]]; then
section "Summary (counted in a single pass)"
declare -A H=(
[retrofit]=0 [okhttp]=0 [ktor]=0 [apollo]=0 [volley]=0
[hilt]=0 [koin]=0 [bearer]=0 [hmac]=0
)
while IFS= read -r line; do
case "$line" in
*"@GET("*|*"@POST("*|*"@PUT("*|*"@DELETE("*|*"@PATCH("*|*"@HTTP("*) H[retrofit]=$((H[retrofit]+1));;
esac
case "$line" in
*"Request.Builder"*|*"HttpUrl"*|*".newCall("*) H[okhttp]=$((H[okhttp]+1));;
esac
case "$line" in
*"BearerTokens"*|*"defaultRequest {"*|*"client.get("*|*"client.post("*|*"httpClient.get("*|*"httpClient.post("*|*"HttpClient.get("*) H[ktor]=$((H[ktor]+1));;
esac
case "$line" in
*"ApolloClient"*|*".serverUrl("*) H[apollo]=$((H[apollo]+1));;
esac
case "$line" in
*"StringRequest"*|*"JsonObjectRequest"*|*"RequestQueue"*) H[volley]=$((H[volley]+1));;
esac
case "$line" in
*"@HiltAndroidApp"*|*"@AndroidEntryPoint"*|*"@HiltViewModel"*|*"@Provides"*|*"@Binds"*) H[hilt]=$((H[hilt]+1));;
esac
case "$line" in
*"org.koin."*|*"module {"*|*"single<"*|*"factory<"*|*"singleOf("*|*"factoryOf("*) H[koin]=$((H[koin]+1));;
esac
case "$line" in
*'"Bearer '*|*'"bearer '*|*"BearerTokens"*) H[bearer]=$((H[bearer]+1));;
esac
case "$line" in
*"HmacSHA"*|*'Mac.getInstance("Hmac'*) H[hmac]=$((H[hmac]+1));;
esac
done < <(grep -rEh --include='*.java' --include='*.kt' \
'@(GET|POST|PUT|DELETE|PATCH|HTTP)\(|Request\.Builder|HttpUrl|\.newCall\(|BearerTokens|defaultRequest \{|client\.(get|post)\(|httpClient\.(get|post)\(|HttpClient\.get\(|ApolloClient|\.serverUrl\(|StringRequest|JsonObjectRequest|RequestQueue|@HiltAndroidApp|@AndroidEntryPoint|@HiltViewModel|@Provides|@Binds|org\.koin\.|module \{|single<|factory<|singleOf\(|factoryOf\(|"[Bb]earer |HmacSHA|Mac\.getInstance' \
"$SOURCE_DIR" 2>/dev/null || true)
printf ' HTTP framework: Retrofit=%-5s OkHttp=%-5s Ktor=%-5s Apollo=%-5s Volley=%-5s\n' \
"${H[retrofit]}" "${H[okhttp]}" "${H[ktor]}" "${H[apollo]}" "${H[volley]}"
printf ' DI framework: Hilt/Dagger=%-5s Koin=%-5s\n' \
"${H[hilt]}" "${H[koin]}"
printf ' Auth signals: Bearer=%-5s HMAC/Sign=%-5s\n' \
"${H[bearer]}" "${H[hmac]}"
echo
echo " Run with one of --retrofit / --okhttp / --ktor / --apollo / --volley /"
echo " --paths / --urls / --auth to inspect a single section."
fi
# --- Retrofit ---
if [[ "$SEARCH_ALL" == true || "$SEARCH_RETROFIT" == true ]]; then
section "Retrofit Annotations"
@ -90,16 +152,123 @@ if [[ "$SEARCH_ALL" == true || "$SEARCH_OKHTTP" == true ]]; then
run_grep '(\.url\s*\(|\.addQueryParameter|\.addPathSegment|\.scheme\s*\(|\.host\s*\()'
fi
# --- Ktor (Kotlin) ---
# Ktor doesn't use annotations. Endpoints appear as string args to
# client.get/post/etc., or are built via HttpRequestBuilder.url(...). Auth
# is configured via the bearer { loadTokens / refreshTokens } DSL.
if [[ "$SEARCH_ALL" == true || "$SEARCH_KTOR" == true ]]; then
section "Ktor — Client Calls"
run_grep '\b(client|httpClient|HttpClient)\.(get|post|put|delete|patch|head|request)\s*[<(]'
section "Ktor — Request Building / Default Request"
run_grep '(HttpRequestBuilder|defaultRequest\s*\{|\burl\s*\(\s*"|URLBuilder|URLProtocol)'
section "Ktor — Auth Plugin (Bearer / Refresh)"
run_grep '(\bbearer\s*\{|BearerTokens\s*\(|loadTokens\s*\{|refreshTokens\s*\{|\bAuth\s*\)\s*\{)'
fi
# --- Apollo (GraphQL) ---
if [[ "$SEARCH_ALL" == true || "$SEARCH_APOLLO" == true ]]; then
section "Apollo — GraphQL Client"
run_grep '(ApolloClient|\.serverUrl\s*\(|\.subscriptionNetworkTransport|HttpNetworkTransport)'
section "Apollo — Operations"
run_grep '(\.query\s*\(\s*[A-Z]|\.mutation\s*\(\s*[A-Z]|\.subscription\s*\(\s*[A-Z])'
fi
# --- Volley ---
if [[ "$SEARCH_ALL" == true || "$SEARCH_VOLLEY" == true ]]; then
section "Volley Requests"
run_grep '(StringRequest|JsonObjectRequest|JsonArrayRequest|ImageRequest|RequestQueue|Volley\.newRequestQueue)'
fi
# --- Endpoint-shaped path literals ---
# Survives R8 obfuscation: even when call sites are inlined to a.b(c, "path"),
# the path strings themselves are not obfuscated. This produces a deduplicated
# inventory of likely API endpoints that other modes miss.
if [[ "$SEARCH_ALL" == true || "$SEARCH_PATHS" == true ]]; then
section "Endpoint-Shaped Path Literals (deduplicated)"
# Quoted strings that begin with /<segment> or <segment>/ where the leading
# segment is a typical API root word. Cap segment count and length to keep
# the regex grounded.
# An endpoint-shaped string is one of:
# "/seg/seg..." — absolute path with >= 2 segments
# "api-root/seg/seg..." — relative path starting with a known
# API root keyword and containing >= 1
# '/' followed by another segment
# Segments are URL-safe chars plus {} for path-template placeholders.
SEG='[A-Za-z0-9_{}.\-]+'
ROOT='(api|v[0-9]+|graphql|rest|mobile|auth|oauth|sso|users?|account|session|token|register|signup|signin|logout|password|verify|otp|sms|profile|customer|cart|basket|order|checkout|payment|invoice|product|catalog|inventory|search|category|favo[u]?rites?|wishlist|address|location|delivery|shipping|review|feedback|notification|push|message|chat|track|event|stat[a-z]*|metric|config|settings?|feature|flag|banner|content|media|upload|download|file|image|video|live|stream|webhook|callback)'
PATHS_REGEX="\"(/${SEG}(/${SEG})+/?|${ROOT}(/${SEG})+/?)\""
# Filter out frequent false positives (MIME types, /proc, /sys, /dev).
EXCLUDE='^"(image|video|audio|text|application|content|font|model|multipart|message)/|^"/(proc|sys|dev|tmp|etc|usr|var|opt)/'
# Print a flat unique list rather than file:line — this is the inventory.
grep -rhoE --include='*.java' --include='*.kt' "$PATHS_REGEX" "$SOURCE_DIR" 2>/dev/null \
| grep -Ev "$EXCLUDE" \
| sort -u
echo
section "Endpoint-Shaped Path Literals — call sites"
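# EXCLUDE is anchored with '^' for the flat list; here each line is prefixed
# with file:line:, so drop every anchor and group the alternatives.
grep $GREP_OPTS -E "$PATHS_REGEX" "$SOURCE_DIR" 2>/dev/null \
| grep -Ev ":[0-9]+:.*(${EXCLUDE//^/})" || true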
fi
# --- Hardcoded URLs ---
# A loose grep for http(s)://... drowns in compression-dictionary garbage and
# in third-party SDK URLs (Google, Firebase, AppsFlyer, Datadog, ...). The
# strict regex requires a syntactically valid hostname and rejects strings
# containing whitespace, angle brackets, or non-printable bytes. Hosts are
# then bucketed into "first-party candidates" vs "third-party (denylist)".
if [[ "$SEARCH_ALL" == true || "$SEARCH_URLS" == true ]]; then
section "Hardcoded URLs (http:// and https://)"
run_grep '"https?://[^"]+'
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
DENYLIST="$HERE/../references/third_party_hosts.txt"
# Hostname must have at least one dot and end in a 2+ letter TLD.
STRICT_URL='https?://[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+\.[A-Za-z]{2,}(:[0-9]{1,5})?(/[^"<>[:space:]]*)?'
TMP="$(mktemp)"
trap 'rm -f "$TMP"' EXIT
grep -rhoE --include='*.java' --include='*.kt' "$STRICT_URL" "$SOURCE_DIR" 2>/dev/null \
| sort -u > "$TMP"
# Extract host: strip scheme, take part up to first ':' or '/'.
HOSTS_TMP="$(mktemp)"
sed -E 's#^https?://##; s#[/:].*$##' "$TMP" | sort -u > "$HOSTS_TMP"
if [[ -f "$DENYLIST" ]]; then
# Build a single combined regex from the denylist (one line each).
DENY_REGEX="$(grep -vE '^\s*(#|$)' "$DENYLIST" | tr '\n' '|' | sed 's/|$//')"
THIRD_HOSTS=$(grep -E "$DENY_REGEX" "$HOSTS_TMP" || true)
FIRST_HOSTS=$(grep -vE "$DENY_REGEX" "$HOSTS_TMP" || true)
else
THIRD_HOSTS=""
FIRST_HOSTS=$(cat "$HOSTS_TMP")
fi
section "Likely First-Party Hosts (frequency-sorted)"
if [[ -n "$FIRST_HOSTS" ]]; then
while IFS= read -r h; do
[[ -z "$h" ]] && continue
n=$(grep -cE "://${h//./\\.}([/:\"]|$)" "$TMP" || true)
printf ' %5d %s\n' "$n" "$h"
done <<< "$FIRST_HOSTS" | sort -rn -k1
else
echo " (none — every URL matched the third-party denylist)"
fi
section "Third-Party Hosts (denylist matches, collapsed)"
if [[ -n "$THIRD_HOSTS" ]]; then
echo "$THIRD_HOSTS" | sed 's/^/ /'
else
echo " (none)"
fi
section "All First-Party URLs (full strings)"
if [[ -n "$FIRST_HOSTS" ]]; then
while IFS= read -r h; do
[[ -z "$h" ]] && continue
grep -E "://${h//./\\.}([/:\"]|$)" "$TMP" | sed 's/^/ /'
done <<< "$FIRST_HOSTS"
fi
rm -f "$HOSTS_TMP" "$TMP"
trap - EXIT
section "HttpURLConnection"
run_grep '(openConnection|setRequestMethod|HttpURLConnection|HttpsURLConnection)'
section "WebView URLs"
@ -109,9 +278,27 @@ fi
# --- Auth patterns ---
if [[ "$SEARCH_ALL" == true || "$SEARCH_AUTH" == true ]]; then
section "Authentication & API Keys"
run_grep -i '(api[_-]?key|auth[_-]?token|bearer|authorization|x-api-key|client[_-]?secret|access[_-]?token)'
run_grep -i '(api[_-]?key|auth[_-]?token|bearer|authorization|x-api-key|client[_-]?secret|access[_-]?token|refresh[_-]?token)'
# Request-signing schemes: a hardcoded HMAC / RSA secret in an APK is a
# security finding worth surfacing prominently. These patterns catch the
# common shapes of homegrown / SDK-issued request signers.
section "Request Signing (HMAC / signature schemes)"
run_grep '(HmacSHA(1|256|512)|Mac\.getInstance\("Hmac|SecretKeySpec\(|Signature\.getInstance\()'
run_grep -i '(x-signature|x-client-authorization|x-amz-signature|x-hmac|aws4-hmac|signRequest|signatureFor|computeSignature|signaturev[0-9])'
# Hardcoded high-entropy strings adjacent to "secret"/"key" assignments
# are the canonical leaked-credential pattern.
section "Possible Hardcoded Secrets / Keys"
run_grep -i '(app[_-]?secret|client[_-]?secret|signing[_-]?key|hmac[_-]?secret|consumer[_-]?secret|private[_-]?key)'
section "Base URLs and Constants"
run_grep -i '(BASE_URL|API_URL|SERVER_URL|ENDPOINT|API_BASE|HOST_NAME)'
# Ktor BearerTokens / refresh DSL — common on Kotlin apps and lives on
# Ktor's public API, so it survives R8 unchanged.
section "Ktor Auth (Bearer + Refresh)"
run_grep '(BearerTokens|loadTokens\s*\{|refreshTokens\s*\{|\bbearer\s*\{)'
fi
echo


@ -0,0 +1,241 @@
#!/usr/bin/env bash
# fingerprint.sh — Triage an APK/XAPK before decompiling.
#
# Detects mobile framework (Flutter, React Native, Cordova/Capacitor,
# Xamarin, KMP/native), HTTP-stack hints, obfuscation level, native libs,
# and notable third-party SDKs.
#
# Decompiling Java is mostly useless for Flutter / RN / Xamarin / Cordova
# apps — different tools are needed. Run this BEFORE Phase 2 to choose
# the right path.
set -euo pipefail
usage() {
  cat <<EOF
Usage: fingerprint.sh <file.apk|file.xapk>

Prints a one-screen summary:
  * mobile framework (with rationale)
  * HTTP / DI / serialization stack hints
  * obfuscation indicator
  * native libraries (consolidated across split APKs)
  * notable third-party SDKs (from the zip listing and DEX type names)
EOF
  exit 0
}
[[ $# -lt 1 || "$1" == "-h" || "$1" == "--help" ]] && usage
INPUT="$1"
[[ ! -f "$INPUT" ]] && { echo "File not found: $INPUT" >&2; exit 1; }
TMP="$(mktemp -d -t apkfp.XXXXXX)"
trap 'rm -rf "$TMP"' EXIT
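# Resolve to a list of APKs (XAPK / APKS / APKM bundles are ZIPs of APKs)
APKS=()
case "${INPUT,,}" in   # ${var,,} lowercases; requires bash 4+
  *.xapk|*.apks|*.apkm)
    unzip -q -o "$INPUT" -d "$TMP/xapk"
    while IFS= read -r p; do APKS+=("$p"); done < <(find "$TMP/xapk" -maxdepth 2 -type f -name '*.apk')
    ;;
  *.apk)
    APKS=("$INPUT")
    ;;
  *)
    echo "Unsupported input: $INPUT" >&2; exit 1 ;;
esac
# Fail early if the bundle held no APKs; with `set -u`, expanding an empty
# array also aborts on bash older than 4.4.
[[ ${#APKS[@]} -eq 0 ]] && { echo "No .apk files found in: $INPUT" >&2; exit 1; }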
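# Aggregate ZIP listings from every APK in the bundle (split-aware view)
LISTING="$TMP/listing.txt"
: > "$LISTING"
for apk in "${APKS[@]}"; do
  # -Z1 prints one entry path per line: no header/footer rows (whose last
  # field is the archive path itself) and no breakage on paths with spaces.
  unzip -Z1 -- "$apk" 2>/dev/null >> "$LISTING"
done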
# Most class-level libs live inside classes*.dex, not as visible zip paths.
# Extract the type-name strings from each dex with `strings` and collect them
# in a second file that `has()` also searches, so patterns such as 'io/ktor/'
# or 'org/koin/' can match.
DEX_STRINGS="$TMP/dex_strings.txt"
: > "$DEX_STRINGS"
for apk in "${APKS[@]}"; do
  for dex in $(unzip -Z1 -- "$apk" 2>/dev/null | grep -E '^classes[0-9]*\.dex$' || true); do
    # DEX type descriptors look like "Lcom/foo/Bar;". Extract the inner
    # slash-separated FQN so callers can match e.g. 'io/ktor/' directly.
    unzip -p -- "$apk" "$dex" 2>/dev/null \
      | strings -n 8 \
      | grep -oE 'L[a-z][a-zA-Z0-9_]*(/[a-zA-Z0-9_$]+)+;' \
      | sed -E 's/^L//; s/;$//' \
      >> "$DEX_STRINGS" || true
  done
done
sort -u "$DEX_STRINGS" -o "$DEX_STRINGS"
has() { grep -qE "$1" "$LISTING" || grep -qE "$1" "$DEX_STRINGS"; }
# ----------------------------------------------------------------------
# Framework detection (priority order — first match wins)
# ----------------------------------------------------------------------
FRAMEWORK="unknown"
RATIONALE=""
if has '^lib/[^/]+/libflutter\.so$'; then
  FRAMEWORK="Flutter"
  RATIONALE="lib/<abi>/libflutter.so present"
  has '^lib/[^/]+/libapp\.so$' && RATIONALE+="; libapp.so contains AOT-compiled Dart"
elif has '^lib/[^/]+/libhermes\.so$' || has '^assets/index\.android\.bundle$' || has '^lib/[^/]+/libreactnativejni\.so$'; then
  FRAMEWORK="React Native"
  reasons=()
  has '^lib/[^/]+/libhermes\.so$' && reasons+=("libhermes.so")
  has '^lib/[^/]+/libreactnativejni\.so$' && reasons+=("libreactnativejni.so")
  has '^assets/index\.android\.bundle$' && reasons+=("assets/index.android.bundle")
  RATIONALE="${reasons[*]}"
elif has '^assets/www/index\.html$' || has '^assets/www/cordova\.js$' || has '^assets/public/index\.html$'; then
  FRAMEWORK="Cordova / Capacitor (WebView hybrid)"
  RATIONALE="assets/www/ or assets/public/ shell present"
elif has '^lib/[^/]+/libmonodroid\.so$' || has '^assemblies/'; then
  FRAMEWORK="Xamarin / .NET MAUI"
  RATIONALE="libmonodroid.so or assemblies/ present; code is in .NET DLLs"
elif has '^lib/[^/]+/libmaui\.so$'; then
  FRAMEWORK=".NET MAUI"
  RATIONALE="libmaui.so present"
elif has '^assets/flutter_assets/' && ! has '^lib/[^/]+/libflutter\.so$'; then
  FRAMEWORK="Flutter (code-only split?)"
  RATIONALE="flutter_assets/ but no libflutter.so in this APK; check splits"
else
  # Native: distinguish Compose vs classic Android by androidx.compose presence.
  # DEX type names are slash-separated (androidx/compose/...), while META-INF
  # version files use dots, so accept either separator.
  if has 'androidx[./]compose'; then
    FRAMEWORK="Native Android (Kotlin + Jetpack Compose)"
    RATIONALE="androidx.compose.* libraries detected"
  elif has '^META-INF/.*\.kotlin_module$'; then
    FRAMEWORK="Native Android (Kotlin)"
    RATIONALE="kotlin_module metadata present, no Compose markers"
  else
    FRAMEWORK="Native Android (Java/Kotlin)"
    RATIONALE="no cross-platform framework markers found"
  fi
fi
# ----------------------------------------------------------------------
# HTTP / DI / serialization stack hints
# ----------------------------------------------------------------------
http=()
has 'retrofit2' && http+=("Retrofit")
has 'okhttp3' && http+=("OkHttp")
has 'io/ktor/' && http+=("Ktor")
has 'com/apollographql/' && http+=("Apollo (GraphQL)")
has 'com/android/volley' && http+=("Volley")
di=()
has 'dagger/hilt/' && di+=("Hilt")
has '^META-INF/.*dagger.*' && di+=("Dagger")
has 'org/koin/' && di+=("Koin")
has 'javax/inject/' && [[ ${#di[@]} -eq 0 ]] && di+=("javax.inject")
ser=()
has 'kotlinx/serialization/' && ser+=("kotlinx.serialization")
has 'com/google/gson/' && ser+=("Gson")
has 'com/squareup/moshi/' && ser+=("Moshi")
has 'com/fasterxml/jackson/' && ser+=("Jackson")
# ----------------------------------------------------------------------
# Obfuscation indicator (R8/ProGuard): count one- and two-letter root packages
# ----------------------------------------------------------------------
# Packages live inside the dex, so scan the extracted DEX type names as well
# as the zip listing.
# Note: pipefail is on, so guard greps that may legitimately return 0 matches.
short_dirs=$( { grep -ohE '^[a-z]{1,2}/' "$LISTING" "$DEX_STRINGS" || true; } | sort -u | wc -l | tr -d ' ')
if [[ "$short_dirs" -gt 30 ]]; then
  OBFUSCATION="HIGH ($short_dirs one/two-letter root packages)"
elif [[ "$short_dirs" -gt 10 ]]; then
  OBFUSCATION="MODERATE ($short_dirs short root packages)"
else
  OBFUSCATION="LOW (no significant short-name namespace pollution)"
fi
# ----------------------------------------------------------------------
# Native libraries (consolidated)
# ----------------------------------------------------------------------
NATIVE=$(grep -E '^lib/[^/]+/[^/]+\.so$' "$LISTING" | sort -u || true)
# ----------------------------------------------------------------------
# Notable third-party SDKs (assets-based markers)
# ----------------------------------------------------------------------
sdks=()
has '^assets/com/appsflyer/' && sdks+=("AppsFlyer")
has 'datadog\.buildId|com/datadog/' && sdks+=("Datadog")
has 'io/sentry/' && sdks+=("Sentry")
has 'com/google/firebase/' && sdks+=("Firebase")
has 'com/google/android/gms/' && sdks+=("Google Play Services")
has 'com/facebook/' && sdks+=("Facebook SDK")
has 'com/payu/' && sdks+=("PayU")
has 'com/stripe/' && sdks+=("Stripe")
has 'com/braintreepayments/' && sdks+=("Braintree")
has 'com/storyteller/' && sdks+=("Storyteller")
has 'zendesk/' && sdks+=("Zendesk")
has 'com/intercom/' && sdks+=("Intercom")
has 'com/segment/analytics' && sdks+=("Segment")
has 'com/amplitude/' && sdks+=("Amplitude")
has 'com/mixpanel/' && sdks+=("Mixpanel")
has 'com/onesignal/' && sdks+=("OneSignal")
has 'com/microsoft/clarity' && sdks+=("Microsoft Clarity")
has 'com/hotjar/' && sdks+=("Hotjar")
has 'com/instabug/' && sdks+=("Instabug")
# BuildConfig is frequently left unobfuscated and often holds base URLs / flavor.
# DEX type names carry no ".class" suffix, so match the trailing path segment.
if has '(^|/)BuildConfig$'; then
  BUILDCONFIG="present (grep BuildConfig.java after decompile for base URLs / flavor)"
else
  BUILDCONFIG="not detected in DEX type names (still worth grepping after decompile)"
fi
# ----------------------------------------------------------------------
# Summary
# ----------------------------------------------------------------------
echo "=== APK Fingerprint: $(basename "$INPUT") ==="
echo
echo "Framework: $FRAMEWORK"
echo " Rationale: $RATIONALE"
echo "Obfuscation: $OBFUSCATION"
echo
echo "HTTP stack: ${http[*]:-none detected}"
echo "DI: ${di[*]:-none detected}"
echo "Serialization: ${ser[*]:-none detected}"
echo "BuildConfig: $BUILDCONFIG"
echo
echo "Third-party SDKs: ${sdks[*]:-none detected}"
echo
echo "Native libraries (consolidated across splits):"
if [[ -n "$NATIVE" ]]; then
echo "$NATIVE" | sed 's/^/ /'
else
echo " (none)"
fi
echo
# ----------------------------------------------------------------------
# Recommendation
# ----------------------------------------------------------------------
echo "Recommended next step:"
case "$FRAMEWORK" in
  Flutter*)
    echo " Java decompilation will yield ~no app code. The Dart logic lives in"
    echo " libapp.so (AOT). Use tools designed for Flutter:"
    echo " - reFlutter / Doldrums / blutter (extract Dart class structure)"
    echo " - strings/rabin2 on libapp.so for endpoints & string constants"
    ;;
  React*)
    echo " Java code is just the RN host. Real app logic is in JS/Hermes:"
    echo " - if Hermes: hbctool disasm assets/index.android.bundle"
    echo " - if JSC: js-beautify the bundle and grep for 'fetch('/'axios'"
    ;;
  Cordova*)
    echo " All app code is in assets/www/ (or assets/public/). Just unzip and"
    echo " inspect the HTML/JS; no Java decompile needed."
    ;;
  Xamarin*|.NET*)
    echo " App logic is in .NET DLLs (assemblies/). Use ILSpy or dotPeek;"
    echo " jadx will only show the Mono host."
    ;;
  *)
    echo " Proceed with Phase 2: bash scripts/decompile.sh <file>"
    ;;
esac
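
The DEX string scan in fingerprint.sh (`strings | grep | sed`) can also be sketched in Python, which may help when testing new markers without an APK at hand. This is a minimal sketch, not part of the commit: `dex_type_names`, `detect`, and the sample bytes are hypothetical, and unlike `strings -n 8` it applies no minimum run length.

```python
import re

# Same shape as the grep pattern in fingerprint.sh: "L", a lowercase-led
# package segment, one or more "/segment" parts, then ";".
TYPE_DESCRIPTOR = re.compile(rb"L([a-z][a-zA-Z0-9_]*(?:/[a-zA-Z0-9_$]+)+);")


def dex_type_names(data: bytes) -> set[str]:
    """Return slash-separated class names found in raw DEX bytes."""
    return {m.group(1).decode("ascii") for m in TYPE_DESCRIPTOR.finditer(data)}


def detect(names: set[str], markers: dict[str, str]) -> list[str]:
    """Return the labels whose package prefix occurs in any extracted name."""
    return [label for label, prefix in markers.items()
            if any(prefix in n for n in names)]


if __name__ == "__main__":
    # Synthetic stand-in for the string section of a classes.dex file.
    sample = b"\x00\x1aLio/ktor/client/HttpClient;\x00\x12Lokhttp3/Request;\x00"
    names = dex_type_names(sample)
    print(sorted(names))
    print(detect(names, {"Ktor": "io/ktor/", "OkHttp": "okhttp3/", "Koin": "org/koin/"}))
```

Because the match runs on type descriptors rather than class files, it keeps working when app classes are renamed, as long as the library packages themselves are not repackaged.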

View File

@ -0,0 +1,85 @@
#!/usr/bin/env bash
# lookup-name.sh — Query the mapping produced by recover-kotlin-names.sh.
#
# Modes:
# lookup-name.sh <mapping-dir> <substring> search by real-FQN substring
# lookup-name.sh <mapping-dir> -o <obf> resolve obf -> real
# lookup-name.sh <mapping-dir> -p <pkg> list a real package
# lookup-name.sh <mapping-dir> --grep <regex> <sources-dir>
# grep decompiled sources and annotate each hit with the real class name
set -euo pipefail
usage() {
  cat <<EOF
Usage: lookup-name.sh <mapping-dir> <query>
       lookup-name.sh <mapping-dir> -o <obf-fqn>
       lookup-name.sh <mapping-dir> -p <real-package-substring>
       lookup-name.sh <mapping-dir> --grep <regex> <sources-dir>

<mapping-dir> is the directory produced by recover-kotlin-names.sh
(must contain mapping.json).
EOF
  exit 0
}
[[ $# -lt 2 ]] && usage
DIR="$1"; shift
[[ ! -f "$DIR/mapping.json" ]] && { echo "no mapping.json in $DIR" >&2; exit 1; }
python3 - "$DIR" "$@" <<'PY'
import json, os, subprocess, sys

DIR = sys.argv[1]
args = sys.argv[2:]

with open(os.path.join(DIR, "mapping.json")) as fh:
    MAP = json.load(fh)

# Reverse index: real FQN -> every obfuscated FQN that maps to it.
REV = {}
for o, r in MAP.items():
    REV.setdefault(r, []).append(o)


def search(q):
    """Case-insensitive substring search over real FQNs."""
    ql = q.lower()
    for r in sorted(REV):
        if ql in r.lower():
            print(r)
            for o in sorted(REV[r]):
                print(f" {o}")


def by_obf(o):
    """Resolve one obfuscated FQN and list siblings sharing its real name."""
    if o not in MAP:
        print(f"no mapping for {o}", file=sys.stderr); sys.exit(1)
    print(f"{o} -> {MAP[o]}")
    sibs = [s for s in REV[MAP[o]] if s != o]
    for s in sorted(sibs):
        print(f" sibling: {s}")


def by_pkg(p):
    """List real classes whose package part contains p."""
    pl = p.lower()
    for r in sorted(REV):
        if pl in r.rsplit(".", 1)[0].lower():
            print(r)
            for o in sorted(REV[r]):
                print(f" {o}")


def grep_annot(pattern, sources):
    """Grep decompiled sources and append the real class name to each hit."""
    # -e keeps a pattern that starts with "-" from being parsed as an option.
    res = subprocess.run(
        ["grep", "-rEn", "--include=*.java", "-e", pattern, sources],
        capture_output=True, text=True)
    for line in res.stdout.splitlines():
        try:
            path, lineno, content = line.split(":", 2)
        except ValueError:
            continue
        rel = os.path.relpath(path, sources)
        obf = rel.replace(os.sep, ".")[:-5]  # drop the ".java" suffix
        suffix = f" // {MAP[obf]}" if obf in MAP else ""
        print(f"{rel}:{lineno}:{content}{suffix}")


if args[0] == "-o" and len(args) == 2:
    by_obf(args[1])
elif args[0] == "-p" and len(args) == 2:
    by_pkg(args[1])
elif args[0] == "--grep" and len(args) == 3:
    grep_annot(args[1], args[2])
else:
    search(" ".join(args))
PY
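
The `--grep` mode relies on one small transformation: a hit's path relative to the sources dir becomes an obfuscated FQN, which is then looked up in the mapping. A minimal sketch of that step in isolation follows; `path_to_fqn`, `annotate`, and the sample mapping entry are hypothetical names chosen for illustration.

```python
import os


def path_to_fqn(rel_path: str) -> str:
    """Turn 'a/b/c.java' into 'a.b.c' (drop the suffix, dots for separators)."""
    stem, _ext = os.path.splitext(rel_path)
    return stem.replace(os.sep, ".").replace("/", ".")


def annotate(rel_path: str, lineno: int, content: str, mapping: dict) -> str:
    """Format one grep hit, appending the real class name when it is known."""
    obf = path_to_fqn(rel_path)
    suffix = f" // {mapping[obf]}" if obf in mapping else ""
    return f"{rel_path}:{lineno}:{content}{suffix}"


if __name__ == "__main__":
    # Hypothetical mapping entry, shaped like mapping.json.
    mapping = {"a.b.c": "com.example.data.UserRepository"}
    print(annotate("a/b/c.java", 12, 'url = "/v1/users";', mapping))
    print(annotate("x/y.java", 3, "return null;", mapping))
```

Hits in files without a recovered name are printed unchanged, so the output stays usable as ordinary grep output.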

View File

@ -0,0 +1,140 @@
#!/usr/bin/env bash
# recover-kotlin-names.sh — Rebuild a (obfuscated -> real) class-name map
# from Kotlin metadata strings left in decompiled sources.
#
# R8 obfuscates JVM symbols, but Kotlin metadata strings often survive in
# release builds: the Kotlin runtime (reflection, coroutine debugging) reads
# them, so keep rules commonly preserve them. Two annotations carry the
# original FQN:
#
# * @DebugMetadata(c = "<full.qualified.Name>", f = "<File.kt>", ...)
# emitted for almost every `suspend` function (every coroutine
# SuspendLambda).
#
# * @Metadata(... d2 = {"...L<pkg/Class>;..."} ...) listing internal
# class refs of the file.
#
# Typical recovery on a real-world app: 30-50 % of classes regain their real
# names — usually 100 % of the *Repository / *ViewModel / *UseCase / *Impl
# classes you actually want to read.
set -euo pipefail
usage() {
  cat <<EOF
Usage: recover-kotlin-names.sh <decompiled-sources-dir> [output-dir]

Walks every *.java under <decompiled-sources-dir>, mines @DebugMetadata
and @Metadata annotations, and writes:
  <output-dir>/mapping.tsv    tab-separated  obf_fqn <TAB> real_fqn <TAB> file
  <output-dir>/mapping.json   same data as JSON { obf_fqn: real_fqn, ... }
  <output-dir>/by_package/    one file per real package, listing
                              real_fqn <TAB> obf_fqn <TAB> file

If [output-dir] is omitted, files are written to a "mapping" directory
next to the sources dir.
EOF
  exit 0
}
[[ $# -lt 1 || "$1" == "-h" || "$1" == "--help" ]] && usage
SRC="$1"
OUT="${2:-$(dirname "$SRC")/mapping}"
[[ ! -d "$SRC" ]] && { echo "not a directory: $SRC" >&2; exit 1; }
mkdir -p "$OUT/by_package"
python3 - "$SRC" "$OUT" <<'PY'
import os, re, sys, json
from collections import defaultdict

SRC, OUT = sys.argv[1], sys.argv[2]

# @DebugMetadata(c = "com.foo.Bar$Inner$1", ...)
RE_DEBUG = re.compile(r'@DebugMetadata\([^)]*?c\s*=\s*"([^"]+)"', re.S)
# @Metadata(... d2 = { "...Lcom/foo/Bar;..." ...} )
RE_DTWO = re.compile(r'@Metadata\([^)]*?d2\s*=\s*\{([^}]*)\}', re.S)
RE_LCLASS = re.compile(r'L([A-Za-z][\w/$]+);')
# jadx emits this comment when it renames a class itself. The name in the
# comment is the class's name in the DEX, which may still be obfuscated, so
# it is used only as a last-resort fallback.
RE_RENAMED = re.compile(r'/\*\s*renamed from:\s*([\w.$]+)\s*\*/')

# Skip third-party / framework trees: their names are already real.
SKIP_PREFIXES = (
    "kotlin.", "kotlinx.", "androidx.", "android.", "java.", "javax.",
    "com.google.", "com.facebook.", "com.appsflyer.", "com.datadog.",
    "io.ktor.", "io.sentry.", "io.realm.", "okhttp3.", "okio.",
    "com.squareup.", "com.bumptech.", "com.airbnb.", "com.payu.",
    "com.storyteller.", "zendesk.", "io.intercom.", "com.microsoft.",
    "com.tinder.", "com.hotjar.", "com.amplitude.", "com.segment.",
    "com.mixpanel.", "com.onesignal.", "com.stripe.", "com.braintreepayments.",
    "retrofit2.", "dagger.", "javax.inject.", "org.jetbrains.",
)

mapping = {}      # obfuscated FQN -> recovered real FQN
file_real = {}    # obfuscated FQN -> source file the name was mined from
counts = defaultdict(int)

for dp, _, files in os.walk(SRC):
    for f in files:
        if not f.endswith(".java"):
            continue
        path = os.path.join(dp, f)
        rel = os.path.relpath(path, SRC)
        obf = rel[:-5].replace(os.sep, ".")
        if obf.startswith(SKIP_PREFIXES):
            continue
        try:
            with open(path, "r", errors="replace") as fh:
                text = fh.read()
        except OSError:
            continue

        real = None
        # 1) @DebugMetadata: keep the outer class, dropping "$lambda$1" parts.
        m = RE_DEBUG.search(text)
        if m:
            real = m.group(1).split("$", 1)[0]
            counts["debug_meta"] += 1
        # 2) @Metadata d2: first class ref that is not a platform type.
        if not real:
            m = RE_DTWO.search(text)
            if m:
                for lm in RE_LCLASS.finditer(m.group(1)):
                    cand = lm.group(1).replace("/", ".").split("$", 1)[0]
                    if "." in cand and not cand.startswith(("kotlin.", "java.", "android")):
                        real = cand
                        counts["d2"] += 1
                        break
        # 3) jadx "renamed from" comment.
        if not real:
            m = RE_RENAMED.search(text)
            if m:
                real = m.group(1)
                counts["renamed"] += 1
        if real:
            mapping[obf] = real
            file_real[obf] = path

with open(os.path.join(OUT, "mapping.tsv"), "w") as f:
    f.write("obf_fqn\treal_fqn\tfile\n")
    for k in sorted(mapping):
        f.write(f"{k}\t{mapping[k]}\t{file_real[k]}\n")

with open(os.path.join(OUT, "mapping.json"), "w") as f:
    json.dump(mapping, f, indent=2, sort_keys=True)

by_pkg = defaultdict(list)
for obf, real in mapping.items():
    pkg = real.rsplit(".", 1)[0] if "." in real else "(default)"
    by_pkg[pkg].append((real, obf, file_real[obf]))
for pkg, rows in by_pkg.items():
    safe = pkg.replace(".", "_") or "default"
    with open(os.path.join(OUT, "by_package", f"{safe}.txt"), "w") as f:
        for real, obf, p in sorted(rows):
            f.write(f"{real}\t{obf}\t{p}\n")

print(f"Recovered {len(mapping)} class names")
for k, v in counts.items():
    print(f" via {k}: {v}")
print(f"Real packages: {len(by_pkg)}")
print(f"Wrote {OUT}/mapping.tsv, mapping.json, by_package/")
PY
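
To see what the two annotation regexes in recover-kotlin-names.sh pick up, the sketch below runs them against two hand-written snippets. It is illustrative only: the regexes are copied from the script, but `recover_name`, the class names, and the exact annotation layout of the samples are assumptions about typical jadx output, not captured from a real app.

```python
import re

RE_DEBUG = re.compile(r'@DebugMetadata\([^)]*?c\s*=\s*"([^"]+)"', re.S)
RE_DTWO = re.compile(r'@Metadata\([^)]*?d2\s*=\s*\{([^}]*)\}', re.S)
RE_LCLASS = re.compile(r'L([A-Za-z][\w/$]+);')


def recover_name(java_source: str):
    """Return the recovered outer-class FQN, or None if nothing matches."""
    m = RE_DEBUG.search(java_source)
    if m:
        # "com.foo.Bar$fetch$1" -> "com.foo.Bar"
        return m.group(1).split("$", 1)[0]
    m = RE_DTWO.search(java_source)
    if m:
        for lm in RE_LCLASS.finditer(m.group(1)):
            cand = lm.group(1).replace("/", ".").split("$", 1)[0]
            if "." in cand and not cand.startswith(("kotlin.", "java.", "android")):
                return cand
    return None


# Hypothetical decompiled snippets (assumed jadx layout).
SAMPLE_SUSPEND = '''
@DebugMetadata(c = "com.example.data.UserRepository$fetchUser$1", f = "UserRepository.kt", l = {42}, m = "invokeSuspend")
final class a$b extends SuspendLambda { }
'''
SAMPLE_CLASS = '''
@Metadata(k = 1, d2 = {"Lcom/example/ui/LoginViewModel;", "Landroidx/lifecycle/ViewModel;", "<init>"})
public final class c { }
'''

if __name__ == "__main__":
    print(recover_name(SAMPLE_SUSPEND))
    print(recover_name(SAMPLE_CLASS))
    print(recover_name("public final class d { }"))
```

Note that the `[^)]*?` prefix stops at the first `)`, so a `d1` payload containing a closing parenthesis before `d2` would prevent the `@Metadata` match; the `@DebugMetadata` path is not affected because `c` normally comes first.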