Merge a2a0a97f23 into 6a31ed3fa2 (commit 615c33aab8)
@ -24,6 +24,31 @@ If anything is missing, follow the installation instructions in `${CLAUDE_PLUGIN
|
|||
|
||||
## Workflow
|
||||
|
||||
### Phase 0: Fingerprint the App (recommended before anything else)
|
||||
|
||||
Before installing tools or decompiling, run a fast triage to determine what
|
||||
kind of app you are looking at. **Decompiling Java is mostly useless for
|
||||
Flutter, React Native, Cordova/Capacitor, and Xamarin apps** — the real code
|
||||
lives elsewhere. The fingerprint script tells you which.
|
||||
|
||||
```bash
|
||||
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/fingerprint.sh <file.apk|file.xapk>
|
||||
```
|
||||
|
||||
It prints, in one screen:
|
||||
|
||||
- **Mobile framework** (Flutter / React Native / Cordova / Xamarin / Native Kotlin / etc.) with the file marker that triggered the verdict.
|
||||
- **HTTP stack** (Retrofit, OkHttp, Ktor, Apollo, Volley) detected via DEX string scan — works even when class names are obfuscated.
|
||||
- **DI / serialization** signals (Hilt, Dagger, Koin, kotlinx.serialization, Moshi, Gson, Jackson).
|
||||
- **Obfuscation level** estimate based on root-level short-named packages.
|
||||
- **Notable third-party SDKs** (AppsFlyer, Datadog, Sentry, Firebase, payment SDKs, support/chat SDKs, etc.).
|
||||
- **Consolidated native libraries** across the base APK and all splits — XAPK split bundles often place `.so` files in `config.<abi>.apk`, not in `base.apk`.
|
||||
- **Recommended next step**, which differs by framework (e.g. for Flutter the script suggests `blutter` / `strings libapp.so` rather than jadx).
|
||||
|
||||
If the fingerprint says the app is Flutter / RN / Cordova / Xamarin, **stop**
|
||||
and switch to the framework-appropriate tooling. Phases 1–5 below assume a
|
||||
native (Java/Kotlin) Android app.
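
The triage idea can be illustrated with a minimal sketch. It is not the shipped `fingerprint.sh`; the function name `detect_framework` is hypothetical, and the marker files are common conventions assumed here rather than taken from the script.

```bash
# Hypothetical helper (not the shipped fingerprint.sh): classify an app from
# its APK entry list, one path per line on stdin.
detect_framework() {
  local entries
  entries="$(cat)"
  if grep -qE '(^|/)libflutter\.so$' <<<"$entries"; then
    echo "Flutter"
  elif grep -qE '^assets/index\.android\.bundle$|(^|/)libreactnativejni\.so$|(^|/)libhermes\.so$' <<<"$entries"; then
    echo "React Native"
  elif grep -qE '^assets/www/cordova\.js$|^assets/capacitor\.config\.json$' <<<"$entries"; then
    echo "Cordova/Capacitor"
  elif grep -qE '(^|/)libmonodroid\.so$|^assemblies/' <<<"$entries"; then
    echo "Xamarin"
  else
    # No cross-platform marker found: assume a native app (assumption).
    echo "Native (Java/Kotlin)"
  fi
}
```

A typical invocation would be `unzip -Z1 base.apk | detect_framework`. For XAPK bundles the entry lists of all splits would need to be concatenated first, since native libraries may live outside `base.apk`.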
|
||||
|
||||
### Phase 1: Verify and Install Dependencies
|
||||
|
||||
Before decompiling, confirm that the required tools are available — and install any that are missing.
|
||||
|
|
@ -123,12 +148,45 @@ Navigate the decompiled output to understand the app's architecture.
|
|||
- Distinguish app code from third-party libraries
|
||||
- Look for packages named `api`, `network`, `data`, `repository`, `service`, `retrofit`, `http` — these are where API calls live
|
||||
|
||||
3. **Identify the architecture pattern**:
|
||||
3. **Read every `BuildConfig.java`** — these are almost never obfuscated and frequently leak the highest-signal constants in the entire APK (base URLs, flavor names, build type, third-party API keys, feature flags):
|
||||
```bash
|
||||
find <output>/sources -name BuildConfig.java -exec grep -H '=' {} \;
|
||||
```
|
||||
Each Gradle module emits its own `BuildConfig`, so expect 1–N hits. Read all of them.
|
||||
|
||||
4. **Identify the architecture pattern**:
|
||||
- MVP: look for `Presenter` classes
|
||||
- MVVM: look for `ViewModel` classes and `LiveData`/`StateFlow`
|
||||
- Clean Architecture: look for `domain`, `data`, `presentation` packages
|
||||
- This informs where to look for network calls in the next phases
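
For the `BuildConfig` step above, a narrower pass that keeps only string constants can be easier to scan than every `=` line. This is a sketch under the assumption that the decompiler emits constants as `String NAME = "value"`; the function name `buildconfig_strings` is hypothetical.

```bash
# Hypothetical helper: print NAME=value for every string constant found in
# any BuildConfig.java under the given directory.
buildconfig_strings() {
  local src_dir="$1"
  find "$src_dir" -name BuildConfig.java \
      -exec grep -hoE 'String [A-Z0-9_]+ = "[^"]*"' {} + \
    | sed -E 's/^String ([A-Z0-9_]+) = "(.*)"$/\1=\2/' \
    | sort -u
}
```

Usage: `buildconfig_strings <output>/sources`. Boolean and integer constants (`DEBUG`, `VERSION_CODE`) are deliberately skipped here; the broader `grep -H '='` command still covers them.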
|
||||
|
||||
### Phase 3.5: Recover Kotlin Class Names (only for obfuscated Kotlin apps)
|
||||
|
||||
If Phase 0 reported moderate / high obfuscation **and** the app is Kotlin
(Compose / kotlin_module markers detected), run the metadata recovery
script before tracing call flows. R8 renames JVM symbols, but Kotlin
metadata strings frequently survive in release builds, so original FQNs
often leak through `@DebugMetadata` and `@Metadata.d2`.
|
||||
|
||||
```bash
|
||||
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/recover-kotlin-names.sh \
|
||||
<output>/sources <output>/mapping
|
||||
```
|
||||
|
||||
Then use the lookup helper instead of plain grep — every hit comes
|
||||
annotated with the owning class's real name:
|
||||
|
||||
```bash
|
||||
bash ${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/lookup-name.sh \
|
||||
<output>/mapping --grep '"/api/' <output>/sources
|
||||
```
|
||||
|
||||
Typical recovery on a real-world Kotlin app: ~100% of `*Repository` /
|
||||
`*ViewModel` / `*UseCase` / `*Impl` classes, ~80% of DTOs.
|
||||
|
||||
See `${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/references/kotlin-name-recovery.md`
|
||||
for the full technique and limitations.
|
||||
|
||||
### Phase 4: Trace Call Flows
|
||||
|
||||
Follow execution paths from user-facing entry points down to network calls.
|
||||
|
|
@ -190,15 +248,32 @@ On Windows (PowerShell):
|
|||
& "${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/scripts/find-api-calls.ps1" <output>/sources/ -Auth
|
||||
```
|
||||
|
||||
Then, for each discovered endpoint, read the surrounding source code to extract:
|
||||
- HTTP method and path
|
||||
- Base URL
|
||||
- Path parameters, query parameters, request body
|
||||
- Headers (especially authentication)
|
||||
- Response type
|
||||
- Where it's called from (the call chain from Phase 4)
|
||||
Document the endpoints in **two tiers** — going deep on every endpoint is
|
||||
prohibitively expensive on apps with 100+ paths, and most of them do not
|
||||
warrant it. Always produce Tier 1; expand Tier 2 only for the endpoints
|
||||
that matter.
|
||||
|
||||
**Document each endpoint** using this format:
|
||||
#### Tier 1 — flat inventory (always)
|
||||
|
||||
A single table covering every discovered endpoint. Aim for one line each;
|
||||
if you cannot determine a column, write `?`.
|
||||
|
||||
| Host | Method | Path | Auth | Source file |
|
||||
|------|--------|------|------|-------------|
|
||||
| `api.example.com` | GET | `/v1/users/profile` | Bearer | `com/example/api/UserApi.java` |
|
||||
| `api.example.com` | POST | `/v1/auth/login` | none | `com/example/api/AuthApi.java` |
|
||||
|
||||
This table answers "what does the backend look like" in one screen and
|
||||
takes ~5 minutes to produce from the `--paths` output even on a large app.
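
As a starting point, the Tier 1 table can be scaffolded from a list of quoted path literals, with every unknown column pre-filled as `?`. This is a sketch only; `tier1_rows` is a hypothetical name, and it assumes one quoted path per line on stdin.

```bash
# Hypothetical helper: turn quoted path literals (one per line on stdin)
# into a Tier 1 table skeleton with unknown columns set to "?".
tier1_rows() {
  echo '| Host | Method | Path | Auth | Source file |'
  echo '|------|--------|------|------|-------------|'
  # Strip the surrounding quotes, deduplicate, then emit one row per path.
  sed -E 's/^"(.*)"$/\1/' | sort -u | while IFS= read -r path; do
    [ -n "$path" ] || continue
    printf '| ? | ? | `%s` | ? | ? |\n' "$path"
  done
}
```

The remaining columns still have to be filled in by reading the call sites; the skeleton only guarantees that no discovered path is forgotten.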
|
||||
|
||||
#### Tier 2 — per-endpoint detail (only for high-value endpoints)
|
||||
|
||||
Reserve the detailed format for the few endpoints that actually need it:
|
||||
|
||||
- the entire authentication flow (login, refresh, logout, OTP/SMS, anonymous, registration)
|
||||
- payment / checkout / order-creation endpoints
|
||||
- anything the user explicitly asked about
|
||||
- anything that looked unusual during the scan (custom signing, undocumented headers, etc.)
|
||||
|
||||
```markdown
|
||||
### `METHOD /path`
|
||||
|
|
@ -213,6 +288,10 @@ Then, for each discovered endpoint, read the surrounding source code to extract:
|
|||
- **Called from**: `LoginActivity → LoginViewModel → UserRepository → ApiService`
|
||||
```
|
||||
|
||||
As a default, do not produce Tier 2 entries for more than ~10 endpoints
|
||||
unless the user explicitly asks for more — Tier 1 plus a Tier 2 deep dive
|
||||
on auth + 1-2 key flows is what most consumers of this work actually want.
|
||||
|
||||
See `${CLAUDE_PLUGIN_ROOT}/skills/android-reverse-engineering/references/api-extraction-patterns.md` for library-specific search patterns and the full documentation template.
|
||||
|
||||
## Output
|
||||
|
|
|
|||
|
|
@ -55,6 +55,65 @@ grep -rn 'Interceptor\|addInterceptor\|addNetworkInterceptor\|intercept(' source
|
|||
grep -rn '\.execute()\|\.enqueue(' sources/
|
||||
```
|
||||
|
||||
## Ktor (Kotlin)
|
||||
|
||||
Ktor is the dominant HTTP client in Kotlin Multiplatform and modern
|
||||
Kotlin-only Android apps. Unlike Retrofit, Ktor does **not** use annotations
|
||||
to declare endpoints — paths appear as plain string arguments to
|
||||
`client.get(...)` / `client.post(...)`, often inside an extension function.
|
||||
|
||||
```bash
|
||||
# Calls
|
||||
grep -rn '\b\(client\|httpClient\|HttpClient\)\.\(get\|post\|put\|delete\|patch\|head\|request\)\s*[<(]' sources/
|
||||
|
||||
# Default request / base URL configuration
|
||||
grep -rn 'HttpRequestBuilder\|defaultRequest\s*{\|\burl\s*(\s*"\|URLBuilder' sources/
|
||||
|
||||
# Auth plugin (bearer / refresh)
|
||||
grep -rn '\bbearer\s*{\|BearerTokens\s*(\|loadTokens\s*{\|refreshTokens\s*{' sources/
|
||||
```
|
||||
|
||||
Typical Ktor call, shown in its Kotlin source form (decompiled Java is more
verbose, but the path literal survives):

```kotlin
client.get("api/v1/users/profile") {
    parameter("locale", "en-US")
}
```
|
||||
|
||||
The base URL is usually applied via `defaultRequest { url { host = "..." } }`
|
||||
in the client builder. Search for `host =` and `URLProtocol.HTTPS` references
|
||||
to pin it down.
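
A minimal sketch of that host search follows. It assumes that the Kotlin assignment `host = "..."` shows up in decompiled Java as a `setHost("...")` call, which depends on the decompiler and on how much R8 renamed; `ktor_host_literals` is a hypothetical name.

```bash
# Hypothetical helper: list string literals assigned to a Ktor URL host,
# either as Kotlin `host = "..."` or as decompiled `setHost("...")`.
ktor_host_literals() {
  local src_dir="$1"
  grep -rhoE --include='*.java' --include='*.kt' \
      '(host = |setHost\()"[^"]+"' "$src_dir" 2>/dev/null \
    | sed -E 's/^[^"]*"//; s/"$//' \
    | sort -u
}
```

If the setter itself was renamed by R8 this returns nothing; in that case the `URLProtocol` references mentioned above are the more dependable anchor.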
|
||||
|
||||
**Note on obfuscation:** in heavily R8-shrunk apps the call site
|
||||
`client.get("path")` is inlined to something like `aVar.a(dVar, "path")`
|
||||
and the `client.<verb>(` regex misses it. The path string itself is **not**
|
||||
obfuscated, however — fall back to the generic path-literal search
|
||||
(`--paths`) for the endpoint inventory in those cases. Ktor library
|
||||
internals (`BearerTokens`, `loadTokens`, `refreshTokens`, `URLProtocol`)
|
||||
remain searchable because Ktor keeps these on its public API.
|
||||
|
||||
Ktor's authentication plugin uses the
|
||||
[`Auth { bearer { loadTokens { ... }; refreshTokens { ... } } }`](https://ktor.io/docs/auth.html)
|
||||
DSL — bearer access tokens with automatic refresh. After R8, the DSL
|
||||
lambdas appear as `Function2`/`Function3` impls referencing
|
||||
`BearerTokens(...)` calls.
|
||||
|
||||
## Apollo Kotlin (GraphQL)
|
||||
|
||||
```bash
|
||||
# Client setup
|
||||
grep -rn 'ApolloClient\|\.serverUrl(\|HttpNetworkTransport' sources/
|
||||
|
||||
# Operations (queries / mutations / subscriptions)
|
||||
grep -rn '\.query(\s*[A-Z]\|\.mutation(\s*[A-Z]\|\.subscription(\s*[A-Z]' sources/
|
||||
```
|
||||
|
||||
Apollo generates one class per operation under a generated package; once you
|
||||
find the GraphQL endpoint URL via `ApolloClient.serverUrl("...")`, use the
|
||||
operation classes themselves as the API documentation — each carries its
|
||||
GraphQL document text in `OPERATION_DOCUMENT`.
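
To get a quick list of operations without opening each generated class, the document strings can be scanned directly. The sketch below assumes each operation document is a string literal beginning with `query`, `mutation`, or `subscription` followed by the operation name; `apollo_operations` is a hypothetical name.

```bash
# Hypothetical helper: list "<kind> <OperationName>" pairs found in string
# literals that look like GraphQL operation documents.
apollo_operations() {
  local src_dir="$1"
  grep -rhoE --include='*.java' --include='*.kt' \
      '"(query|mutation|subscription) [A-Za-z_][A-Za-z0-9_]*' "$src_dir" 2>/dev/null \
    | sed -E 's/^"//' \
    | sort -u
}
```

Anonymous operations (a document that starts with `query {`) are not matched by this pattern and would need a looser search.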
|
||||
|
||||
## Volley
|
||||
|
||||
```bash
|
||||
|
|
@ -77,6 +136,25 @@ grep -rn 'loadUrl\|evaluateJavascript\|addJavascriptInterface\|WebViewClient\|sh
|
|||
|
||||
WebView-based apps may load API endpoints via JavaScript bridges. Look for `@JavascriptInterface` annotated methods.
|
||||
|
||||
## Endpoint-Shaped Path Literals (obfuscation-resistant)
|
||||
|
||||
When the HTTP client cannot be identified (custom abstraction, heavy
|
||||
inlining, KMP shared module), or the call sites are obfuscated to
|
||||
`a.b(c, "path")`, fall back to extracting the path string literals
|
||||
themselves. R8 does not obfuscate string contents, so paths leak through.
|
||||
|
||||
```bash
|
||||
# All quoted strings shaped like an API path, deduplicated
|
||||
grep -rhoE '"(/[A-Za-z0-9_{}.\-]+(/[A-Za-z0-9_{}.\-]+)+/?|(api|v[0-9]+|graphql|users?|account|auth|sso|oauth|profile|cart|basket|order|product|inventory|search|category|address|location|delivery|payment|invoice|favo[u]?rites?)(/[A-Za-z0-9_{}.\-]+)+/?)"' sources/ \
|
||||
| grep -Ev '^"(image|video|audio|text|application|content)/|^"/(proc|sys|dev|tmp|etc)/' \
|
||||
| sort -u
|
||||
```
|
||||
|
||||
The skill ships this as `find-api-calls.sh --paths`, which prints both a
|
||||
deduplicated inventory and the full list of call sites. On real-world
|
||||
Kotlin apps this single command typically produces 100–300 distinct
|
||||
endpoint paths, which is the most useful first artifact for documentation.
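
With a few hundred paths, grouping them by their first segment gives a rough map of the API surface. This is an illustrative sketch; `group_paths_by_root` is a hypothetical name and it assumes quoted path literals, one per line, on stdin.

```bash
# Hypothetical helper: count quoted path literals (stdin) by first segment.
# Output is "<count><TAB><root>", most frequent root first.
group_paths_by_root() {
  sed -E 's/^"//; s/"$//; s#^/##' \
    | awk -F/ 'NF { count[$1]++ }
               END { for (r in count) printf "%d\t%s\n", count[r], r }' \
    | sort -k1,1nr -k2,2
}
```

Roots with a single hit are often false positives or one-off utility endpoints, so the top of this list is usually the better place to start reading.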
|
||||
|
||||
## Hardcoded URLs and Secrets
|
||||
|
||||
```bash
|
||||
|
|
|
|||
|
|
@ -84,9 +84,9 @@ Look for:
|
|||
- Firebase/analytics initialization
|
||||
- Base URL configuration
|
||||
|
||||
## 5. Dependency Injection (Dagger / Hilt)
|
||||
## 5. Dependency Injection
|
||||
|
||||
Modern Android apps use DI. Trace bindings to find implementations:
|
||||
### Dagger / Hilt
|
||||
|
||||
```bash
|
||||
# Hilt modules
|
||||
|
|
@ -102,10 +102,43 @@ grep -rn '@Component\|@Subcomponent' sources/
|
|||
grep -rn '@Inject' sources/
|
||||
```
|
||||
|
||||
To trace a call flow through DI:
|
||||
1. Find where an interface is used (e.g., `ApiService` injected into a repository)
|
||||
2. Find the `@Provides` or `@Binds` method that creates the implementation
|
||||
3. Follow the implementation to the actual HTTP call
|
||||
### Koin
|
||||
|
||||
Koin is the dominant DI framework in Kotlin Multiplatform and a large
|
||||
share of Kotlin-only Android apps. It uses a runtime DSL rather than
|
||||
compile-time generated factories, so the search patterns are different:
|
||||
|
||||
```bash
|
||||
# Confirm Koin is actually wired up
|
||||
grep -rn 'org\.koin\.' sources/
|
||||
|
||||
# DI module declarations
|
||||
grep -rn 'fun [A-Za-z]\+Module\|module\s*{\|module(' sources/
|
||||
|
||||
# Bindings inside a module DSL
|
||||
grep -rn 'single\s*[<{(]\|factory\s*[<{(]\|viewModel\s*[<{(]\|scoped\s*[<{(]\|singleOf\|factoryOf' sources/
|
||||
|
||||
# Resolution call-sites (where a binding is consumed)
|
||||
grep -rn '\bget\s*<\|\binject\s*<\|by\s\+inject\b\|by\s\+viewModel\b\|getKoin' sources/
|
||||
```
|
||||
|
||||
After R8, every binding lambda becomes an anonymous
|
||||
`Function2<Scope, ParametersHolder, T>` impl. To find the binding for an
|
||||
interface `Foo`, look for files that contain both a Koin import / module
|
||||
DSL marker and a reference to `Foo`:
|
||||
|
||||
```bash
|
||||
grep -rln 'org\.koin\.core\.module' sources/ | xargs grep -l 'Foo'
|
||||
```
|
||||
|
||||
### Trace through DI
|
||||
|
||||
1. Find where an interface is used (e.g. `ApiService` injected into a
|
||||
repository).
|
||||
2. Find the `@Provides` / `@Binds` method (Hilt) **or** the
|
||||
`single { ... }` / `factory { ... }` block (Koin) that creates the
|
||||
implementation.
|
||||
3. Follow the implementation to the actual HTTP call.
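
Step 2 can be narrowed with a two-stage search: first find files that contain a binding marker, then keep only those that also mention the type. The sketch below is an assumption-laden shortcut, not part of the shipped scripts; `di_binding_candidates` is a hypothetical name.

```bash
# Hypothetical helper: list files that contain a Hilt/Dagger or Koin binding
# marker and also reference the given type name as a whole word.
di_binding_candidates() {
  local src_dir="$1" type_name="$2"
  grep -rlE --include='*.java' --include='*.kt' \
      '@Provides|@Binds|org\.koin\.core\.module' "$src_dir" 2>/dev/null \
    | while IFS= read -r file; do
        if grep -qw -- "$type_name" "$file"; then
          printf '%s\n' "$file"
        fi
      done \
    | sort
}
```

Usage: `di_binding_candidates <output>/sources ApiService`. When the type name itself is obfuscated, pass the obfuscated name; the markers stay readable because they come from library packages.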
|
||||
|
||||
## 6. Find Constants and Configuration
|
||||
|
||||
|
|
@ -145,8 +178,9 @@ When code is obfuscated (ProGuard/R8):
|
|||
1. **Start from strings**: Search for URLs, error messages, and known constants
|
||||
2. **Start from framework classes**: Activities and Fragments are named in the manifest
|
||||
3. **Follow library calls**: Retrofit `@GET`/`@POST` annotations are readable even when the interface class name is obfuscated
|
||||
4. **Use `--deobf`**: jadx can generate readable replacement names
|
||||
4. **Recover original Kotlin names from metadata**: `@DebugMetadata` and `@Metadata.d2` strings preserve the original FQNs even after R8 obfuscation. Run `scripts/recover-kotlin-names.sh` to build an `obf -> real` map (typically recovers 30-50% of classes — and almost 100% of `*Repository` / `*ViewModel` / `*Impl`). See [`kotlin-name-recovery.md`](./kotlin-name-recovery.md). This is the single highest-leverage step on any Kotlin app.
|
||||
5. **Cross-reference**: If `class a` calls `Retrofit.create(b.class)`, then `b` is a Retrofit service interface
|
||||
6. **`--deobf` is rarely enough on its own**: jadx's `--deobf` renames obfuscated symbols with synthetic placeholders (`p001a`, `C0123Foo`) — useful for disambiguation but it does **not** recover original names. Pair it with the metadata recovery above.
|
||||
|
||||
## 8. Tracing a Complete Call Flow: Example
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,108 @@
|
|||
# Recovering Original Class Names from Kotlin Metadata
|
||||
|
||||
When R8/ProGuard obfuscates a Kotlin app, JVM symbols are renamed but the
**Kotlin metadata strings frequently survive**: reflection-based libraries
read `@Metadata`, coroutine stack-trace recovery reads `@DebugMetadata`, and
shrinker configurations often leave both in place.
|
||||
|
||||
Two annotations leak the original fully-qualified names:
|
||||
|
||||
## `@DebugMetadata`
|
||||
|
||||
Generated for nearly every Kotlin coroutine `SuspendLambda` (i.e. almost
|
||||
every `suspend` function in a modern app):
|
||||
|
||||
```java
|
||||
@DebugMetadata(
|
||||
c = "com.example.feature.account.AccountRepositoryImpl$fetch$1",
|
||||
f = "AccountRepositoryImpl.kt",
|
||||
l = {42, 51},
|
||||
m = "invokeSuspend"
|
||||
)
|
||||
public final class a extends SuspendLambda implements Function2<...> { ... }
|
||||
```
|
||||
|
||||
The `c =` field carries the original outer class FQN (with a `$` suffix
|
||||
for inner / lambda scopes — strip everything after the first `$` to get the
|
||||
declaring class).
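
The extraction described above can be sketched in a few lines. This is not the shipped `recover-kotlin-names.sh`; `debugmetadata_classes` is a hypothetical name, and the pattern assumes the decompiler prints the field as `c = "..."`.

```bash
# Hypothetical helper: collect declaring-class FQNs from the `c` field of
# @DebugMetadata annotations in decompiled sources.
debugmetadata_classes() {
  local src_dir="$1"
  # Only look inside files that carry the annotation at all.
  grep -rlF --include='*.java' '@DebugMetadata' "$src_dir" 2>/dev/null \
    | while IFS= read -r file; do
        grep -hoE '(^|[ (,])c = "[A-Za-z0-9_.$]+"' "$file"
      done \
    | sed -E 's/^.*c = "//; s/"$//; s/\$.*$//' \
    | sort -u
}
```

Several obfuscated classes usually collapse to the same declaring class, because each suspend lambda of that class gets its own generated continuation.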
|
||||
|
||||
## `@Metadata.d2`
|
||||
|
||||
Every Kotlin class carries a top-level `@Metadata` annotation. The `d2`
|
||||
array lists internal class refs in JVM type-descriptor format
|
||||
(`Lcom/example/Foo;`):
|
||||
|
||||
```java
|
||||
@Metadata(d1 = {"..."},
|
||||
d2 = {"...","Lcom/example/feature/account/AccountRepositoryImpl;","..."})
|
||||
public final class b implements ... { ... }
|
||||
```
|
||||
|
||||
The first non-stdlib descriptor in `d2` is usually the file's primary
|
||||
class.
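
A sketch of that heuristic: pull the type descriptors out of the annotation text, convert them to dotted names, and drop platform and stdlib packages while keeping the original order. The function name `d2_class_refs` and the list of packages treated as "stdlib" are assumptions made for illustration.

```bash
# Hypothetical helper: read decompiled source on stdin and print the
# non-stdlib class names referenced as "Lpkg/Name;" descriptors, in order
# of first appearance.
d2_class_refs() {
  grep -oE '"L[A-Za-z0-9_/$]+;"' \
    | sed -E 's/^"L//; s/;"$//; s#/#.#g' \
    | grep -vE '^(kotlin|kotlinx|java|javax|android|androidx)\.' \
    | awk '!seen[$0]++'
}
```

Usage: `d2_class_refs < path/to/b.java | head -n 1` prints the candidate for the file's primary class. The result is a hint rather than proof, since the first descriptor may also be a supertype or a parameter type.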
|
||||
|
||||
## How to mine them
|
||||
|
||||
The skill ships two scripts:
|
||||
|
||||
```bash
|
||||
# Build a mapping from a decompiled sources directory:
|
||||
bash scripts/recover-kotlin-names.sh <output>/sources [mapping-dir]
|
||||
|
||||
# Outputs:
|
||||
# <mapping-dir>/mapping.tsv obf_fqn real_fqn file
|
||||
# <mapping-dir>/mapping.json same data, JSON
|
||||
# <mapping-dir>/by_package/ per-real-package index files
|
||||
|
||||
# Query the mapping:
|
||||
bash scripts/lookup-name.sh <mapping-dir> Repository # search
|
||||
bash scripts/lookup-name.sh <mapping-dir> -o ab.cd # obf -> real
|
||||
bash scripts/lookup-name.sh <mapping-dir> -p com.example.feature # list package
|
||||
bash scripts/lookup-name.sh <mapping-dir> --grep '"api/' <output>/sources
|
||||
# ^ greps decompiled code and appends '// real.fqn' to each hit
|
||||
```
|
||||
|
||||
## What you typically recover
|
||||
|
||||
On a real-world obfuscated Kotlin app the script recovers **30 – 50 % of
|
||||
classes** — but more importantly, **almost 100 % of the classes you
|
||||
actually want to read**:
|
||||
|
||||
| Class kind | Recovery rate |
|
||||
|---------------------------|---------------|
|
||||
| `*Repository` / `*Impl` | ~100 % |
|
||||
| `*ViewModel` | ~100 % |
|
||||
| `*UseCase` / `*Interactor`| ~100 % |
|
||||
| Plain `data class` DTOs | ~80 % |
|
||||
| Pure-Java helper classes | low (no Kotlin metadata) |
|
||||
| Anonymous inner classes | sometimes recovered as the parent FQN |
|
||||
|
||||
## Why `jadx --deobf` is not enough
|
||||
|
||||
`--deobf` renames obfuscated identifiers using internal heuristics, but the
|
||||
output is still synthetic (`p001a`, `C0123Foo`). It does **not** recover
|
||||
the *original* names. Kotlin metadata recovery is the only reliable way to
|
||||
map back to the names the developer actually wrote, and it costs essentially
|
||||
nothing — just a regex pass over the decompiled sources.
|
||||
|
||||
Run both: `--deobf` for fields/methods that have no metadata source, plus
|
||||
the recovery script for class names.
|
||||
|
||||
## Limitations
|
||||
|
||||
- **Method names and field names** are not recovered. Kotlin metadata only
|
||||
preserves class-level FQNs and a few signatures. For method names you
|
||||
still need jadx-gui's interactive rename or pattern inference.
|
||||
- **Pure-Java classes** carry no `@Metadata`, so they remain obfuscated.
|
||||
- **Heavily inlined classes** (`@JvmInline value class`, top-level fun
|
||||
files compiled into shared `*Kt.class` synthetic classes) sometimes show
|
||||
up under the wrong filename — treat results as a strong hint, not gospel.
|
||||
|
||||
## Reading flow with the mapping
|
||||
|
||||
1. Run `recover-kotlin-names.sh` once after decompiling.
|
||||
2. Use `lookup-name.sh --grep '<pattern>' <sources>` instead of plain `grep`
|
||||
so every hit comes annotated with the real owning class.
|
||||
3. When you hit an obfuscated FQN in code (e.g. `nq.e`), resolve it with
|
||||
`lookup-name.sh <mapping-dir> -o nq.e` — you will often see siblings
|
||||
(`nq.d`, `nq.f`, ...) that are the same class's split lambdas/inner
|
||||
classes, which is useful context.
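
If the lookup script is not at hand, the same resolution can be approximated directly on `mapping.tsv`. The sketch assumes the file is tab-separated with the columns listed earlier (`obf_fqn`, `real_fqn`, `file`); both function names are hypothetical.

```bash
# Hypothetical helpers operating on a tab-separated mapping file with the
# columns: obf_fqn, real_fqn, file.

# Print the real FQN(s) recorded for one obfuscated FQN.
resolve_obf() {
  local mapping_tsv="$1" obf="$2"
  awk -F'\t' -v o="$obf" '$1 == o { print $2 }' "$mapping_tsv" | sort -u
}

# Print "obf<TAB>real" for every entry sharing the obfuscated package prefix.
siblings_of() {
  local mapping_tsv="$1" obf="$2"
  local pkg="${obf%.*}"
  awk -F'\t' -v p="$pkg." 'index($1, p) == 1 { print $1 "\t" $2 }' "$mapping_tsv" \
    | sort
}
```

For example, `siblings_of <mapping-dir>/mapping.tsv nq.e` lists `nq.d`, `nq.e`, and any other mapped class under `nq.`, which shows at a glance whether they all belong to the same original class.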
|
||||
|
|
@ -0,0 +1,122 @@
|
|||
# Third-party host denylist used by find-api-calls.sh --urls.
|
||||
#
|
||||
# Patterns are extended-regex hostname suffixes / fragments. A host is
|
||||
# considered "third-party noise" if any pattern below matches anywhere
|
||||
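# in the hostname. Lines starting with '#' and blank lines are ignored.
# Note: a pattern of the form \.vendor\.com$ matches subdomains only; the bare
# apex host (vendor.com) is not matched and lands in the first-party bucket.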
|
||||
#
|
||||
# This list is intentionally conservative: when a pattern would hide a
|
||||
# legitimate first-party host (e.g. an app may run its own *.s3.amazonaws.com
|
||||
# bucket), keep the pattern but expect manual review of the bucketed output.
|
||||
|
||||
# Google / Firebase / Play / Crashlytics
|
||||
\.googleapis\.com$
|
||||
\.google\.com$
|
||||
\.gstatic\.com$
|
||||
\.googleusercontent\.com$
|
||||
\.googletagmanager\.com$
|
||||
\.googlesyndication\.com$
|
||||
\.firebaseio\.com$
|
||||
\.firebaseapp\.com$
|
||||
\.firebaseinstallations\.googleapis\.com$
|
||||
\.firebaseremoteconfig\.googleapis\.com$
|
||||
\.crashlytics\.com$
|
||||
\.app-measurement\.com$
|
||||
|
||||
# Apple / Microsoft / Adobe
|
||||
\.apple\.com$
|
||||
\.icloud\.com$
|
||||
\.microsoft\.com$
|
||||
\.live\.com$
|
||||
\.office\.com$
|
||||
\.adobe\.com$
|
||||
ns\.adobe\.com
|
||||
|
||||
# Meta
|
||||
\.facebook\.com$
|
||||
\.fbcdn\.net$
|
||||
\.instagram\.com$
|
||||
\.whatsapp\.com$
|
||||
|
||||
# Other social / messaging / video
|
||||
\.twitter\.com$
|
||||
\.x\.com$
|
||||
\.tiktok\.com$
|
||||
\.youtube\.com$
|
||||
\.youtu\.be$
|
||||
\.linkedin\.com$
|
||||
\.snapchat\.com$
|
||||
\.pinterest\.com$
|
||||
\.reddit\.com$
|
||||
|
||||
# Mobile attribution / analytics / observability
|
||||
\.appsflyersdk\.com$
|
||||
\.appsflyer\.com$
|
||||
\.adjust\.com$
|
||||
\.branch\.io$
|
||||
\.amplitude\.com$
|
||||
\.segment\.com$
|
||||
\.mixpanel\.com$
|
||||
\.hotjar\.com$
|
||||
\.clarity\.ms$
|
||||
\.datadoghq\.(com|eu|us)$
|
||||
\.sentry\.io$
|
||||
\.bugsnag\.com$
|
||||
\.newrelic\.com$
|
||||
\.instabug\.com$
|
||||
\.embrace\.io$
|
||||
\.rollout\.io$
|
||||
\.launchdarkly\.com$
|
||||
|
||||
# Push / notifications
|
||||
\.onesignal\.com$
|
||||
\.urbanairship\.com$
|
||||
\.airship\.com$
|
||||
|
||||
# Support / chat
|
||||
\.zendesk\.com$
|
||||
\.intercom\.io$
|
||||
\.intercomcdn\.com$
|
||||
\.helpshift\.com$
|
||||
\.salesforce\.com$
|
||||
\.freshchat\.com$
|
||||
\.kustomerapp\.com$
|
||||
|
||||
# Payments
|
||||
\.stripe\.com$
|
||||
\.braintreepayments\.com$
|
||||
\.braintreegateway\.com$
|
||||
\.payu\.com$
|
||||
\.payu\.in$
|
||||
\.paypal\.com$
|
||||
\.adyen\.com$
|
||||
\.checkout\.com$
|
||||
\.klarna\.com$
|
||||
|
||||
# Maps / location
|
||||
\.mapbox\.com$
|
||||
\.openstreetmap\.org$
|
||||
|
||||
# Storage / CDN (often third-party even when the bucket name is app-specific)
|
||||
\.s3\.amazonaws\.com$
|
||||
\.cloudfront\.net$
|
||||
\.akamaihd\.net$
|
||||
\.akamaized\.net$
|
||||
\.fastly\.net$
|
||||
\.cloudflare\.com$
|
||||
\.azureedge\.net$
|
||||
|
||||
# DNS / well-known infra
|
||||
\.localhost$
|
||||
^localhost
|
||||
^127\.
|
||||
|
||||
# Standards / RFCs / placeholders that show up as XML/XMP namespaces
|
||||
\.w3\.org$
|
||||
\.w3c\.org$
|
||||
example\.(com|org|net)$
|
||||
|
||||
# Certificate authorities
|
||||
\.sectigo\.com$
|
||||
\.entrust\.com$
|
||||
\.digicert\.com$
|
||||
\.letsencrypt\.org$
|
||||
|
|
@ -14,8 +14,12 @@ Arguments:
|
|||
Options:
|
||||
--retrofit Search only for Retrofit annotations
|
||||
--okhttp Search only for OkHttp patterns
|
||||
--ktor Search only for Ktor client patterns
|
||||
--apollo Search only for Apollo (GraphQL) patterns
|
||||
--volley Search only for Volley patterns
|
||||
--urls Search only for hardcoded URLs
|
||||
--paths Extract unique endpoint-shaped path string literals
|
||||
(works on heavily obfuscated apps where call sites are inlined)
|
||||
--auth Search only for auth-related patterns
|
||||
--all Search all patterns (default)
|
||||
-h, --help Show this help message
|
||||
|
|
@ -29,8 +33,11 @@ EOF
|
|||
SOURCE_DIR=""
|
||||
SEARCH_RETROFIT=false
|
||||
SEARCH_OKHTTP=false
|
||||
SEARCH_KTOR=false
|
||||
SEARCH_APOLLO=false
|
||||
SEARCH_VOLLEY=false
|
||||
SEARCH_URLS=false
|
||||
SEARCH_PATHS=false
|
||||
SEARCH_AUTH=false
|
||||
SEARCH_ALL=true
|
||||
|
||||
|
|
@ -38,8 +45,11 @@ while [[ $# -gt 0 ]]; do
|
|||
case "$1" in
|
||||
--retrofit) SEARCH_RETROFIT=true; SEARCH_ALL=false; shift ;;
|
||||
--okhttp) SEARCH_OKHTTP=true; SEARCH_ALL=false; shift ;;
|
||||
--ktor) SEARCH_KTOR=true; SEARCH_ALL=false; shift ;;
|
||||
--apollo) SEARCH_APOLLO=true; SEARCH_ALL=false; shift ;;
|
||||
--volley) SEARCH_VOLLEY=true; SEARCH_ALL=false; shift ;;
|
||||
--urls) SEARCH_URLS=true; SEARCH_ALL=false; shift ;;
|
||||
--paths) SEARCH_PATHS=true; SEARCH_ALL=false; shift ;;
|
||||
--auth) SEARCH_AUTH=true; SEARCH_ALL=false; shift ;;
|
||||
--all) SEARCH_ALL=true; shift ;;
|
||||
-h|--help) usage ;;
|
||||
|
|
@ -72,6 +82,58 @@ run_grep() {
|
|||
grep $GREP_OPTS -E "$pattern" "$SOURCE_DIR" 2>/dev/null || true
|
||||
}
|
||||
|
||||
# Print a one-screen summary FIRST so a reader knows what to expect from
|
||||
# the long output that follows. Skipped when a single section flag was
|
||||
# requested (the user wants raw matches, not an overview). One pass over
|
||||
# the tree, counts bucketed by tag — running 8 separate greps was too slow.
|
||||
if [[ "$SEARCH_ALL" == true ]]; then
|
||||
section "Summary (counted in a single pass)"
|
||||
declare -A H=(
|
||||
[retrofit]=0 [okhttp]=0 [ktor]=0 [apollo]=0 [volley]=0
|
||||
[hilt]=0 [koin]=0 [bearer]=0 [hmac]=0
|
||||
)
|
||||
while IFS= read -r line; do
|
||||
case "$line" in
|
||||
*"@GET("*|*"@POST("*|*"@PUT("*|*"@DELETE("*|*"@PATCH("*|*"@HTTP("*) H[retrofit]=$((H[retrofit]+1));;
|
||||
esac
|
||||
case "$line" in
|
||||
*"Request.Builder"*|*"HttpUrl"*|*".newCall("*) H[okhttp]=$((H[okhttp]+1));;
|
||||
esac
|
||||
case "$line" in
|
||||
*"BearerTokens"*|*"defaultRequest {"*|*"client.get("*|*"client.post("*|*"httpClient.get("*|*"httpClient.post("*|*"HttpClient.get("*) H[ktor]=$((H[ktor]+1));;
|
||||
esac
|
||||
case "$line" in
|
||||
*"ApolloClient"*|*".serverUrl("*) H[apollo]=$((H[apollo]+1));;
|
||||
esac
|
||||
case "$line" in
|
||||
*"StringRequest"*|*"JsonObjectRequest"*|*"RequestQueue"*) H[volley]=$((H[volley]+1));;
|
||||
esac
|
||||
case "$line" in
|
||||
*"@HiltAndroidApp"*|*"@AndroidEntryPoint"*|*"@HiltViewModel"*|*"@Provides"*|*"@Binds"*) H[hilt]=$((H[hilt]+1));;
|
||||
esac
|
||||
case "$line" in
|
||||
*"org.koin."*|*"module {"*|*"single<"*|*"factory<"*|*"singleOf("*|*"factoryOf("*) H[koin]=$((H[koin]+1));;
|
||||
esac
|
||||
case "$line" in
|
||||
*'"Bearer '*|*'"bearer '*|*"BearerTokens"*) H[bearer]=$((H[bearer]+1));;
|
||||
esac
|
||||
case "$line" in
|
||||
*"HmacSHA"*|*'Mac.getInstance("Hmac'*) H[hmac]=$((H[hmac]+1));;
|
||||
esac
|
||||
done < <(grep -rEh --include='*.java' --include='*.kt' \
|
||||
'@(GET|POST|PUT|DELETE|PATCH|HTTP)\(|Request\.Builder|HttpUrl|\.newCall\(|BearerTokens|defaultRequest \{|[Cc]lient\.(get|post)\(|ApolloClient|\.serverUrl\(|StringRequest|JsonObjectRequest|RequestQueue|@HiltAndroidApp|@AndroidEntryPoint|@HiltViewModel|@Provides|@Binds|org\.koin\.|module \{|single<|factory<|singleOf\(|factoryOf\(|"[Bb]earer |HmacSHA|Mac\.getInstance' \
|
||||
"$SOURCE_DIR" 2>/dev/null || true)
|
||||
printf ' HTTP framework: Retrofit=%-5s OkHttp=%-5s Ktor=%-5s Apollo=%-5s Volley=%-5s\n' \
|
||||
"${H[retrofit]}" "${H[okhttp]}" "${H[ktor]}" "${H[apollo]}" "${H[volley]}"
|
||||
printf ' DI framework: Hilt/Dagger=%-5s Koin=%-5s\n' \
|
||||
"${H[hilt]}" "${H[koin]}"
|
||||
printf ' Auth signals: Bearer=%-5s HMAC/Sign=%-5s\n' \
|
||||
"${H[bearer]}" "${H[hmac]}"
|
||||
echo
|
||||
echo " Run with one of --retrofit / --okhttp / --ktor / --apollo / --volley /"
|
||||
echo " --paths / --urls / --auth to inspect a single section."
|
||||
fi
|
||||
|
||||
# --- Retrofit ---
|
||||
if [[ "$SEARCH_ALL" == true || "$SEARCH_RETROFIT" == true ]]; then
|
||||
section "Retrofit Annotations"
|
||||
|
|
@ -90,16 +152,123 @@ if [[ "$SEARCH_ALL" == true || "$SEARCH_OKHTTP" == true ]]; then
|
|||
run_grep '(\.url\s*\(|\.addQueryParameter|\.addPathSegment|\.scheme\s*\(|\.host\s*\()'
|
||||
fi
|
||||
|
||||
# --- Ktor (Kotlin) ---
|
||||
# Ktor doesn't use annotations. Endpoints appear as string args to
|
||||
# client.get/post/etc., or are built via HttpRequestBuilder.url(...). Auth
|
||||
# is configured via the bearer { loadTokens / refreshTokens } DSL.
|
||||
if [[ "$SEARCH_ALL" == true || "$SEARCH_KTOR" == true ]]; then
|
||||
section "Ktor — Client Calls"
|
||||
run_grep '\b(client|httpClient|HttpClient)\.(get|post|put|delete|patch|head|request)\s*[<(]'
|
||||
section "Ktor — Request Building / Default Request"
|
||||
run_grep '(HttpRequestBuilder|defaultRequest\s*\{|\burl\s*\(\s*"|URLBuilder|URLProtocol)'
|
||||
section "Ktor — Auth Plugin (Bearer / Refresh)"
|
||||
run_grep '(\bbearer\s*\{|BearerTokens\s*\(|loadTokens\s*\{|refreshTokens\s*\{|\bAuth\s*\)\s*\{)'
|
||||
fi
|
||||
|
||||
# --- Apollo (GraphQL) ---
|
||||
if [[ "$SEARCH_ALL" == true || "$SEARCH_APOLLO" == true ]]; then
|
||||
section "Apollo — GraphQL Client"
|
||||
run_grep '(ApolloClient|\.serverUrl\s*\(|\.subscriptionNetworkTransport|HttpNetworkTransport)'
|
||||
section "Apollo — Operations"
|
||||
run_grep '(\.query\s*\(\s*[A-Z]|\.mutation\s*\(\s*[A-Z]|\.subscription\s*\(\s*[A-Z])'
|
||||
fi
|
||||
|
||||
# --- Volley ---
|
||||
if [[ "$SEARCH_ALL" == true || "$SEARCH_VOLLEY" == true ]]; then
|
||||
section "Volley Requests"
|
||||
run_grep '(StringRequest|JsonObjectRequest|JsonArrayRequest|ImageRequest|RequestQueue|Volley\.newRequestQueue)'
|
||||
fi
|
||||
|
||||
# --- Endpoint-shaped path literals ---
|
||||
# Survives R8 obfuscation: even when call sites are inlined to a.b(c, "path"),
|
||||
# the path strings themselves are not obfuscated. This produces a deduplicated
|
||||
# inventory of likely API endpoints that other modes miss.
|
||||
if [[ "$SEARCH_ALL" == true || "$SEARCH_PATHS" == true ]]; then
|
||||
section "Endpoint-Shaped Path Literals (deduplicated)"
|
||||
# Quoted strings that begin with /<segment> or <segment>/ where the leading
|
||||
# segment is a typical API root word. Segment count and length are not
# capped; false positives are handled by the EXCLUDE filter below.
|
||||
# An endpoint-shaped string is one of:
|
||||
# "/seg/seg..." — absolute path with >= 2 segments
|
||||
# "api-root/seg/seg..." — relative path starting with a known
|
||||
# API root keyword and containing >= 1
|
||||
# '/' followed by another segment
|
||||
# Segments are URL-safe chars plus {} for path-template placeholders.
|
||||
SEG='[A-Za-z0-9_{}.\-]+'
|
||||
ROOT='(api|v[0-9]+|graphql|rest|mobile|auth|oauth|sso|users?|account|session|token|register|signup|signin|logout|password|verify|otp|sms|profile|customer|cart|basket|order|checkout|payment|invoice|product|catalog|inventory|search|category|favo[u]?rites?|wishlist|address|location|delivery|shipping|review|feedback|notification|push|message|chat|track|event|stat[a-z]*|metric|config|settings?|feature|flag|banner|content|media|upload|download|file|image|video|live|stream|webhook|callback)'
|
||||
PATHS_REGEX="\"(/${SEG}(/${SEG})+/?|${ROOT}(/${SEG})+/?)\""
|
||||
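# Filter out frequent false positives (MIME types, /proc, /sys, /dev).
# Note: image/, video/, content/ and message/ are also ROOT keywords, so
# relative paths starting with those words are dropped here as well.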
|
||||
EXCLUDE='^"(image|video|audio|text|application|content|font|model|multipart|message)/|^"/(proc|sys|dev|tmp|etc|usr|var|opt)/'
|
||||
# Print a flat unique list rather than file:line — this is the inventory.
|
||||
grep -rhoE --include='*.java' --include='*.kt' "$PATHS_REGEX" "$SOURCE_DIR" 2>/dev/null \
|
||||
| grep -Ev "$EXCLUDE" \
|
||||
| sort -u
|
||||
echo
|
||||
section "Endpoint-Shaped Path Literals — call sites"
|
||||
grep $GREP_OPTS -E "$PATHS_REGEX" "$SOURCE_DIR" 2>/dev/null \
|
||||
| grep -Ev ":[0-9]+:.*(${EXCLUDE//^/})" || true
|
||||
fi
|
||||
|
||||
# --- Hardcoded URLs ---
|
||||
# A loose grep for http(s)://... drowns in compression-dictionary garbage and
|
||||
# in third-party SDK URLs (Google, Firebase, AppsFlyer, Datadog, ...). The
|
||||
# strict regex requires a syntactically valid hostname and rejects strings
|
||||
# containing whitespace, angle brackets, or non-printable bytes. Hosts are
|
||||
# then bucketed into "first-party candidates" vs "third-party (denylist)".
|
||||
if [[ "$SEARCH_ALL" == true || "$SEARCH_URLS" == true ]]; then
|
||||
section "Hardcoded URLs (http:// and https://)"
|
||||
run_grep '"https?://[^"]+'
|
||||
HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
DENYLIST="$HERE/../references/third_party_hosts.txt"
|
||||
# Hostname must have at least one dot and end in a 2+ letter TLD.
|
||||
STRICT_URL='https?://[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}(:[0-9]{1,5})?(/[^"<>[:space:]]*)?'
|
||||
|
||||
TMP="$(mktemp)"
|
||||
trap 'rm -f "$TMP"' EXIT
|
||||
grep -rhoE --include='*.java' --include='*.kt' "$STRICT_URL" "$SOURCE_DIR" 2>/dev/null \
|
||||
| sort -u > "$TMP"
|
||||
|
||||
# Extract host: strip scheme, take part up to first ':' or '/'.
|
||||
HOSTS_TMP="$(mktemp)"
|
||||
sed -E 's#^https?://##; s#[/:].*$##' "$TMP" | sort -u > "$HOSTS_TMP"
|
||||
|
||||
# Build one combined regex from the denylist (one pattern per line; blank
# lines and '#' comments are ignored). An empty regex would match every
# host, so fall back to "no denylist" when nothing usable is found.
DENY_REGEX=""
if [[ -f "$DENYLIST" ]]; then
  DENY_REGEX="$( { grep -vE '^[[:space:]]*(#|$)' "$DENYLIST" || true; } | tr '\n' '|' | sed 's/|$//')"
fi
if [[ -n "$DENY_REGEX" ]]; then
  THIRD_HOSTS=$(grep -E "$DENY_REGEX" "$HOSTS_TMP" || true)
  FIRST_HOSTS=$(grep -vE "$DENY_REGEX" "$HOSTS_TMP" || true)
else
  THIRD_HOSTS=""
  FIRST_HOSTS=$(cat "$HOSTS_TMP")
fi
|
||||
|
||||
section "Likely First-Party Hosts (sorted by distinct-URL count)"
|
||||
if [[ -n "$FIRST_HOSTS" ]]; then
|
||||
while IFS= read -r h; do
|
||||
[[ -z "$h" ]] && continue
|
||||
n=$(grep -cE "://${h//./\\.}([/:\"]|$)" "$TMP" || true)
|
||||
printf ' %5d %s\n' "$n" "$h"
|
||||
done <<< "$FIRST_HOSTS" | sort -rn -k1
|
||||
else
|
||||
echo " (none — every URL matched the third-party denylist)"
|
||||
fi
|
||||
|
||||
section "Third-Party Hosts (denylist matches, collapsed)"
|
||||
if [[ -n "$THIRD_HOSTS" ]]; then
|
||||
echo "$THIRD_HOSTS" | sed 's/^/ /'
|
||||
else
|
||||
echo " (none)"
|
||||
fi
|
||||
|
||||
section "All First-Party URLs (full strings)"
|
||||
if [[ -n "$FIRST_HOSTS" ]]; then
|
||||
while IFS= read -r h; do
|
||||
[[ -z "$h" ]] && continue
|
||||
grep -E "://${h//./\\.}([/:\"]|$)" "$TMP" | sed 's/^/ /'
|
||||
done <<< "$FIRST_HOSTS"
|
||||
fi
|
||||
|
||||
rm -f "$HOSTS_TMP" "$TMP"
|
||||
trap - EXIT
|
||||
|
||||
section "HttpURLConnection"
|
||||
run_grep '(openConnection|setRequestMethod|HttpURLConnection|HttpsURLConnection)'
|
||||
section "WebView URLs"
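The host extraction and first-party / third-party bucketing in the Hardcoded URLs block can be tried outside the script with a small Python sketch. This is a `re` port written for illustration, not code from the PR: the sample source text, the hosts in it, and the single deny pattern are all hypothetical, and the real contents of `third_party_hosts.txt` are not shown here. The port accepts two-label hosts such as `example.org` and, like the script, counts distinct URLs per host.

```python
import re
from collections import Counter

# Python port of a strict URL pattern: dot-separated labels ending in an
# alphabetic TLD of 2+ letters, optional port, optional path.
STRICT_URL = re.compile(
    r'https?://[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}'
    r'(?::[0-9]{1,5})?(?:/[^"<>\s]*)?'
)


def host_of(url: str) -> str:
    """Strip the scheme, then cut at the first ':' or '/'."""
    rest = url.split("://", 1)[1]
    return re.split(r"[/:]", rest, maxsplit=1)[0]


def bucket_hosts(text, deny_patterns):
    """Return (first_party, third_party) dicts of host -> distinct-URL count."""
    urls = sorted(set(STRICT_URL.findall(text)))
    counts = Counter(host_of(u) for u in urls)
    deny = re.compile("|".join(deny_patterns)) if deny_patterns else None
    first, third = {}, {}
    for host, n in counts.items():
        bucket = third if deny and deny.search(host) else first
        bucket[host] = n
    return first, third


# Hypothetical decompiled-source fragment.
SAMPLE = '''
String a = "https://api.example.com/v1/users";
String b = "https://api.example.com/v1/orders";
String c = "https://example.org";
String d = "https://firebaseinstallations.googleapis.com/v1/";
String e = "http://not a url";
'''

first, third = bucket_hosts(SAMPLE, [r"googleapis\.com$"])
for host, n in sorted(first.items(), key=lambda kv: -kv[1]):
    print(f"{n:5d}  {host}")
print("third-party:", sorted(third))
```

An empty deny list is treated as "no denylist" rather than compiled into an empty pattern, because an empty regex matches every host.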
|
||||
|
|
@@ -109,9 +278,27 @@ fi
|
|||
# --- Auth patterns ---
|
||||
if [[ "$SEARCH_ALL" == true || "$SEARCH_AUTH" == true ]]; then
|
||||
section "Authentication & API Keys"
|
||||
run_grep -i '(api[_-]?key|auth[_-]?token|bearer|authorization|x-api-key|client[_-]?secret|access[_-]?token)'
|
||||
run_grep -i '(api[_-]?key|auth[_-]?token|bearer|authorization|x-api-key|client[_-]?secret|access[_-]?token|refresh[_-]?token)'
|
||||
|
||||
# Request-signing schemes: a hardcoded HMAC / RSA secret in an APK is a
|
||||
# security finding worth surfacing prominently. These patterns catch the
|
||||
# common shapes of homegrown / SDK-issued request signers.
|
||||
section "Request Signing (HMAC / signature schemes)"
|
||||
run_grep '(HmacSHA(1|256|512)|Mac\.getInstance\("Hmac|SecretKeySpec\(|Signature\.getInstance\()'
|
||||
run_grep -i '(x-signature|x-client-authorization|x-amz-signature|x-hmac|aws4-hmac|signRequest|signatureFor|computeSignature|signaturev[0-9])'
|
||||
|
||||
# Hardcoded high-entropy strings adjacent to "secret"/"key" assignments
# are the canonical leaked-credential pattern. This pass only matches the
# identifier names; it does not measure entropy, so review each hit by hand.
|
||||
section "Possible Hardcoded Secrets / Keys"
|
||||
run_grep -i '(app[_-]?secret|client[_-]?secret|signing[_-]?key|hmac[_-]?secret|consumer[_-]?secret|private[_-]?key)'
|
||||
|
||||
section "Base URLs and Constants"
|
||||
run_grep -i '(BASE_URL|API_URL|SERVER_URL|ENDPOINT|API_BASE|HOST_NAME)'
|
||||
|
||||
# Ktor BearerTokens / refresh DSL: common in Kotlin apps. These names are
# part of Ktor's public API, so they often survive R8 when Ktor is kept.
|
||||
section "Ktor Auth (Bearer + Refresh)"
|
||||
run_grep '(BearerTokens|loadTokens\s*\{|refreshTokens\s*\{|\bbearer\s*\{)'
|
||||
fi
|
||||
|
||||
echo
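The "Possible Hardcoded Secrets / Keys" pass above matches secret-like identifier names. A follow-up filter could rank those hits by how random the adjacent string literal looks. The sketch below is one possible way to do that and is not part of the PR: the 16-character minimum, the 3.5-bit entropy threshold, and the sample lines (including the fake key) are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Same identifier shapes as the grep pass, compiled case-insensitively.
SECRET_NAME = re.compile(
    r'(app[_-]?secret|client[_-]?secret|signing[_-]?key|hmac[_-]?secret|'
    r'consumer[_-]?secret|private[_-]?key)', re.I)
# String literals of 16+ characters without escapes (assumed minimum).
STRING_LITERAL = re.compile(r'"([^"\\]{16,})"')


def shannon_entropy(s):
    """Bits per character of the empirical character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def suspicious_lines(source, min_entropy=3.5):
    """Return (line number, literal) for secret-named lines with random-looking values."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if not SECRET_NAME.search(line):
            continue
        for lit in STRING_LITERAL.findall(line):
            if shannon_entropy(lit) >= min_entropy:
                hits.append((lineno, lit))
    return hits


# Hypothetical decompiled lines; the key is made up.
SAMPLE = '''\
static final String CLIENT_SECRET = "k9Zq3vT1xLm8Rw2YbN7cHd5JfPs0Ga4U";
static final String CLIENT_SECRET_HINT = "aaaaaaaaaaaaaaaaaaaaaaaa";
static final String TITLE = "k9Zq3vT1xLm8Rw2YbN7cHd5JfPs0Ga4U";
'''

for lineno, lit in suspicious_lines(SAMPLE):
    print(f"line {lineno}: {lit}")
```

Only the first sample line is reported: the second has a secret-like name but a zero-entropy value, and the third has a random-looking value but no secret-like name.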
|
||||
|
|
|
|||
|
|
@@ -0,0 +1,241 @@
|
|||
#!/usr/bin/env bash
|
||||
# fingerprint.sh — Triage an APK/XAPK before decompiling.
|
||||
#
|
||||
# Detects mobile framework (Flutter, React Native, Cordova/Capacitor,
|
||||
# Xamarin, KMP/native), HTTP-stack hints, obfuscation level, native libs,
|
||||
# and notable third-party SDKs.
|
||||
#
|
||||
# Decompiling Java is mostly useless for Flutter / RN / Xamarin / Cordova
|
||||
# apps — different tools are needed. Run this BEFORE Phase 2 to choose
|
||||
# the right path.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
usage() {
|
||||
cat <<EOF
|
||||
Usage: fingerprint.sh <file.apk|file.xapk>
|
||||
|
||||
Prints a one-screen summary:
|
||||
* mobile framework (with rationale)
|
||||
* HTTP / DI / serialization stack hints
|
||||
* obfuscation indicator
|
||||
* native libraries (consolidated across split APKs)
|
||||
* notable third-party SDKs (asset paths and DEX type names)
|
||||
EOF
|
||||
exit 0
|
||||
}
|
||||
|
||||
[[ $# -lt 1 || "$1" == "-h" || "$1" == "--help" ]] && usage
|
||||
INPUT="$1"
|
||||
[[ ! -f "$INPUT" ]] && { echo "File not found: $INPUT" >&2; exit 1; }
|
||||
|
||||
TMP="$(mktemp -d -t apkfp.XXXXXX)"
|
||||
trap 'rm -rf "$TMP"' EXIT
|
||||
|
||||
# Resolve to a list of APKs (handle XAPK = ZIP of APKs)
|
||||
APKS=()
|
||||
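# Lowercase the name portably; the ${INPUT,,} form needs bash 4+ (macOS ships 3.2).
case "$(printf '%s' "$INPUT" | tr '[:upper:]' '[:lower:]')" in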
|
||||
*.xapk|*.apks|*.apkm)
|
||||
unzip -q -o "$INPUT" -d "$TMP/xapk"
|
||||
while IFS= read -r p; do APKS+=("$p"); done < <(find "$TMP/xapk" -maxdepth 2 -type f -name '*.apk')
|
||||
;;
|
||||
*.apk)
|
||||
APKS=("$INPUT")
|
||||
;;
|
||||
*)
|
||||
echo "Unsupported input: $INPUT" >&2; exit 1 ;;
|
||||
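esac

# A bundle with no *.apk inside would otherwise produce an empty report.
[[ ${#APKS[@]} -eq 0 ]] && { echo "No APK files found in: $INPUT" >&2; exit 1; }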
|
||||
|
||||
# Aggregate ZIP listings from every APK in the bundle (split-aware view)
|
||||
LISTING="$TMP/listing.txt"
|
||||
: > "$LISTING"
|
||||
for apk in "${APKS[@]}"; do
|
||||
# -Z1 prints one archive path per line (no header/footer, spaces preserved).
unzip -Z1 -- "$apk" 2>/dev/null >> "$LISTING" || true
|
||||
done
|
||||
|
||||
# Most class-level libs live inside classes*.dex, not as visible zip paths.
|
||||
# Extract the type-name strings out of each dex with `strings` and append them
|
||||
# to the listing so `has()` can match e.g. 'io/ktor/' or 'org/koin/'.
|
||||
DEX_STRINGS="$TMP/dex_strings.txt"
|
||||
: > "$DEX_STRINGS"
|
||||
for apk in "${APKS[@]}"; do
|
||||
for dex in $(unzip -Z1 -- "$apk" 2>/dev/null | grep -E '^classes[0-9]*\.dex$' || true); do
|
||||
# DEX type descriptors look like "Lcom/foo/Bar;". Extract the inner
# slash-separated FQN so callers can match e.g. 'io/ktor/' directly.
# Use a 5-char minimum so short obfuscated descriptors ("La/b;") survive.
unzip -p -- "$apk" "$dex" 2>/dev/null \
| strings -n 5 \
|
||||
| grep -oE 'L[a-z][a-zA-Z0-9_]*(/[a-zA-Z0-9_$]+)+;' \
|
||||
| sed -E 's/^L//; s/;$//' \
|
||||
>> "$DEX_STRINGS" || true
|
||||
done
|
||||
done
|
||||
sort -u "$DEX_STRINGS" -o "$DEX_STRINGS"
|
||||
|
||||
has() { grep -qE "$1" "$LISTING" || grep -qE "$1" "$DEX_STRINGS"; }
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# Framework detection (priority order — first match wins)
|
||||
# ----------------------------------------------------------------------
|
||||
FRAMEWORK="unknown"
|
||||
RATIONALE=""
|
||||
|
||||
if has '^lib/[^/]+/libflutter\.so$'; then
|
||||
FRAMEWORK="Flutter"
|
||||
RATIONALE="lib/<abi>/libflutter.so present"
|
||||
has '^lib/[^/]+/libapp\.so$' && RATIONALE+="; libapp.so contains AOT-compiled Dart"
|
||||
elif has '^lib/[^/]+/libhermes\.so$' || has '^assets/index\.android\.bundle$' || has '^lib/[^/]+/libreactnativejni\.so$'; then
|
||||
FRAMEWORK="React Native"
|
||||
reasons=()
|
||||
has '^lib/[^/]+/libhermes\.so$' && reasons+=("libhermes.so")
|
||||
has '^lib/[^/]+/libreactnativejni\.so$' && reasons+=("libreactnativejni.so")
|
||||
has '^assets/index\.android\.bundle$' && reasons+=("assets/index.android.bundle")
|
||||
RATIONALE="${reasons[*]}"
|
||||
elif has '^assets/www/index\.html$' || has '^assets/www/cordova\.js$' || has '^assets/public/index\.html$'; then
|
||||
FRAMEWORK="Cordova / Capacitor (WebView hybrid)"
|
||||
RATIONALE="assets/www/ or assets/public/ shell present"
|
||||
elif has '^lib/[^/]+/libmonodroid\.so$' || has '^assemblies/'; then
|
||||
FRAMEWORK="Xamarin / .NET MAUI"
|
||||
RATIONALE="libmonodroid.so or assemblies/ present — code is in .NET DLLs"
|
||||
elif has '^lib/[^/]+/libmaui\.so$'; then
|
||||
FRAMEWORK=".NET MAUI"
|
||||
RATIONALE="libmaui.so present"
|
||||
elif has '^assets/flutter_assets/' && ! has '^lib/[^/]+/libflutter\.so$'; then
|
||||
FRAMEWORK="Flutter (code-only split?)"
|
||||
RATIONALE="flutter_assets/ but no libflutter.so in this APK — check splits"
|
||||
else
|
||||
# Native: distinguish Compose vs classic Android by androidx.compose presence
|
||||
if has 'androidx[./]compose'; then
|
||||
FRAMEWORK="Native Android (Kotlin + Jetpack Compose)"
|
||||
RATIONALE="androidx.compose.* libraries detected"
|
||||
elif has '^META-INF/.*\.kotlin_module$'; then
|
||||
FRAMEWORK="Native Android (Kotlin)"
|
||||
RATIONALE="kotlin_module metadata present, no Compose markers"
|
||||
else
|
||||
FRAMEWORK="Native Android (Java/Kotlin)"
|
||||
RATIONALE="no cross-platform framework markers found"
|
||||
fi
|
||||
fi
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# HTTP / DI / serialization stack hints
|
||||
# ----------------------------------------------------------------------
|
||||
http=()
|
||||
has 'retrofit2' && http+=("Retrofit")
|
||||
has 'okhttp3' && http+=("OkHttp")
|
||||
has 'io/ktor/' && http+=("Ktor")
|
||||
has 'com/apollographql/' && http+=("Apollo (GraphQL)")
|
||||
has 'com/android/volley' && http+=("Volley")
|
||||
|
||||
di=()
|
||||
has 'dagger/hilt/' && di+=("Hilt")
|
||||
has '^META-INF/.*dagger.*' && di+=("Dagger")
|
||||
has 'org/koin/' && di+=("Koin")
|
||||
has 'javax/inject/' && [[ ${#di[@]} -eq 0 ]] && di+=("javax.inject")
|
||||
|
||||
ser=()
|
||||
has 'kotlinx/serialization/' && ser+=("kotlinx.serialization")
|
||||
has 'com/google/gson/' && ser+=("Gson")
|
||||
has 'com/squareup/moshi/' && ser+=("Moshi")
|
||||
has 'com/fasterxml/jackson/' && ser+=("Jackson")
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# Obfuscation indicator (R8/ProGuard) — count single-letter dex packages
|
||||
# ----------------------------------------------------------------------
|
||||
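# Note: pipefail is on, so guard greps that may legitimately return 0 matches.
# Count short root packages in the DEX type names as well as short root
# directories in the zip listing; obfuscated classes live in classes*.dex.
short_dirs=$( { grep -ohE '^[a-z]{1,2}/' "$LISTING" "$DEX_STRINGS" || true; } | sort -u | wc -l | tr -d ' ')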
|
||||
if [[ "$short_dirs" -gt 30 ]]; then
|
||||
OBFUSCATION="HIGH ($short_dirs single/double-letter dirs at root)"
|
||||
elif [[ "$short_dirs" -gt 10 ]]; then
|
||||
OBFUSCATION="MODERATE ($short_dirs short root dirs)"
|
||||
else
|
||||
OBFUSCATION="LOW (no significant short-name namespace pollution)"
|
||||
fi
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# Native libraries (consolidated)
|
||||
# ----------------------------------------------------------------------
|
||||
NATIVE=$(grep -E '^lib/[^/]+/[^/]+\.so$' "$LISTING" | sort -u || true)
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# Notable third-party SDKs (asset paths and DEX type names)
|
||||
# ----------------------------------------------------------------------
|
||||
sdks=()
|
||||
has '^assets/com/appsflyer/' && sdks+=("AppsFlyer")
|
||||
has 'datadog\.buildId|com/datadog/' && sdks+=("Datadog")
|
||||
has 'io/sentry/' && sdks+=("Sentry")
|
||||
has 'com/google/firebase/' && sdks+=("Firebase")
|
||||
has 'com/google/android/gms/' && sdks+=("Google Play Services")
|
||||
has 'com/facebook/' && sdks+=("Facebook SDK")
|
||||
has 'com/payu/' && sdks+=("PayU")
|
||||
has 'com/stripe/' && sdks+=("Stripe")
|
||||
has 'com/braintreepayments/' && sdks+=("Braintree")
|
||||
has 'com/storyteller/' && sdks+=("Storyteller")
|
||||
has 'zendesk/' && sdks+=("Zendesk")
|
||||
has '(io|com)/intercom/' && sdks+=("Intercom")
|
||||
has 'com/segment/analytics' && sdks+=("Segment")
|
||||
has 'com/amplitude/' && sdks+=("Amplitude")
|
||||
has 'com/mixpanel/' && sdks+=("Mixpanel")
|
||||
has 'com/onesignal/' && sdks+=("OneSignal")
|
||||
has 'com/microsoft/clarity' && sdks+=("Microsoft Clarity")
|
||||
has 'com/hotjar/' && sdks+=("Hotjar")
|
||||
has 'com/instabug/' && sdks+=("Instabug")
|
||||
|
||||
# When BuildConfig survives shrinking it often holds base URLs / flavor.
# APKs contain no .class files, so look for the type name in the DEX strings.
if has '(^|/)BuildConfig$'; then
  BUILDCONFIG="present (grep BuildConfig.java after decompile for base URLs / flavor)"
else
  BUILDCONFIG="not detected in DEX type names (still worth grepping after decompile)"
|
||||
fi
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# Summary
|
||||
# ----------------------------------------------------------------------
|
||||
echo "=== APK Fingerprint: $(basename "$INPUT") ==="
|
||||
echo
|
||||
echo "Framework: $FRAMEWORK"
|
||||
echo " Rationale: $RATIONALE"
|
||||
echo "Obfuscation: $OBFUSCATION"
|
||||
echo
|
||||
echo "HTTP stack: ${http[*]:-none detected}"
|
||||
echo "DI: ${di[*]:-none detected}"
|
||||
echo "Serialization: ${ser[*]:-none detected}"
|
||||
echo "BuildConfig: $BUILDCONFIG"
|
||||
echo
|
||||
echo "Third-party SDKs: ${sdks[*]:-none detected}"
|
||||
echo
|
||||
echo "Native libraries (consolidated across splits):"
|
||||
if [[ -n "$NATIVE" ]]; then
|
||||
echo "$NATIVE" | sed 's/^/ /'
|
||||
else
|
||||
echo " (none)"
|
||||
fi
|
||||
echo
|
||||
|
||||
# ----------------------------------------------------------------------
|
||||
# Recommendation
|
||||
# ----------------------------------------------------------------------
|
||||
echo "Recommended next step:"
|
||||
case "$FRAMEWORK" in
|
||||
Flutter*)
|
||||
echo " Java decompilation will yield ~no app code. The Dart logic lives in"
|
||||
echo " libapp.so (AOT). Use tools designed for Flutter:"
|
||||
echo " - reFlutter / Doldrums / blutter (extract Dart class structure)"
|
||||
echo " - strings/rabin2 on libapp.so for endpoints & string constants"
|
||||
;;
|
||||
React*)
|
||||
echo " Java code is just the RN host. Real app logic is in JS/Hermes:"
|
||||
echo " - if Hermes: hbctool disasm assets/index.android.bundle <out-dir>"
|
||||
echo " - if JSC: js-beautify the bundle and grep for 'fetch('/'axios'"
|
||||
;;
|
||||
Cordova*)
|
||||
echo " All app code is in assets/www/ (or assets/public/). Just unzip and"
|
||||
echo " inspect the HTML/JS — no Java decompile needed."
|
||||
;;
|
||||
Xamarin*|.NET*)
|
||||
echo " App logic is in .NET DLLs (assemblies/). Use ILSpy or dotPeek;"
|
||||
echo " jadx will only show the Mono host."
|
||||
;;
|
||||
*)
|
||||
echo " Proceed with Phase 2: bash scripts/decompile.sh <file>"
|
||||
;;
|
||||
esac
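The priority-ordered framework check in `fingerprint.sh` ("first match wins") can be sketched in Python with the standard `zipfile` module. This is an illustrative port under stated assumptions, not the script itself: it covers only the file-marker rules (no DEX string scan, no Compose/Kotlin sub-detection), and `fake_apk` is a hypothetical helper that builds an in-memory archive so the logic can be exercised without a real APK.

```python
import io
import re
import zipfile

# (label, marker patterns) in priority order; the first entry with a hit wins.
MARKERS = [
    ("Flutter", [r"^lib/[^/]+/libflutter\.so$"]),
    ("React Native", [r"^lib/[^/]+/libhermes\.so$",
                      r"^lib/[^/]+/libreactnativejni\.so$",
                      r"^assets/index\.android\.bundle$"]),
    ("Cordova / Capacitor", [r"^assets/www/index\.html$",
                             r"^assets/www/cordova\.js$",
                             r"^assets/public/index\.html$"]),
    ("Xamarin / .NET MAUI", [r"^lib/[^/]+/libmonodroid\.so$", r"^assemblies/"]),
]


def apk_names(apk):
    """List archive member paths; accepts a path or a file-like object."""
    with zipfile.ZipFile(apk) as zf:
        return zf.namelist()


def detect_framework(names):
    """Return (framework label, sorted list of marker paths that triggered it)."""
    for label, patterns in MARKERS:
        hits = [n for n in names if any(re.search(p, n) for p in patterns)]
        if hits:
            return label, sorted(hits)
    return "Native Android (Java/Kotlin)", []


def fake_apk(paths):
    """Hypothetical test helper: an in-memory zip with empty members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for p in paths:
            zf.writestr(p, b"")
    buf.seek(0)
    return buf


names = apk_names(fake_apk([
    "classes.dex",
    "lib/arm64-v8a/libflutter.so",
    "lib/arm64-v8a/libapp.so",
]))
print(detect_framework(names))
```

For a split bundle, the same function can be fed the concatenated name lists of `base.apk` and every `config.<abi>.apk`, which mirrors the script's aggregated listing.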
|
||||
|
|
@@ -0,0 +1,85 @@
|
|||
#!/usr/bin/env bash
|
||||
# lookup-name.sh — Query the mapping produced by recover-kotlin-names.sh.
|
||||
#
|
||||
# Modes:
|
||||
# lookup-name.sh <mapping-dir> <substring> search by real-FQN substring
|
||||
# lookup-name.sh <mapping-dir> -o <obf> resolve obf -> real
|
||||
# lookup-name.sh <mapping-dir> -p <pkg> list a real package
|
||||
# lookup-name.sh <mapping-dir> --grep <regex> <sources-dir>
|
||||
# grep decompiled sources and annotate each hit with the real class name
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
usage() {
|
||||
cat <<EOF
|
||||
Usage: lookup-name.sh <mapping-dir> <query>
|
||||
lookup-name.sh <mapping-dir> -o <obf-fqn>
|
||||
lookup-name.sh <mapping-dir> -p <real-package-substring>
|
||||
lookup-name.sh <mapping-dir> --grep <regex> <sources-dir>
|
||||
|
||||
<mapping-dir> is the directory produced by recover-kotlin-names.sh
|
||||
(must contain mapping.json).
|
||||
EOF
|
||||
exit 0
|
||||
}
|
||||
|
||||
[[ $# -lt 2 ]] && usage
|
||||
DIR="$1"; shift
|
||||
[[ ! -f "$DIR/mapping.json" ]] && { echo "no mapping.json in $DIR" >&2; exit 1; }
|
||||
|
||||
python3 - "$DIR" "$@" <<'PY'
|
||||
import json, os, re, sys, subprocess
DIR = sys.argv[1]
args = sys.argv[2:]
MAP = json.load(open(os.path.join(DIR, "mapping.json")))
REV = {}
for o, r in MAP.items():
    REV.setdefault(r, []).append(o)

def search(q):
    ql = q.lower()
    for r in sorted(REV):
        if ql in r.lower():
            print(r)
            for o in sorted(REV[r]):
                print(f" {o}")

def by_obf(o):
    if o not in MAP:
        print(f"no mapping for {o}", file=sys.stderr); sys.exit(1)
    print(f"{o} -> {MAP[o]}")
    sibs = [s for s in REV[MAP[o]] if s != o]
    for s in sorted(sibs):
        print(f" sibling: {s}")

def by_pkg(p):
    pl = p.lower()
    for r in sorted(REV):
        if pl in r.rsplit(".", 1)[0].lower():
            print(r)
            for o in sorted(REV[r]):
                print(f" {o}")

def grep_annot(pattern, sources):
    res = subprocess.run(
        ["grep", "-rEn", "--include=*.java", pattern, sources],
        capture_output=True, text=True)
    for line in res.stdout.splitlines():
        try:
            path, lineno, content = line.split(":", 2)
        except ValueError:
            continue
        rel = os.path.relpath(path, sources)
        obf = rel.replace(os.sep, ".")[:-5]
        suffix = f" // {MAP[obf]}" if obf in MAP else ""
        print(f"{rel}:{lineno}:{content}{suffix}")

if args[0] == "-o" and len(args) == 2:
    by_obf(args[1])
elif args[0] == "-p" and len(args) == 2:
    by_pkg(args[1])
elif args[0] == "--grep" and len(args) == 3:
    grep_annot(args[1], args[2])
else:
    search(" ".join(args))
|
||||
PY
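The lookup modes above all rest on one reverse index from real name to obfuscated names. A minimal standalone sketch of that index follows; the mapping entries are hypothetical, and the functions return lists instead of printing so they are easy to check. Several obfuscated classes can share one real name because the recovery step truncates names at the first `$`, which folds lambda and inner classes into their outer class.

```python
from collections import defaultdict


def build_reverse(mapping):
    """Invert {obf_fqn: real_fqn} into {real_fqn: sorted [obf_fqn, ...]}."""
    rev = defaultdict(list)
    for obf, real in mapping.items():
        rev[real].append(obf)
    return {real: sorted(obfs) for real, obfs in rev.items()}


def search(rev, query):
    """Case-insensitive substring search over real names."""
    q = query.lower()
    return [(real, rev[real]) for real in sorted(rev) if q in real.lower()]


def siblings(mapping, rev, obf):
    """Other obfuscated classes that resolve to the same real name."""
    return [o for o in rev[mapping[obf]] if o != obf]


# Hypothetical mapping.json content.
MAP = {
    "a.b.c": "com.example.data.UserRepository",
    "a.b.d": "com.example.data.UserRepository",
    "x.y": "com.example.ui.LoginViewModel",
}

rev = build_reverse(MAP)
for real, obfs in search(rev, "repository"):
    print(real, obfs)
print(siblings(MAP, rev, "a.b.c"))
```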
|
||||
|
|
@@ -0,0 +1,140 @@
|
|||
#!/usr/bin/env bash
|
||||
# recover-kotlin-names.sh — Rebuild a (obfuscated -> real) class-name map
|
||||
# from Kotlin metadata strings left in decompiled sources.
|
||||
#
|
||||
# R8 renames JVM symbols, but the string values inside Kotlin annotations
# are often left as-is unless the build strips them explicitly. Two
# annotations carry the original FQN:
|
||||
#
|
||||
# * @DebugMetadata(c = "<full.qualified.Name>", f = "<File.kt>", ...)
|
||||
# emitted for almost every `suspend` function (every coroutine
|
||||
# SuspendLambda).
|
||||
#
|
||||
# * @Metadata(... d2 = {"...L<pkg/Class>;..."} ...) listing internal
|
||||
# class refs of the file.
|
||||
#
|
||||
# Typical recovery on a real-world app: 30-50 % of classes regain their real
|
||||
# names — usually 100 % of the *Repository / *ViewModel / *UseCase / *Impl
|
||||
# classes you actually want to read.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
usage() {
|
||||
cat <<EOF
|
||||
Usage: recover-kotlin-names.sh <decompiled-sources-dir> [output-dir]
|
||||
|
||||
Walks every *.java under <decompiled-sources-dir>, mines @DebugMetadata
|
||||
and @Metadata annotations, and writes:
|
||||
|
||||
<output-dir>/mapping.tsv tab-separated obf_fqn <TAB> real_fqn <TAB> file
|
||||
<output-dir>/mapping.json same data as JSON { obf_fqn: real_fqn, ... }
|
||||
<output-dir>/by_package/ one file per real package, listing
|
||||
real_fqn <TAB> obf_fqn <TAB> file
|
||||
|
||||
If [output-dir] is omitted, files are written next to the sources dir.
|
||||
EOF
|
||||
exit 0
|
||||
}
|
||||
|
||||
[[ $# -lt 1 || "$1" == "-h" || "$1" == "--help" ]] && usage
|
||||
SRC="$1"
|
||||
OUT="${2:-$(dirname "$SRC")/mapping}"
|
||||
[[ ! -d "$SRC" ]] && { echo "not a directory: $SRC" >&2; exit 1; }
|
||||
|
||||
mkdir -p "$OUT/by_package"
|
||||
|
||||
python3 - "$SRC" "$OUT" <<'PY'
|
||||
import os, re, sys, json
from collections import defaultdict

SRC, OUT = sys.argv[1], sys.argv[2]

# @DebugMetadata(c = "com.foo.Bar$Inner$1", ...)
RE_DEBUG = re.compile(r'@DebugMetadata\([^)]*?c\s*=\s*"([^"]+)"', re.S)
# @Metadata(... d2 = { "...Lcom/foo/Bar;..." ...} )
RE_DTWO = re.compile(r'@Metadata\([^)]*?d2\s*=\s*\{([^}]*)\}', re.S)
RE_LCLASS = re.compile(r'L([A-Za-z][\w/$]+);')
# jadx sometimes emits this comment for renamed classes
RE_RENAMED = re.compile(r'/\*\s*renamed from:\s*([\w.$]+)\s*\*/')

# Skip third-party / framework trees; their names are already real.
SKIP_PREFIXES = (
    "kotlin.", "kotlinx.", "androidx.", "android.", "java.", "javax.",
    "com.google.", "com.facebook.", "com.appsflyer.", "com.datadog.",
    "io.ktor.", "io.sentry.", "io.realm.", "okhttp3.", "okio.",
    "com.squareup.", "com.bumptech.", "com.airbnb.", "com.payu.",
    "com.storyteller.", "zendesk.", "io.intercom.", "com.microsoft.",
    "com.tinder.", "com.hotjar.", "com.amplitude.", "com.segment.",
    "com.mixpanel.", "com.onesignal.", "com.stripe.", "com.braintreepayments.",
    "retrofit2.", "dagger.", "javax.inject.", "org.jetbrains.",
)

mapping = {}
file_real = {}
counts = defaultdict(int)

for dp, _, files in os.walk(SRC):
    for f in files:
        if not f.endswith(".java"):
            continue
        path = os.path.join(dp, f)
        rel = os.path.relpath(path, SRC)
        obf = rel[:-5].replace(os.sep, ".")
        if obf.startswith(SKIP_PREFIXES):
            continue
        try:
            text = open(path, "r", errors="replace").read()
        except OSError:
            continue
        real = None

        m = RE_DEBUG.search(text)
        if m:
            real = m.group(1).split("$", 1)[0]
            counts["debug_meta"] += 1

        if not real:
            m = RE_DTWO.search(text)
            if m:
                for lm in RE_LCLASS.finditer(m.group(1)):
                    cand = lm.group(1).replace("/", ".").split("$", 1)[0]
                    if "." in cand and not cand.startswith(("kotlin.", "java.", "android")):
                        real = cand
                        counts["d2"] += 1
                        break

        if not real:
            m = RE_RENAMED.search(text)
            if m:
                real = m.group(1)
                counts["renamed"] += 1

        if real:
            mapping[obf] = real
            file_real[obf] = path

with open(os.path.join(OUT, "mapping.tsv"), "w") as f:
    f.write("obf_fqn\treal_fqn\tfile\n")
    for k in sorted(mapping):
        f.write(f"{k}\t{mapping[k]}\t{file_real[k]}\n")

with open(os.path.join(OUT, "mapping.json"), "w") as f:
    json.dump(mapping, f, indent=2, sort_keys=True)

by_pkg = defaultdict(list)
for obf, real in mapping.items():
    pkg = real.rsplit(".", 1)[0] if "." in real else "(default)"
    by_pkg[pkg].append((real, obf, file_real[obf]))

for pkg, rows in by_pkg.items():
    safe = pkg.replace(".", "_") or "default"
    with open(os.path.join(OUT, "by_package", f"{safe}.txt"), "w") as f:
        for real, obf, p in sorted(rows):
            f.write(f"{real}\t{obf}\t{p}\n")

print(f"Recovered {len(mapping)} class names")
for k, v in counts.items():
    print(f" via {k}: {v}")
print(f"Real packages: {len(by_pkg)}")
print(f"Wrote {OUT}/mapping.tsv, mapping.json, by_package/")
|
||||
PY
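The core of the recovery step is a single regex over `@DebugMetadata`. The sketch below isolates it on a hand-written fragment that only resembles jadx output; the package, class names, and annotation values are hypothetical. One deliberate difference from the script: a `\b` before `c` is added here as an assumption, so that a key merely ending in "c" cannot be mistaken for the `c` field.

```python
import re

# @DebugMetadata(c = "com.foo.Bar$Inner$1", ...); \b is this sketch's addition.
RE_DEBUG = re.compile(r'@DebugMetadata\([^)]*?\bc\s*=\s*"([^"]+)"', re.S)


def real_name_from_source(java_text):
    """Return the outer-class FQN named by @DebugMetadata, or None."""
    m = RE_DEBUG.search(java_text)
    if not m:
        return None
    # Drop "$lambda$1"-style suffixes so every coroutine class maps
    # to the outer class that declared it.
    return m.group(1).split("$", 1)[0]


# Hand-written sample in the style of decompiled output (hypothetical names).
SAMPLE = '''\
package p000a.p001b;

@DebugMetadata(c = "com.example.data.UserRepositoryImpl$fetchUser$2", f = "UserRepositoryImpl.kt", l = {42}, m = "invokeSuspend")
final class C0001c extends SuspendLambda {
}
'''

print(real_name_from_source(SAMPLE))
```

Because the suffix after the first `$` is discarded, many obfuscated files can resolve to the same real name, which is why the lookup tool reports "sibling" entries.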
|
||||