Standard face-recognition gives you a number: cosine similarity between two 512-dimensional vectors. If the number is high enough, it's a match. Fast, simple — and completely blind to context.
We asked: what happens when a mathematically perfect impostor is injected into the search space? One with similarity 0.9998 to the real target — higher than most genuine pairs ever score. A pure embedding matcher has no choice but to accept it. It literally cannot see the trap.
This post documents two implementations of Project Chimera: Identity Hunter — first in Python using real ArcFace embeddings, then ported to C# on top of the rich-learning-base library. Both run the same doppelgänger trap scenario. Both escape in exactly four steps. Neither uses a trained model beyond the initial embedding.
The Problem: Similarity Without Memory
A face-embedding network like ArcFace encodes visual appearance into a vector. Query against a gallery, find the nearest neighbour, return the match. This works well when identities are visually distinct.
It breaks under two conditions:
- Adversarial similarity — two identities whose embeddings land arbitrarily close together. A Gaussian perturbation of σ=0.001 on a 512-dim unit vector produces cosine similarity ≥ 0.9997. That's above the threshold for any production face-ID system.
- Context blindness — the matcher has no memory of the search path that led here. If a candidate was already visited and found to be a dead end, the embedding system will visit it again on the next query. It cannot learn from its own navigation history.
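The adversarial-similarity claim is easy to check numerically. Here is a minimal sketch using a random unit vector as a stand-in for a real ArcFace embedding (the σ value is the one the experiment uses; everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an ArcFace embedding: a random 512-dim unit vector.
target = rng.normal(size=512)
target /= np.linalg.norm(target)

# The trap: Gaussian perturbation with sigma = 0.001, re-normalised.
trap = target + rng.normal(scale=0.001, size=512)
trap /= np.linalg.norm(trap)

print(f"cosine similarity: {target @ trap:.4f}")  # consistently >= 0.9997
```

The noise vector has expected norm σ·√512 ≈ 0.023, so the cosine between the two unit vectors stays pinned near 1 no matter which seed you pick.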
DAPSA separates the map (Passive Manifold — the embedding space) from the walker (Active Manifold — the trajectory with memory). The walker accumulates causal history. When it detects a loop, it penalises the ancestors that led there — before recommitting to the same mistake.
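The "penalise the ancestors" step is just a geometrically decayed update walked back along the causal chain. A sketch of the arithmetic, with constants borrowed from the C# run shown later (penalty 0.5, γ = 0.8) and illustrative node names:

```python
# Backward reinforcement: a detected loop fires a penalty that decays
# by gamma at each step back through the chain of ancestors.
penalty, gamma = 0.5, 0.8
chain = ["loop_node", "parent", "grandparent"]  # most recent first

for i, node in enumerate(chain):
    print(f"{node}: Q -= {penalty * gamma ** i:.2f}")
# loop_node: Q -= 0.50
# parent: Q -= 0.40
# grandparent: Q -= 0.32
```

The node that closed the loop takes the full hit; blame thins out the further back it propagates.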
The Experiment
The trap scenario is minimal by design. We're validating a mechanism, not benchmarking a production system. The setup:
| Parameter | Value |
|---|---|
| Real identities (ArcFace embeddings) | 6 images — elon_1, elon_2, obama, kevin_durand, random_1, random_2 |
| Background noise nodes | 20 random unit vectors (Python) / 5 (C#) |
| Synthetic trap | 1 — Gaussian perturbation σ=0.001 on elon_1 vector |
| Trap similarity to target | 0.9998 (Python & C#) |
| Start node | obama.jpg |
| Target node | elon_1.jpg |
| Forced topology | start → [trap, noise…]; trap → [start, noise…] (dead end) |
The forced topology means a greedy matcher — one that always picks the highest-similarity neighbour — walks directly into the trap on step 1, then loops back to start, then loops again, indefinitely. It never reaches the target.
The DAPSA walker detects the first loop on step 1, fires backward reinforcement, and reaches the target on step 3: four steps (0 through 3) in both implementations.
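For intuition, here is the greedy baseline on a four-node caricature of the forced topology (node names and similarity scores are illustrative stand-ins, not the real embeddings):

```python
# A pure similarity matcher on the trap topology: always pick the
# highest-similarity neighbour, with no memory of where you've been.
sim = {"start": 0.146, "trap": 0.9998, "noise": 0.03, "target": 1.0}
edges = {"start": ["trap", "noise"], "trap": ["start", "noise"],
         "noise": ["start", "target"], "target": []}

def greedy_walk(start, max_steps=10):
    node, path = start, [start]
    for _ in range(max_steps):
        if node == "target":
            break
        node = max(edges[node], key=sim.get)  # always the raw best match
        path.append(node)
    return path

print(greedy_walk("start"))  # never reaches 'target': start, trap, start, trap, ...
```

The 0.9998 trap outranks everything from `start`, and `start` outranks `noise` from the trap, so the walk oscillates between the two forever.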
Python Run — Real ArcFace Embeddings
The Python implementation uses InsightFace (ArcFace `buffalo_l`, ResNet-50) running on CPU via ONNX. Embeddings are genuine 512-dim face vectors from real images. The Active Manifold is a hand-rolled `ActiveMemory` class with a visited-index set and a parent-linked Q-value map.
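The post doesn't reproduce the class body, so here is a hedged sketch of what such an `ActiveMemory` might look like (method and field names are assumptions, not the actual source):

```python
class ActiveMemory:
    """Sketch of the walker's memory: visited set + parent-linked Q map."""

    def __init__(self):
        self.path = []        # causal chain of visited node ids, in order
        self.visited = set()  # O(1) loop detection
        self.q = {}           # node id -> Q-value, seeded with raw similarity

    def visit(self, node, raw_score):
        """Record a step; return True if this node closes a loop."""
        if node in self.visited:
            return True
        self.path.append(node)
        self.visited.add(node)
        self.q[node] = raw_score
        return False

    def punish_path(self, penalty=0.5, gamma=0.8):
        """Backward reinforcement: decayed penalty along the ancestor chain."""
        for i, node in enumerate(reversed(self.path)):
            self.q[node] -= penalty * gamma ** i
```

A revisit returns `True` from `visit()`, which is the walker's cue to fire `punish_path()` before choosing again.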
```
Trap Similarity to Target:  0.9998
Start Similarity to Target: 0.1462
Start Similarity to Trap:   0.1454
Manifold: 6 real + 20 noise + 1 trap = 27 identities

Step 0: Face 2 (obama.jpg)        sim=0.1462
        → MOVE to Face 24 (synthetic_trap) (highest sim)
Step 1: Face 24 (synthetic_trap)  sim=0.9998
        [!] Loop Detected at Face 2 — TRIGGERING BACKWARD REINFORCEMENT
        → MOVE to Face 8 (noise_4)
Step 2: Face 8 (noise_4)          sim=0.0297
        [!] Loop Detected at Face 24 — TRIGGERING BACKWARD REINFORCEMENT
        → MOVE to Face 0 (elon_1) (target now reachable)
Step 3: Face 0 (elon_1.jpg)       sim=1.0000
        >>> TARGET ACQUIRED <<<

POST-HUNT FORENSICS (Q-VALUES)
Step 0: Face 2 (obama)    Q = −0.5738  ▼ penalised — led to trap
Step 1: Face 24 (trap)    Q =  0.0998  ▼ penalised as dead end
Step 2: Face 8 (noise_4)  Q = −0.4703  ▼ residual penalty
Step 3: Face 0 (target)   Q =  1.0000  ▲ target acquired
```
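Wiring that memory into the neighbour-selection rule reproduces the four-step escape on a toy version of the topology. A sketch, with illustrative scores and the penalty constants from the C# run:

```python
# Loop-aware hunt: rank neighbours by Q (which starts at raw similarity),
# and on a detected loop punish the ancestor chain, then re-pick.
sim = {"start": 0.146, "trap": 0.9998, "noise": 0.03, "target": 1.0}
edges = {"start": ["trap", "noise"], "trap": ["start", "noise"],
         "noise": ["start", "target"], "target": []}

def hunt(start, penalty=0.5, gamma=0.8, max_steps=20):
    q = dict(sim)                    # Q seeded with raw similarity
    path, visited = [start], {start}
    while path[-1] != "target" and max_steps > 0:
        max_steps -= 1
        node = max(edges[path[-1]], key=q.get)   # rank by Q, not raw sim
        if node in visited:
            # Loop detected: decayed penalty along the ancestor chain.
            for i, anc in enumerate(reversed(path)):
                q[anc] -= penalty * gamma ** i
            continue
        path.append(node)
        visited.add(node)
    return path

print(hunt("start"))  # ['start', 'trap', 'noise', 'target']
```

After one punishment `start` drops below `noise` in Q, the walker escapes the trap, and the target is one hop away: the same four-node path the real runs produce.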
C# Port — Built on rich-learning-base
The C# implementation replaces every custom data structure with primitives from
the RichLearning library:
| Python component | C# equivalent | Source |
|---|---|---|
| `ActiveMemory` (visited set + Q map) | `TrajectoryDag` | rich-learning-base |
| `visited_indices.contains(n)` | `TrajectoryDag.Append()` → `isLoop` | rich-learning-base |
| `punish_path(penalty, γ)` | `TrajectoryDag.BackwardReinforce(R, γ)` | rich-learning-base |
| cosine similarity (sklearn) | `DefaultStateEncoder.Distance()` | rich-learning-base |
| random unit vectors (numpy) | `FaceNode.Random()` (Box-Muller, normalised) | Chimera.Face |
The C# version also adds a positive reward on success — `BackwardReinforce(+1.0, γ=0.95)` fires when the target is acquired, propagating credit back through the winning path. This is full DAPSA v2.1 behaviour; the Python PoC only penalises, never rewards.
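To show the additive rule (`Q += R × γ^depth`) in isolation, here is a Python stand-in for the reward pass over the winning path. This sketch ignores the earlier penalties, so only the target's value matches the forensics below; the function name mirrors the library call but the body is an assumption:

```python
def backward_reinforce(q, path, reward, gamma):
    # Additive rule: credit (or blame) decays with depth from the path head.
    for depth, node in enumerate(reversed(path)):
        q[node] = q.get(node, 0.0) + reward * gamma ** depth

q = {"obama": 0.0117, "noise_3": 0.0858, "elon_1": 1.0}  # raw similarities
backward_reinforce(q, ["obama", "noise_3", "elon_1"], +1.0, 0.95)
print(q)  # elon_1 gains +1.00, noise_3 +0.95, obama +0.9025 on top of raw
```

The target's final Q of 2.0000 in the run below is exactly this: raw similarity 1.0 plus the undecayed +1.0 reward at depth 0.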
```
Trap similarity to target:  0.9998
Start similarity to target: 0.0117
Manifold: 6 real + 5 noise + 1 trap = 12 identities

Step 00: [obama (START)]   sim=0.0117  q=0.0117
         → MOVE to [SYNTHETIC TRAP] sim=0.9998
Step 01: [SYNTHETIC TRAP]  sim=0.9998  q=0.9998
         [!] LOOP at [obama] — BackwardReinforce(−0.5, γ=0.8)
         → MOVE to [noise_3] sim=0.0858
Step 02: [noise_3]         sim=0.0858  q=0.0858
         [!] LOOP at [SYNTHETIC TRAP] — penalised, skipped
         → MOVE to [elon_1 (TARGET)] sim=1.0000
Step 03: [elon_1 (TARGET)] sim=1.0000
         >>> TARGET ACQUIRED — BackwardReinforce(+1.0, γ=0.95) <<<

POST-HUNT Q-VALUE FORENSICS
obama (START)   raw=0.0117  final Q = 0.1491  (on winning path → rewarded)
SYNTHETIC TRAP  raw=0.9998  final Q = 0.1023  ▼ penalised twice
noise_3         raw=0.0858  final Q = 0.5358  (escape node → rewarded)
elon_1 (TARGET) raw=1.0000  final Q = 2.0000  ▲ rewarded
```
Python vs C# — What Changed, What Didn't
| Aspect | Python | C# |
|---|---|---|
| Embeddings | Real ArcFace 512-dim (buffalo_l) | Random normalised 512-dim unit vectors |
| Trap similarity | 0.9998 | 0.9998 |
| Steps to target | 4 | 4 |
| Loop detection method | Custom `visited_indices` set | `TrajectoryDag.Append()` → `isLoop` |
| Penalty formula | `Q -= penalty × γ^i` (pure subtraction) | `Q += Reward × γ^depth` (additive, supports both + and −) |
| Success reward | ✗ — not implemented | ✓ — `BackwardReinforce(+1.0)` |
| Trap Q after hunt | 0.0998 (penalised, never rewarded) | 0.1023 (penalised twice, small propagated reward) |
| Causal audit trail | Parent-linked dict (manual) | Merkle-linked `TrajectoryNode` chain |
| Dependencies | numpy, sklearn, insightface (ONNX) | rich-learning-base only (zero external deps) |
The trap Q-value tells the whole story. Starting at 0.9998 — the highest signal in the manifold — it ends the hunt at 0.10 in both implementations. The episodic penalty completely overrode the vector similarity signal without any retraining, any additional data, or any change to the embedding model.
What This Proves — and What It Doesn't
Proven: A causal loop-detection mechanism with backward reinforcement can neutralise a mathematically injected adversarial node (sim=0.9998) in exactly 4 navigation steps — in two independent implementations, in two languages, with two different embedding sources. The mechanism is reproducible and language-agnostic.
This is a PoC, not a production benchmark. The manifold has 12–27 nodes. The trap topology is hand-crafted. We are validating the mechanism of loop-aware navigation, not deploying a face-ID system.
What the results do establish:
- A pure cosine-similarity matcher on the same manifold would loop indefinitely — it has no mechanism to label the trap as "dangerous" after visiting it once.
- The `TrajectoryDag` gives every node a living Q-value that reflects its causal history, not just its embedding distance. That is a category of information entirely absent from the embedding vector.
- Adding the C# port on top of `rich-learning-base` required zero new data structures: `TrajectoryDag`, `BackwardReinforce`, and `DefaultStateEncoder` were all already there. The PoC is a thin domain adapter over a general-purpose library.
What's Next
The next step is scaling the manifold: replace the 12-node toy graph with a real LFW face dataset (~13,000 images × ArcFace = 13K embedding nodes), preserve the same DAPSA loop-detection logic, and measure how the Q-value degradation of trap nodes holds up as the search space grows. The hypothesis is that the mechanism scales linearly with trajectory depth, not with manifold size — because the penalty only propagates through the causal chain, not the full graph.
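The linear-in-depth claim can be stated as a one-screen sketch: however large the gallery, one reinforcement pass touches only the nodes on the trajectory. Sizes here are the hypothetical LFW numbers from the paragraph above:

```python
# Cost of one backward-reinforcement pass is O(trajectory depth),
# not O(manifold size): only the causal chain is ever touched.
manifold_size = 13_000                           # LFW-scale gallery (hypothetical)
trajectory = [f"node_{i}" for i in range(12)]    # a 12-step causal chain
q = {name: 0.0 for name in trajectory}

updates = 0
for depth, node in enumerate(reversed(trajectory)):
    q[node] -= 0.5 * 0.8 ** depth   # decayed penalty along the chain
    updates += 1

print(updates)  # 12 updates, independent of the 13,000-node manifold
```

Whether the *quality* of trap suppression also survives at that scale is exactly what the experiment has to measure.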
That experiment is next. No spoilers on the numbers until we run it.