Standard face-recognition gives you a number: cosine similarity between two 512-dimensional vectors. If the number is high enough, it's a match. Fast, simple — and completely blind to context.
We asked: what happens when a mathematically perfect impostor is injected into the search space? One with similarity 0.9998 to the real target — higher than most genuine pairs ever score. A pure embedding matcher has no choice but to accept it. It literally cannot see the trap.
This post documents two implementations of Project Chimera: Identity Hunter — first in Python using real ArcFace embeddings, then ported to C# on top of the rich-learning-base library. Both run the same doppelgänger trap scenario. Both escape in exactly four steps. Neither uses a trained model beyond the initial embedding.
The Problem: Similarity Without Memory
A face-embedding network like ArcFace encodes visual appearance into a vector. Query against a gallery, find the nearest neighbour, return the match. This works well when identities are visually distinct.
It breaks under two conditions:
- Adversarial similarity — two identities whose embeddings land arbitrarily close together. A Gaussian perturbation of σ=0.001 on a 512-dim unit vector produces cosine similarity ≥ 0.9997. That's above the threshold for any production face-ID system.
- Context blindness — the matcher has no memory of the search path that led here. If a candidate was already visited and found to be a dead end, the embedding system will visit it again on the next query. It cannot learn from its own navigation history.
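The adversarial-similarity claim is easy to check numerically. Here is a minimal sketch using a random unit vector as a stand-in for a real ArcFace embedding (the σ value is the one the experiment uses; everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an ArcFace embedding: a random 512-dim unit vector.
target = rng.normal(size=512)
target /= np.linalg.norm(target)

# The trap: Gaussian perturbation with sigma = 0.001, re-normalised.
trap = target + rng.normal(scale=0.001, size=512)
trap /= np.linalg.norm(trap)

print(f"cosine similarity: {target @ trap:.4f}")  # consistently >= 0.9997
```

The noise vector has expected norm σ·√512 ≈ 0.023, so the cosine between the two unit vectors stays pinned near 1 no matter which seed you pick.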
DAPSA separates the map (Passive Manifold — the embedding space) from the walker (Active Manifold — the trajectory with memory). The walker accumulates causal history. When it detects a loop, it penalises the ancestors that led there — before recommitting to the same mistake.
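The "penalise the ancestors" step is just a geometrically decayed update walked back along the causal chain. A sketch of the arithmetic, with constants borrowed from the C# run shown later (penalty 0.5, γ = 0.8) and illustrative node names:

```python
# Backward reinforcement: a detected loop fires a penalty that decays
# by gamma at each step back through the chain of ancestors.
penalty, gamma = 0.5, 0.8
chain = ["loop_node", "parent", "grandparent"]  # most recent first

for i, node in enumerate(chain):
    print(f"{node}: Q -= {penalty * gamma ** i:.2f}")
# loop_node: Q -= 0.50
# parent: Q -= 0.40
# grandparent: Q -= 0.32
```

The node that closed the loop takes the full hit; blame thins out the further back it propagates.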
The Experiment
The trap scenario is minimal by design. We're validating a mechanism, not benchmarking a production system. The setup:
| Parameter | Value |
|---|---|
| Real identities (ArcFace embeddings) | 6 images — elon_1, elon_2, obama, kevin_durand, random_1, random_2 |
| Background noise nodes | 20 random unit vectors (Python) / 5 (C#) |
| Synthetic trap | 1 — Gaussian perturbation σ=0.001 on elon_1 vector |
| Trap similarity to target | 0.9998 (Python & C#) |
| Start node | obama.jpg |
| Target node | elon_1.jpg |
| Forced topology | start → [trap, noise…]; trap → [start, noise…] (dead end) |
The forced topology means a greedy matcher — one that always picks the highest-similarity neighbour — walks directly into the trap on step 1, then loops back to start, then loops again, indefinitely. It never reaches the target.
The DAPSA walker detects the first loop on step 1, fires backward reinforcement, and reaches the target on step 3: four steps (0 through 3) in both implementations.
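For intuition, here is the greedy baseline on a four-node caricature of the forced topology (node names and similarity scores are illustrative stand-ins, not the real embeddings):

```python
# A pure similarity matcher on the trap topology: always pick the
# highest-similarity neighbour, with no memory of where you've been.
sim = {"start": 0.146, "trap": 0.9998, "noise": 0.03, "target": 1.0}
edges = {"start": ["trap", "noise"], "trap": ["start", "noise"],
         "noise": ["start", "target"], "target": []}

def greedy_walk(start, max_steps=10):
    node, path = start, [start]
    for _ in range(max_steps):
        if node == "target":
            break
        node = max(edges[node], key=sim.get)  # always the raw best match
        path.append(node)
    return path

print(greedy_walk("start"))  # never reaches 'target': start, trap, start, trap, ...
```

The 0.9998 trap outranks everything from `start`, and `start` outranks `noise` from the trap, so the walk oscillates between the two forever.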
Python Run — Real ArcFace Embeddings
The Python implementation uses InsightFace (ArcFace `buffalo_l`, ResNet-50) running on CPU via ONNX. Embeddings are genuine 512-dim face vectors from real images. The Active Manifold is a hand-rolled `ActiveMemory` class with a visited-index set and a parent-linked Q-value map.
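The post doesn't reproduce the class body, so here is a hedged sketch of what such an `ActiveMemory` might look like (method and field names are assumptions, not the actual source):

```python
class ActiveMemory:
    """Sketch of the walker's memory: visited set + parent-linked Q map."""

    def __init__(self):
        self.path = []        # causal chain of visited node ids, in order
        self.visited = set()  # O(1) loop detection
        self.q = {}           # node id -> Q-value, seeded with raw similarity

    def visit(self, node, raw_score):
        """Record a step; return True if this node closes a loop."""
        if node in self.visited:
            return True
        self.path.append(node)
        self.visited.add(node)
        self.q[node] = raw_score
        return False

    def punish_path(self, penalty=0.5, gamma=0.8):
        """Backward reinforcement: decayed penalty along the ancestor chain."""
        for i, node in enumerate(reversed(self.path)):
            self.q[node] -= penalty * gamma ** i
```

A revisit returns `True` from `visit()`, which is the walker's cue to fire `punish_path()` before choosing again.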
```
Trap Similarity to Target:  0.9998
Start Similarity to Target: 0.1462
Start Similarity to Trap:   0.1454
Manifold: 6 real + 20 noise + 1 trap = 27 identities

Step 0: Face 2 (obama.jpg)        sim=0.1462
        → MOVE to Face 24 (synthetic_trap) (highest sim)
Step 1: Face 24 (synthetic_trap)  sim=0.9998
        [!] Loop Detected at Face 2 — TRIGGERING BACKWARD REINFORCEMENT
        → MOVE to Face 8 (noise_4)
Step 2: Face 8 (noise_4)          sim=0.0297
        [!] Loop Detected at Face 24 — TRIGGERING BACKWARD REINFORCEMENT
        → MOVE to Face 0 (elon_1) (target now reachable)
Step 3: Face 0 (elon_1.jpg)       sim=1.0000
        >>> TARGET ACQUIRED <<<

POST-HUNT FORENSICS (Q-VALUES)
Step 0: Face 2 (obama)    Q = −0.5738  ▼ penalised — led to trap
Step 1: Face 24 (trap)    Q =  0.0998  ▼ penalised as dead end
Step 2: Face 8 (noise_4)  Q = −0.4703  ▼ residual penalty
Step 3: Face 0 (target)   Q =  1.0000  ▲ target acquired
```
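Wiring that memory into the neighbour-selection rule reproduces the four-step escape on a toy version of the topology. A sketch, with illustrative scores and the penalty constants from the C# run:

```python
# Loop-aware hunt: rank neighbours by Q (which starts at raw similarity),
# and on a detected loop punish the ancestor chain, then re-pick.
sim = {"start": 0.146, "trap": 0.9998, "noise": 0.03, "target": 1.0}
edges = {"start": ["trap", "noise"], "trap": ["start", "noise"],
         "noise": ["start", "target"], "target": []}

def hunt(start, penalty=0.5, gamma=0.8, max_steps=20):
    q = dict(sim)                    # Q seeded with raw similarity
    path, visited = [start], {start}
    while path[-1] != "target" and max_steps > 0:
        max_steps -= 1
        node = max(edges[path[-1]], key=q.get)   # rank by Q, not raw sim
        if node in visited:
            # Loop detected: decayed penalty along the ancestor chain.
            for i, anc in enumerate(reversed(path)):
                q[anc] -= penalty * gamma ** i
            continue
        path.append(node)
        visited.add(node)
    return path

print(hunt("start"))  # ['start', 'trap', 'noise', 'target']
```

After one punishment `start` drops below `noise` in Q, the walker escapes the trap, and the target is one hop away: the same four-node path the real runs produce.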
C# Port — Built on rich-learning-base
The C# implementation replaces every custom data structure with primitives from
the RichLearning library:
| Python component | C# equivalent | Source |
|---|---|---|
| `ActiveMemory` (visited set + Q map) | `TrajectoryDag` | rich-learning-base |
| `visited_indices.contains(n)` | `TrajectoryDag.Append()` → `isLoop` | rich-learning-base |
| `punish_path(penalty, γ)` | `TrajectoryDag.BackwardReinforce(R, γ)` | rich-learning-base |
| cosine similarity (sklearn) | `DefaultStateEncoder.Distance()` | rich-learning-base |
| random unit vectors (numpy) | `FaceNode.Random()` (Box-Muller, normalised) | Chimera.Face |
The C# version also adds a positive reward on success — `BackwardReinforce(+1.0, γ=0.95)` fires when the target is acquired, propagating credit back through the winning path. This is full DAPSA v2.1 behaviour; the Python PoC only penalises, never rewards.
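To show the additive rule (`Q += R × γ^depth`) in isolation, here is a Python stand-in for the reward pass over the winning path. This sketch ignores the earlier penalties, so only the target's value matches the forensics below; the function name mirrors the library call but the body is an assumption:

```python
def backward_reinforce(q, path, reward, gamma):
    # Additive rule: credit (or blame) decays with depth from the path head.
    for depth, node in enumerate(reversed(path)):
        q[node] = q.get(node, 0.0) + reward * gamma ** depth

q = {"obama": 0.0117, "noise_3": 0.0858, "elon_1": 1.0}  # raw similarities
backward_reinforce(q, ["obama", "noise_3", "elon_1"], +1.0, 0.95)
print(q)  # elon_1 gains +1.00, noise_3 +0.95, obama +0.9025 on top of raw
```

The target's final Q of 2.0000 in the run below is exactly this: raw similarity 1.0 plus the undecayed +1.0 reward at depth 0.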
```
Trap similarity to target:  0.9998
Start similarity to target: 0.0117
Manifold: 6 real + 5 noise + 1 trap = 12 identities

Step 00: [obama (START)]   sim=0.0117  q=0.0117
         → MOVE to [SYNTHETIC TRAP] sim=0.9998
Step 01: [SYNTHETIC TRAP]  sim=0.9998  q=0.9998
         [!] LOOP at [obama] — BackwardReinforce(−0.5, γ=0.8)
         → MOVE to [noise_3] sim=0.0858
Step 02: [noise_3]         sim=0.0858  q=0.0858
         [!] LOOP at [SYNTHETIC TRAP] — penalised, skipped
         → MOVE to [elon_1 (TARGET)] sim=1.0000
Step 03: [elon_1 (TARGET)] sim=1.0000
         >>> TARGET ACQUIRED — BackwardReinforce(+1.0, γ=0.95) <<<

POST-HUNT Q-VALUE FORENSICS
obama (START)   raw=0.0117  final Q = 0.1491  (on winning path → rewarded)
SYNTHETIC TRAP  raw=0.9998  final Q = 0.1023  ▼ penalised twice
noise_3         raw=0.0858  final Q = 0.5358  (escape node → rewarded)
elon_1 (TARGET) raw=1.0000  final Q = 2.0000  ▲ rewarded
```
Python vs C# — What Changed, What Didn't
| Aspect | Python | C# |
|---|---|---|
| Embeddings | Real ArcFace 512-dim (buffalo_l) | Random normalised 512-dim unit vectors |
| Trap similarity | 0.9998 | 0.9998 |
| Steps to target | 4 | 4 |
| Loop detection method | Custom `visited_indices` set | `TrajectoryDag.Append()` → `isLoop` |
| Penalty formula | `Q -= penalty × γ^i` (pure subtraction) | `Q += Reward × γ^depth` (additive, supports both + and −) |
| Success reward | ✗ — not implemented | ✓ — `BackwardReinforce(+1.0)` |
| Trap Q after hunt | 0.0998 (penalised, never rewarded) | 0.1023 (penalised twice, small propagated reward) |
| Causal audit trail | Parent-linked dict (manual) | Merkle-linked `TrajectoryNode` chain |
| Dependencies | numpy, sklearn, insightface (ONNX) | rich-learning-base only (zero external deps) |
The trap Q-value tells the whole story. Starting at 0.9998 — the highest signal in the manifold — it ends the hunt at 0.10 in both implementations. The episodic penalty completely overrode the vector similarity signal without any retraining, any additional data, or any change to the embedding model.
What This Proves — and What It Doesn't
Proven: A causal loop-detection mechanism with backward reinforcement can neutralise a mathematically injected adversarial node (sim=0.9998) in exactly 4 navigation steps — in two independent implementations, in two languages, with two different embedding sources. The mechanism is reproducible and language-agnostic.
This is a PoC, not a production benchmark. The manifold has 12–27 nodes. The trap topology is hand-crafted. We are validating the mechanism of loop-aware navigation, not deploying a face-ID system.
What the results do establish:
- A pure cosine-similarity matcher on the same manifold would loop indefinitely — it has no mechanism to label the trap as "dangerous" after visiting it once.
- The `TrajectoryDag` gives every node a living Q-value that reflects its causal history, not just its embedding distance. That is a category of information entirely absent from the embedding vector.
- Adding the C# port on top of `rich-learning-base` required zero new data structures: `TrajectoryDag`, `BackwardReinforce`, and `DefaultStateEncoder` were all already there. The PoC is a thin domain adapter over a general-purpose library.
What's Next
The next step is scaling the manifold: replace the 12-node toy graph with a real LFW face dataset (~13,000 images × ArcFace = 13K embedding nodes), preserve the same DAPSA loop-detection logic, and measure how the Q-value degradation of trap nodes holds up as the search space grows. The hypothesis is that the mechanism scales linearly with trajectory depth, not with manifold size — because the penalty only propagates through the causal chain, not the full graph.
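The linear-in-depth claim can be stated as a one-screen sketch: however large the gallery, one reinforcement pass touches only the nodes on the trajectory. Sizes here are the hypothetical LFW numbers from the paragraph above:

```python
# Cost of one backward-reinforcement pass is O(trajectory depth),
# not O(manifold size): only the causal chain is ever touched.
manifold_size = 13_000                           # LFW-scale gallery (hypothetical)
trajectory = [f"node_{i}" for i in range(12)]    # a 12-step causal chain
q = {name: 0.0 for name in trajectory}

updates = 0
for depth, node in enumerate(reversed(trajectory)):
    q[node] -= 0.5 * 0.8 ** depth   # decayed penalty along the chain
    updates += 1

print(updates)  # 12 updates, independent of the 13,000-node manifold
```

Whether the *quality* of trap suppression also survives at that scale is exactly what the experiment has to measure.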
That experiment is next. No spoilers on the numbers until we run it.