
What If Identity Search Didn't Need Deep Learning?

A Four-Step Graph Walk That Stumps Neural Networks

Standard face recognition gives you a single number: the cosine similarity between two 512-dimensional embedding vectors. If the number is high enough, it's a match. Fast, simple — and completely blind to context.

We asked: what happens when a mathematically perfect impostor is injected into the search space? One with similarity 0.9998 to the real target — higher than most genuine pairs ever score. A pure embedding matcher has no choice but to accept it. It literally cannot see the trap.

This post documents two implementations of Project Chimera: Identity Hunter — first in Python using real ArcFace embeddings, then ported to C# on top of the rich-learning-base library. Both run the same doppelgänger trap scenario. Both escape in exactly four steps. Neither uses a trained model beyond the initial embedding.

The Problem: Similarity Without Memory

A face-embedding network like ArcFace encodes visual appearance into a vector. Query against a gallery, find the nearest neighbour, return the match. This works well when identities are visually distinct.
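
In code, that whole decision reduces to a normalisation, a dot product, and an argmax. A minimal sketch of such a pure embedding matcher (the function name and the 0.4 threshold are illustrative, not taken from either implementation):

```python
import numpy as np

def nearest_match(query, gallery, threshold=0.4):
    """Pure embedding matcher: cosine similarity, highest score wins.
    The threshold is illustrative; real systems tune it per model."""
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    sims = gallery @ query                 # cosine similarity per gallery face
    best = int(np.argmax(sims))
    score = float(sims[best])
    return (best, score) if score >= threshold else (None, score)
```

Note what is absent: no history, no state, no notion of where the query came from. Every call is independent, which is exactly the blindness the trap exploits.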

It breaks under two conditions:

1. An injected impostor scores higher similarity than any genuine pair, so a pure matcher has no basis to reject it.
2. The matcher has no memory of where it has been, so a misleading neighbourhood sends it into the same dead end again and again.

DAPSA separates the map (Passive Manifold — the embedding space) from the walker (Active Manifold — the trajectory with memory). The walker accumulates causal history. When it detects a loop, it penalises the ancestors that led there instead of recommitting to the same mistake.
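
A minimal sketch of that walker, assuming a visited set plus a depth-discounted penalty (the class name and constants are illustrative, not the actual ActiveMemory or TrajectoryDag API):

```python
# Illustrative sketch of the walker's loop-penalty step, not the real DAPSA API.
GAMMA = 0.8     # discount per step of causal distance
PENALTY = 0.5   # penalty magnitude fired on loop detection

class Walker:
    def __init__(self):
        self.visited = set()   # passive lookup: where have we been?
        self.path = []         # active trajectory: ordered causal history
        self.q = {}            # Q-value per node

    def step(self, node, sim):
        """Visit `node`; return True if this closes a loop."""
        if node in self.visited:
            # Loop detected: penalise the ancestors that led here,
            # discounted by their distance from the loop point.
            for depth, ancestor in enumerate(reversed(self.path)):
                self.q[ancestor] = self.q.get(ancestor, 0.0) - PENALTY * GAMMA ** depth
            return True        # caller should pick a different neighbour
        self.visited.add(node)
        self.path.append(node)
        self.q.setdefault(node, sim)   # raw similarity seeds the Q-value
        return False
```

The key design point: the penalty walks the causal chain, not the whole graph, so its cost is bounded by trajectory depth.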

The Experiment

The trap scenario is minimal by design. We're validating a mechanism, not benchmarking a production system. The setup:

Parameter                              Value
Real identities (ArcFace embeddings)   6 images — elon_1, elon_2, obama, kevin_durand, random_1, random_2
Background noise nodes                 20 random unit vectors (Python) / 5 (C#)
Synthetic trap                         1 — Gaussian perturbation σ=0.001 on the elon_1 vector
Trap similarity to target              0.9998 (Python & C#)
Start node                             obama.jpg
Target node                            elon_1.jpg
Forced topology                        start → [trap, noise…]; trap → [start, noise…] (dead end)
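
The trap construction itself is one line of noise injection. A sketch, using a random unit vector as a stand-in for the real elon_1 ArcFace embedding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the elon_1 ArcFace embedding: a random 512-dim unit vector.
target = rng.standard_normal(512)
target /= np.linalg.norm(target)

# Synthetic trap: Gaussian perturbation with sigma = 0.001, renormalised.
trap = target + rng.normal(0.0, 0.001, size=512)
trap /= np.linalg.norm(trap)

# With sigma = 0.001 in 512 dimensions, this lands around 0.9997-0.9998,
# close to the 0.9998 reported above.
cos_sim = float(target @ trap)
print(f"trap similarity to target: {cos_sim:.4f}")
```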

The forced topology means a greedy matcher — one that always picks the highest-similarity neighbour — walks directly into the trap on step 1, then loops back to start, then loops again, indefinitely. It never reaches the target.

The DAPSA walker detects the first loop on step 1, fires backward reinforcement, and reaches the target on step 3: four steps total (steps 0 through 3) in both implementations.
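
The greedy failure mode is easy to demonstrate on a four-node toy version of the forced topology (node names and similarity values here are illustrative):

```python
# A greedy highest-similarity matcher on the forced topology: it oscillates
# between start and trap forever and never reaches the target.
neighbours = {
    "start": ["trap", "noise"],
    "trap":  ["start", "noise"],   # dead end: leads back to start
    "noise": ["target", "start"],
}
sim = {"start": 0.15, "trap": 0.9998, "noise": 0.03, "target": 1.0}

node, seen = "start", []
for _ in range(6):
    seen.append(node)
    node = max(neighbours[node], key=sim.get)  # always take highest similarity

print(seen)  # start <-> trap oscillation; "target" never appears
```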

Python Run — Real ArcFace Embeddings

The Python implementation uses InsightFace (ArcFace buffalo_l, ResNet-50) running on CPU via ONNX. Embeddings are genuine 512-dim face vectors from real images. The Active Manifold is a hand-rolled ActiveMemory class with a visited-index set and a parent-linked Q-value map.

Trap Similarity to Target:  0.9998
Start Similarity to Target: 0.1462
Start Similarity to Trap:   0.1454
Manifold: 6 real + 20 noise + 1 trap = 27 identities

Step 0: Face 2  (obama.jpg)          sim=0.1462
        → MOVE to Face 24 (synthetic_trap)  (highest sim)
Step 1: Face 24 (synthetic_trap)     sim=0.9998
  [!] Loop Detected at Face 2 — TRIGGERING BACKWARD REINFORCEMENT
        → MOVE to Face 8  (noise_4)
Step 2: Face 8  (noise_4)            sim=0.0297
  [!] Loop Detected at Face 24 — TRIGGERING BACKWARD REINFORCEMENT
        → MOVE to Face 0  (elon_1)  (target now reachable)
Step 3: Face 0  (elon_1.jpg)         sim=1.0000
>>> TARGET ACQUIRED <<<

POST-HUNT FORENSICS (Q-VALUES)
Step 0: Face 2  (obama)         Q = −0.5738  ▼ penalised — led to trap
Step 1: Face 24 (trap)          Q =  0.0998  ▼ penalised as dead end
Step 2: Face 8  (noise_4)       Q = −0.4703  ▼ residual penalty
Step 3: Face 0  (target)        Q =  1.0000  ▲ target acquired

C# Port — Built on rich-learning-base

The C# implementation replaces every custom data structure with primitives from the RichLearning library:

Python component                     C# equivalent                           Source
ActiveMemory (visited set + Q map)   TrajectoryDag                           rich-learning-base
visited_indices.contains(n)          TrajectoryDag.Append() → isLoop         rich-learning-base
punish_path(penalty, γ)              TrajectoryDag.BackwardReinforce(R, γ)   rich-learning-base
cosine similarity (sklearn)          DefaultStateEncoder.Distance()          rich-learning-base
random unit vectors (numpy)          FaceNode.Random() — Box-Muller, normalised   Chimera.Face

The C# version also adds a positive reward on success — BackwardReinforce(+1.0, γ=0.95) fires when the target is acquired, propagating credit back through the winning path. This is full DAPSA v2.1 behaviour; the Python PoC only penalises, never rewards.
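
A sketch of what that success-time credit assignment does, assuming the same depth-discounted propagation as the penalty (the function name is illustrative; the real call is TrajectoryDag.BackwardReinforce):

```python
def backward_reinforce(q, path, reward, gamma):
    """Propagate a terminal reward back through the causal chain,
    discounted by distance from the terminal node. A negative reward
    gives the penalty case; a positive one gives success credit."""
    for depth, node in enumerate(reversed(path)):
        q[node] = q.get(node, 0.0) + reward * gamma ** depth
    return q

# Winning path from the C# run: obama -> noise_3 -> elon_1 (Q seeds omitted).
q = backward_reinforce({}, ["obama", "noise_3", "elon_1"], reward=1.0, gamma=0.95)
# elon_1 receives the full +1.0; earlier nodes earn geometrically less credit.
```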

Trap similarity to target:  0.9998
Start similarity to target: 0.0117
Manifold: 6 real + 5 noise + 1 trap = 12 identities

Step 00: [obama (START)]          sim=0.0117  q=0.0117
         → MOVE to [SYNTHETIC TRAP]   sim=0.9998
Step 01: [SYNTHETIC TRAP]          sim=0.9998  q=0.9998
  [!] LOOP at [obama] — BackwardReinforce(−0.5, γ=0.8)
         → MOVE to [noise_3]          sim=0.0858
Step 02: [noise_3]                    sim=0.0858  q=0.0858
  [!] LOOP at [SYNTHETIC TRAP] — penalised, skipped
         → MOVE to [elon_1 (TARGET)]  sim=1.0000
Step 03: [elon_1 (TARGET)]         sim=1.0000
>>> TARGET ACQUIRED — BackwardReinforce(+1.0, γ=0.95) <<<

POST-HUNT Q-VALUE FORENSICS
obama (START)     raw=0.0117   final Q =  0.1491  (on winning path → rewarded)
SYNTHETIC TRAP    raw=0.9998   final Q =  0.1023  ▼ penalised twice
noise_3           raw=0.0858   final Q =  0.5358  (escape node → rewarded)
elon_1 (TARGET)   raw=1.0000   final Q =  2.0000  ▲ rewarded

Python vs C# — What Changed, What Didn't

Aspect               Python                                  C#
Embeddings           Real ArcFace 512-dim (buffalo_l)        Random normalised 512-dim unit vectors
Trap similarity      0.9998                                  0.9998
Steps to target      4                                       4
Loop detection       Custom visited_indices set              TrajectoryDag.Append() → isLoop
Penalty formula      Q -= penalty × γ^i (pure subtraction)   Q += Reward × γ^depth (additive; supports + and −)
Success reward       ✗ — not implemented                     ✓ — BackwardReinforce(+1.0)
Trap Q after hunt    0.0998 (penalised, never rewarded)      0.1023 (penalised twice, small propagated reward)
Causal audit trail   Parent-linked dict (manual)             Merkle-linked TrajectoryNode chain
Dependencies         numpy, sklearn, insightface (ONNX)      rich-learning-base only (zero external deps)

The trap Q-value tells the whole story. Starting at 0.9998 — the highest signal in the manifold — it ends the hunt at 0.10 in both implementations. The episodic penalty completely overrode the vector similarity signal without any retraining, any additional data, or any change to the embedding model.

What This Proves — and What It Doesn't

Proven: A causal loop-detection mechanism with backward reinforcement can neutralise a mathematically injected adversarial node (sim=0.9998) in exactly 4 navigation steps — in two independent implementations, in two languages, with two different embedding sources. The mechanism is reproducible and language-agnostic.

This is a PoC, not a production benchmark. The manifold has 12–27 nodes. The trap topology is hand-crafted. We are validating the mechanism of loop-aware navigation, not deploying a face-ID system.

What the results do establish:

- Episodic, trajectory-level memory can override a raw similarity signal without retraining, additional data, or any change to the embedding model.
- The mechanism reproduces across two independent implementations, two languages, and two embedding sources.
- Loop detection plus backward reinforcement requires no trained component beyond the initial embedding.

What's Next

The next step is scaling the manifold: replace the 12-node toy graph with a real LFW face dataset (~13,000 images × ArcFace = 13K embedding nodes), preserve the same DAPSA loop-detection logic, and measure how the Q-value degradation of trap nodes holds up as the search space grows. The hypothesis is that the mechanism scales linearly with trajectory depth, not with manifold size — because the penalty only propagates through the causal chain, not the full graph.

That experiment is next. No spoilers on the numbers until we run it.