Architecture
Rich Learning is built on a simple premise: knowledge is topology, not weights. Instead of encoding learned behaviour in neural network parameters, we store it as a directed property graph where nodes represent states and edges represent transitions.
Core Insight
Catastrophic forgetting happens because overwriting weights erases previous knowledge. A graph never forgets — adding a new node doesn't delete old ones.
Component Hierarchy
The system follows a three-layer hierarchy:
┌─────────────────────────────────┐
│ HierarchicalAgent │ ← Manager: task selection
├─────────────────────────────────┤
│ Cartographer │ ← Mid-level: planning & mapping
├─────────────────────────────────┤
│ TopologicalGraphMemory │ ← Memory: graph CRUD & queries
│ (LiteDB or Neo4j backend) │
└─────────────────────────────────┘
TopologicalGraphMemory (IGraphMemory)
The persistence layer. Stores StateLandmark nodes and StateTransition
edges. Implements graph operations:
- CRUD — Upsert and retrieve landmarks and transitions
- Nearest Neighbour — Find the closest landmark to a new observation (cosine distance)
- Shortest Path — BFS/Cypher-based path finding between landmarks
- Cycle Detection — Identify loops in the agent's recent trajectory
- Frontier Discovery — Find under-explored landmarks at the boundary of known space
- Prioritised Replay — Sample transitions weighted by TD-error and staleness
- Cluster Assignment — Label propagation for discovering state-space regions
Two backends implement this interface: LiteDbGraphMemory (embedded, zero-config)
and Neo4jGraphMemory (server-based, supports Cypher and graph visualisation).
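The nearest-neighbour query above is the workhorse of the interface: every new observation is matched against known landmarks by cosine distance. The real backends are C#, but the core lookup can be sketched in Python (the function names and the dict-based landmark store here are illustrative, not the library's API):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def nearest_landmark(landmarks, embedding):
    """Return (id, distance) of the landmark closest to `embedding`.

    `landmarks` maps landmark id -> embedding vector; a linear scan
    stands in for whatever index the backend actually uses.
    """
    best_id, best_dist = None, float("inf")
    for lid, emb in landmarks.items():
        d = cosine_distance(emb, embedding)
        if d < best_dist:
            best_id, best_dist = lid, d
    return best_id, best_dist
```

The returned distance is what the novelty gate compares against its threshold to decide between creating a landmark and updating an existing one.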
Cartographer
The mid-level planner. Sits between raw observations and the memory backend. Responsibilities:
- State Observation — Encodes raw states via IStateEncoder, then decides whether to create a new landmark or update an existing one using a novelty threshold
- Transition Recording — Logs transitions with reward, success rate, and temporal distance
- Loop Detection & Escape — Monitors a sliding window of recent landmarks; if a cycle is detected, selects an escape target from the frontier
- Subgoal Selection — Picks the next navigation target by scoring frontier nodes on novelty, visit count, and topological accessibility
- Replay Batch — Provides prioritised experience replay batches for policy improvement
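Loop detection is the easiest of these responsibilities to pin down concretely: a cycle exists when a landmark id recurs inside the sliding window. A minimal Python sketch, assuming illustrative names (the real Cartographer is C# and its window bookkeeping may differ):

```python
from collections import deque

def detect_cycle(window):
    """Return the first repeated landmark id in `window`, else None."""
    seen = set()
    for lid in window:
        if lid in seen:
            return lid
        seen.add(lid)
    return None

class TrajectoryMonitor:
    """Sliding window of recent landmark ids, as the Cartographer keeps."""

    def __init__(self, size=8):
        self.window = deque(maxlen=size)

    def observe(self, landmark_id):
        """Record a visit; return the repeated id if a cycle appeared."""
        self.window.append(landmark_id)
        return detect_cycle(self.window)
```

When `observe` returns a non-None id, the agent is looping, and the Cartographer would pick an escape target from the frontier instead of continuing along the cycle.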
HierarchicalAgent
The top-level manager that coordinates task assignment, delegates subgoals to the Cartographer, and receives completion signals from workers.
The DAPSA Pattern
Rich Learning follows the Discover → Analyse → Plan → Select → Act cycle:
- Discover — Observe raw state, encode as embedding
- Analyse — Nearest-neighbour lookup; is this novel or known?
- Plan — If novel: create landmark. If known: consider loop detection
- Select — Choose next subgoal (frontier exploration vs. exploitation)
- Act — Execute action, record transition, update graph
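The five steps above compose into a single control loop. A hedged Python sketch of one iteration, where `graph`, `encoder`, and `env` are illustrative stand-ins for the memory backend, IStateEncoder, and the environment (none of these method names come from the actual library):

```python
def dapsa_step(graph, encoder, env, state, novelty_threshold=0.3):
    """One Discover -> Analyse -> Plan -> Select -> Act iteration."""
    # Discover: encode the raw state as an embedding
    embedding = encoder(state)
    # Analyse: nearest-neighbour lookup against known landmarks
    nearest_id, distance = graph.nearest(embedding)
    # Plan: novel states become new landmarks; known ones update counts
    if nearest_id is None or distance > novelty_threshold:
        nearest_id = graph.add_landmark(embedding)
    else:
        graph.visit(nearest_id)
    # Select: pick the next subgoal (frontier exploration vs. exploitation)
    subgoal = graph.select_subgoal(nearest_id)
    # Act: execute, then record the transition in the graph
    action, next_state, reward = env.act(subgoal)
    graph.record_transition(nearest_id, subgoal, action, reward)
    return next_state
```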
Data Model
StateLandmark (Node)
A sealed record representing a single mapped state in the topological graph:
StateLandmark
├── Id: string // Unique identifier
├── Embedding: double[] // Vector representation of the state
├── VisitCount: int // How many times the agent has visited
├── ValueEstimate: double // EMA of cumulative reward
├── NoveltyScore: double // Decays with visits (starts at 1.0)
├── UncertaintyScore: double // Exploration signal
├── ClusterId: int // Region assignment via label propagation
├── HierarchyLevel: int // Position in abstraction hierarchy
├── ActionCounts: Dict // Per-action selection histogram
├── PolicyEntropy: double // Shannon entropy of action distribution
└── EpisodicTraces: List // Recent action-reward sequences
StateTransition (Edge)
A sealed record representing a directed edge between two landmarks:
StateTransition
├── SourceId: string // Origin landmark
├── TargetId: string // Destination landmark
├── Action: int // Action that caused the transition
├── Reward: double // Observed reward
├── RewardVariance: double // Variance across observations
├── TransitionCount: int // Number of times traversed
├── SuccessRate: double // Proportion of successful transitions
├── Confidence: double // Reliability estimate
├── TdError: double // Temporal-difference error for replay
├── TemporalDistance: int // Steps between source and target
├── IsMacroEdge: bool // Multi-step abstract transition
└── MacroPath: List&lt;string&gt; // Intermediate landmarks for macro edges
Why No Weights?
In a traditional neural network, knowledge is smeared across millions of weights. Learning task B overwrites the weights tuned for task A — catastrophic forgetting.
In Rich Learning, each task's knowledge lives in its own region of the graph. Task B adds new nodes and edges; task A's nodes remain untouched. Cluster assignment and frontier scoring adapt to the full graph, enabling automatic knowledge transfer without interference.
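The no-interference property can be demonstrated with a toy adjacency-map graph (pure illustration; node namespacing per task is an assumption of this sketch, not how the library partitions regions):

```python
def add_task_knowledge(graph, task, transitions):
    """Add one task's landmarks and edges to a shared graph.

    `graph` maps node id -> set of successor ids; ids are namespaced
    per task here purely to make the regions visible.
    """
    for src, dst in transitions:
        graph.setdefault(f"{task}:{src}", set()).add(f"{task}:{dst}")
        graph.setdefault(f"{task}:{dst}", set())
    return graph

graph = {}
add_task_knowledge(graph, "A", [("s0", "s1"), ("s1", "s2")])
snapshot_a = {k: set(v) for k, v in graph.items() if k.startswith("A:")}
add_task_knowledge(graph, "B", [("s0", "s1")])
# Task A's region is unchanged after learning task B
assert {k: v for k, v in graph.items() if k.startswith("A:")} == snapshot_a
```

Contrast this with gradient descent on shared weights, where fitting task B necessarily moves the same parameters task A depends on.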
Exploration Strategies
Exploration behaviour is fully configurable through four strategy interfaces:
- IFrontierScorer — Ranks frontier landmarks by novelty, visit count, and connectivity
- INoveltyGate — Decides if an observation is novel enough to create a new landmark (distance threshold)
- IPrioritySampler — Computes replay priority from TD-error, transition count, and staleness
- ILoopEscapeStrategy — Chooses an escape target when a trajectory cycle is detected
Default implementations are provided, but you can inject custom strategies for domain-specific exploration.
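As an example of what a custom strategy might look like, here is a hedged Python sketch of an IPrioritySampler-style policy: priority grows with TD-error and staleness and shrinks with traversal count. The exponent and weighting constants are assumptions for illustration, not the library's defaults:

```python
import random

def replay_priority(td_error, transition_count, staleness,
                    alpha=0.6, staleness_weight=0.1):
    """Favour large-TD-error, stale, rarely-traversed transitions."""
    return (abs(td_error) ** alpha + staleness_weight * staleness) / (
        1 + transition_count)

def sample_batch(transitions, k, rng=random):
    """Sample `k` transitions weighted by replay_priority.

    `transitions` is a list of (td_error, count, staleness, payload).
    """
    weights = [replay_priority(t, c, s) for t, c, s, _ in transitions]
    chosen = rng.choices(range(len(transitions)), weights=weights, k=k)
    return [transitions[i][3] for i in chosen]
```

A custom IPrioritySampler could, for instance, drop the staleness term in fully deterministic domains, or cap the count penalty so well-trodden edges are still revisited occasionally.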