Architecture
Rich Learning is built on a simple premise: knowledge is topology, not weights. Instead of encoding learned behaviour in neural network parameters, we store it as a directed property graph where nodes represent states and edges represent transitions.
Core Insight
Catastrophic forgetting happens because overwriting weights erases previous knowledge. A graph never forgets — adding a new node doesn't delete old ones.
Component Hierarchy
The system follows a three-layer hierarchy:
┌─────────────────────────────────┐
│ HierarchicalAgent │ ← Manager: task selection
├─────────────────────────────────┤
│ Cartographer │ ← Mid-level: planning & mapping
├─────────────────────────────────┤
│ TopologicalGraphMemory │ ← Memory: graph CRUD & queries
│ (LiteDB or Neo4j backend) │
└─────────────────────────────────┘
TopologicalGraphMemory (IGraphMemory)
The persistence layer. Stores StateLandmark nodes and StateTransition
edges. Implements graph operations:
- CRUD — Upsert and retrieve landmarks and transitions
- Nearest Neighbour — Find the closest landmark to a new observation (cosine distance)
- Shortest Path — BFS/Cypher-based path finding between landmarks
- Cycle Detection — Identify loops in the agent's recent trajectory
- Frontier Discovery — Find under-explored landmarks at the boundary of known space
- Prioritised Replay — Sample transitions weighted by TD-error and staleness
- Cluster Assignment — Label propagation for discovering state-space regions
Two backends implement this interface: LiteDbGraphMemory (embedded, zero-config)
and Neo4jGraphMemory (server-based, supports Cypher and graph visualisation).
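The nearest-neighbour query above is the workhorse of the interface: every new observation is matched against known landmarks by cosine distance. The real backends are C#, but the core lookup can be sketched in Python (the function names and the dict-based landmark store here are illustrative, not the library's API):

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def nearest_landmark(landmarks, embedding):
    """Return (id, distance) of the landmark closest to `embedding`.

    `landmarks` maps landmark id -> embedding vector; a linear scan
    stands in for whatever index the backend actually uses.
    """
    best_id, best_dist = None, float("inf")
    for lid, emb in landmarks.items():
        d = cosine_distance(emb, embedding)
        if d < best_dist:
            best_id, best_dist = lid, d
    return best_id, best_dist
```

The returned distance is what the novelty gate compares against its threshold to decide between creating a landmark and updating an existing one.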
Cartographer
The mid-level planner. Sits between raw observations and the memory backend. Responsibilities:
- State Observation — Encodes raw states via IStateEncoder, then decides whether to create a new landmark or update an existing one using a novelty threshold
- Transition Recording — Logs transitions with reward, success rate, and temporal distance
- Loop Detection & Escape — Monitors a sliding window of recent landmarks; if a cycle is detected, selects an escape target from the frontier
- Subgoal Selection — Picks the next navigation target by scoring frontier nodes on novelty, visit count, and topological accessibility
- Replay Batch — Provides prioritised experience replay batches for policy improvement
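Loop detection is the easiest of these responsibilities to pin down concretely: a cycle exists when a landmark id recurs inside the sliding window. A minimal Python sketch, assuming illustrative names (the real Cartographer is C# and its window bookkeeping may differ):

```python
from collections import deque

def detect_cycle(window):
    """Return the first repeated landmark id in `window`, else None."""
    seen = set()
    for lid in window:
        if lid in seen:
            return lid
        seen.add(lid)
    return None

class TrajectoryMonitor:
    """Sliding window of recent landmark ids, as the Cartographer keeps."""

    def __init__(self, size=8):
        self.window = deque(maxlen=size)

    def observe(self, landmark_id):
        """Record a visit; return the repeated id if a cycle appeared."""
        self.window.append(landmark_id)
        return detect_cycle(self.window)
```

When `observe` returns a non-None id, the agent is looping, and the Cartographer would pick an escape target from the frontier instead of continuing along the cycle.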
HierarchicalAgent
The top-level manager that coordinates task assignment, delegates subgoals to the Cartographer, and receives completion signals from workers.
The DAPSA Pattern
Rich Learning follows the Discover → Analyse → Plan → Select → Act cycle:
- Discover — Observe raw state, encode as embedding
- Analyse — Nearest-neighbour lookup; is this novel or known?
- Plan — If novel: create landmark. If known: consider loop detection
- Select — Choose next subgoal (frontier exploration vs. exploitation)
- Act — Execute action, record transition, update graph
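The five steps above compose into a single control loop. A hedged Python sketch of one iteration, where `graph`, `encoder`, and `env` are illustrative stand-ins for the memory backend, IStateEncoder, and the environment (none of these method names come from the actual library):

```python
def dapsa_step(graph, encoder, env, state, novelty_threshold=0.3):
    """One Discover -> Analyse -> Plan -> Select -> Act iteration."""
    # Discover: encode the raw state as an embedding
    embedding = encoder(state)
    # Analyse: nearest-neighbour lookup against known landmarks
    nearest_id, distance = graph.nearest(embedding)
    # Plan: novel states become new landmarks; known ones update counts
    if nearest_id is None or distance > novelty_threshold:
        nearest_id = graph.add_landmark(embedding)
    else:
        graph.visit(nearest_id)
    # Select: pick the next subgoal (frontier exploration vs. exploitation)
    subgoal = graph.select_subgoal(nearest_id)
    # Act: execute, then record the transition in the graph
    action, next_state, reward = env.act(subgoal)
    graph.record_transition(nearest_id, subgoal, action, reward)
    return next_state
```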
Data Model
StateLandmark (Node)
A sealed record representing a single mapped state in the topological graph:
StateLandmark
├── Id: string // Unique identifier
├── Embedding: double[] // Vector representation of the state
├── VisitCount: int // How many times the agent has visited
├── ValueEstimate: double // EMA of cumulative reward
├── NoveltyScore: double // Decays with visits (starts at 1.0)
├── UncertaintyScore: double // Exploration signal
├── ClusterId: int // Region assignment via label propagation
├── HierarchyLevel: int // Position in abstraction hierarchy
├── ActionCounts: Dict // Per-action selection histogram
├── PolicyEntropy: double // Shannon entropy of action distribution
└── EpisodicTraces: List // Recent action-reward sequences
StateTransition (Edge)
A sealed record representing a directed edge between two landmarks:
StateTransition
├── SourceId: string // Origin landmark
├── TargetId: string // Destination landmark
├── Action: int // Action that caused the transition
├── Reward: double // Observed reward
├── RewardVariance: double // Variance across observations
├── TransitionCount: int // Number of times traversed
├── SuccessRate: double // Proportion of successful transitions
├── Confidence: double // Reliability estimate
├── TdError: double // Temporal-difference error for replay
├── TemporalDistance: int // Steps between source and target
├── IsMacroEdge: bool // Multi-step abstract transition
└── MacroPath: List&lt;string&gt; // Intermediate landmarks for macro edges
Why No Weights?
In a traditional neural network, knowledge is smeared across millions of weights. Learning task B overwrites the weights tuned for task A — catastrophic forgetting.
In Rich Learning, each task's knowledge lives in its own region of the graph. Task B adds new nodes and edges; task A's nodes remain untouched. Cluster assignment and frontier scoring adapt to the full graph, enabling automatic knowledge transfer without interference.
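The no-interference property can be demonstrated with a toy adjacency-map graph (pure illustration; node namespacing per task is an assumption of this sketch, not how the library partitions regions):

```python
def add_task_knowledge(graph, task, transitions):
    """Add one task's landmarks and edges to a shared graph.

    `graph` maps node id -> set of successor ids; ids are namespaced
    per task here purely to make the regions visible.
    """
    for src, dst in transitions:
        graph.setdefault(f"{task}:{src}", set()).add(f"{task}:{dst}")
        graph.setdefault(f"{task}:{dst}", set())
    return graph

graph = {}
add_task_knowledge(graph, "A", [("s0", "s1"), ("s1", "s2")])
snapshot_a = {k: set(v) for k, v in graph.items() if k.startswith("A:")}
add_task_knowledge(graph, "B", [("s0", "s1")])
# Task A's region is unchanged after learning task B
assert {k: v for k, v in graph.items() if k.startswith("A:")} == snapshot_a
```

Contrast this with gradient descent on shared weights, where fitting task B necessarily moves the same parameters task A depends on.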
Exploration Strategies
Exploration behaviour is fully configurable through four strategy interfaces:
- IFrontierScorer — Ranks frontier landmarks by novelty, visit count, and connectivity
- INoveltyGate — Decides if an observation is novel enough to create a new landmark (distance threshold)
- IPrioritySampler — Computes replay priority from TD-error, transition count, and staleness
- ILoopEscapeStrategy — Chooses an escape target when a trajectory cycle is detected
Default implementations are provided, but you can inject custom strategies for domain-specific exploration.
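As an example of what a custom strategy might look like, here is a hedged Python sketch of an IPrioritySampler-style policy: priority grows with TD-error and staleness and shrinks with traversal count. The exponent and weighting constants are assumptions for illustration, not the library's defaults:

```python
import random

def replay_priority(td_error, transition_count, staleness,
                    alpha=0.6, staleness_weight=0.1):
    """Favour large-TD-error, stale, rarely-traversed transitions."""
    return (abs(td_error) ** alpha + staleness_weight * staleness) / (
        1 + transition_count)

def sample_batch(transitions, k, rng=random):
    """Sample `k` transitions weighted by replay_priority.

    `transitions` is a list of (td_error, count, staleness, payload).
    """
    weights = [replay_priority(t, c, s) for t, c, s, _ in transitions]
    chosen = rng.choices(range(len(transitions)), weights=weights, k=k)
    return [transitions[i][3] for i in chosen]
```

A custom IPrioritySampler could, for instance, drop the staleness term in fully deterministic domains, or cap the count penalty so well-trodden edges are still revisited occasionally.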