Architecture

Rich Learning is built on a simple premise: knowledge is topology, not weights. Instead of encoding learned behaviour in neural network parameters, we store it as a directed property graph where nodes represent states and edges represent transitions.

Core Insight

Catastrophic forgetting happens because overwriting weights erases previous knowledge. A graph never forgets — adding a new node doesn't delete old ones.

Component Hierarchy

The system follows a three-layer hierarchy:

┌─────────────────────────────────┐
│       HierarchicalAgent         │  ← Manager: task selection
├─────────────────────────────────┤
│         Cartographer            │  ← Mid-level: planning & mapping
├─────────────────────────────────┤
│     TopologicalGraphMemory      │  ← Memory: graph CRUD & queries
│     (LiteDB or Neo4j backend)   │
└─────────────────────────────────┘

TopologicalGraphMemory (IGraphMemory)

The persistence layer. It stores StateLandmark nodes and StateTransition edges and implements the graph operations (insertion, lookup, traversal) used by the layers above.

Two backends implement this interface: LiteDbGraphMemory (embedded, zero-config) and Neo4jGraphMemory (server-based, supports Cypher and graph visualisation).
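
The interface itself is not reproduced here, so the following sketch models a minimal in-memory backend to illustrate the shape of the contract the two real backends fulfil. The project's records are C#; Python is used only to keep the example short, and the method names are illustrative assumptions, not the actual IGraphMemory API.

```python
# Minimal in-memory stand-in for a backend such as LiteDbGraphMemory.
# Method names are illustrative assumptions, not the real IGraphMemory API.
class InMemoryGraphMemory:
    def __init__(self):
        self.nodes = {}   # landmark id -> landmark dict
        self.edges = {}   # (source_id, target_id, action) -> edge dict

    def upsert_landmark(self, landmark):
        self.nodes[landmark["Id"]] = landmark

    def upsert_transition(self, edge):
        key = (edge["SourceId"], edge["TargetId"], edge["Action"])
        self.edges[key] = edge

    def neighbours(self, node_id):
        # Outgoing neighbours of a landmark, following directed edges only
        return [t for (s, t, _a) in self.edges if s == node_id]
```

A real backend would presumably add nearest-neighbour search over embeddings; the Neo4j backend can also express such queries in Cypher.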

Cartographer

The mid-level planner. It sits between raw observations and the memory backend, handling planning and mapping: deciding when an observation becomes a new landmark, recording transitions, and proposing routes to subgoals.

HierarchicalAgent

The top-level manager that coordinates task assignment, delegates subgoals to the Cartographer, and receives completion signals from workers.

The DAPSA Pattern

Rich Learning follows the Discover → Analyse → Plan → Select → Act cycle:

  1. Discover — Observe raw state, encode as embedding
  2. Analyse — Nearest-neighbour lookup; is this novel or known?
  3. Plan — If novel: create a landmark. If known: check for loop closure
  4. Select — Choose next subgoal (frontier exploration vs. exploitation)
  5. Act — Execute action, record transition, update graph
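
The cycle above can be sketched in a few lines of Python. This is an illustration only, not the actual implementation: the Euclidean novelty test and its threshold are assumptions.

```python
import math

# Illustrative DAPSA step; the distance metric and novelty threshold are
# assumptions, not the library's actual defaults.
def dapsa_step(graph, embed, env_state, novelty_threshold=0.5):
    # 1. Discover: encode the raw observation as an embedding
    emb = embed(env_state)

    # 2. Analyse: nearest-neighbour lookup over existing landmarks
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = min(graph, key=lambda n: dist(n["Embedding"], emb), default=None)

    # 3. Plan: novel states become new landmarks; known states are revisited
    if nearest is None or dist(nearest["Embedding"], emb) > novelty_threshold:
        node = {"Id": f"lm{len(graph)}", "Embedding": emb, "VisitCount": 0}
        graph.append(node)
    else:
        node = nearest   # loop closure: the agent has been here before

    # 4./5. Select & Act would choose a subgoal, execute, and record the
    # transition; here we only update the visit statistics.
    node["VisitCount"] += 1
    return node
```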

Data Model

StateLandmark (Node)

A sealed record representing a single mapped state in the topological graph:

StateLandmark
├── Id: string                  // Unique identifier
├── Embedding: double[]         // Vector representation of the state
├── VisitCount: int             // How many times the agent has visited
├── ValueEstimate: double       // EMA of cumulative reward
├── NoveltyScore: double        // Decays with visits (starts at 1.0)
├── UncertaintyScore: double    // Exploration signal
├── ClusterId: int              // Region assignment via label propagation
├── HierarchyLevel: int         // Position in abstraction hierarchy
├── ActionCounts: Dict          // Per-action selection histogram
├── PolicyEntropy: double       // Shannon entropy of action distribution
└── EpisodicTraces: List        // Recent action-reward sequences
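
Several of these fields evolve together on every visit. The sketch below shows one plausible per-visit update; the decay rate and EMA step size are assumptions for illustration, not the library's actual constants.

```python
# Per-visit landmark bookkeeping. The decay rate and EMA step size are
# illustrative assumptions, not the library's constants.
def visit_landmark(lm, reward, decay=0.9, alpha=0.1):
    lm["VisitCount"] += 1
    # NoveltyScore starts at 1.0 and decays with each visit
    lm["NoveltyScore"] *= decay
    # ValueEstimate is an exponential moving average of observed reward
    lm["ValueEstimate"] += alpha * (reward - lm["ValueEstimate"])
    return lm
```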

StateTransition (Edge)

A sealed record representing a directed edge between two landmarks:

StateTransition
├── SourceId: string            // Origin landmark
├── TargetId: string            // Destination landmark
├── Action: int                 // Action that caused the transition
├── Reward: double              // Observed reward
├── RewardVariance: double      // Variance across observations
├── TransitionCount: int        // Number of times traversed
├── SuccessRate: double         // Proportion of successful transitions
├── Confidence: double          // Reliability estimate
├── TdError: double             // Temporal-difference error for replay
├── TemporalDistance: int       // Steps between source and target
├── IsMacroEdge: bool           // Multi-step abstract transition
└── MacroPath: List<string>     // Intermediate landmarks for macro edges
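
Edge statistics such as Reward, RewardVariance, and SuccessRate lend themselves to incremental updates. The sketch below uses Welford's algorithm for a numerically stable running variance; whether StateTransition uses exactly this rule is an assumption.

```python
# Incremental edge statistics on each traversal. Welford's algorithm keeps
# a numerically stable running reward variance; the exact rule used by
# StateTransition is an assumption here.
def record_traversal(edge, reward, success):
    n = edge["TransitionCount"] + 1
    delta = reward - edge["Reward"]
    mean = edge["Reward"] + delta / n
    # _m2 accumulates squared deviations; variance = _m2 / n
    edge["_m2"] = edge.get("_m2", 0.0) + delta * (reward - mean)
    edge.update(
        TransitionCount=n,
        Reward=mean,
        RewardVariance=edge["_m2"] / n,
        SuccessRate=(edge["SuccessRate"] * (n - 1) + (1 if success else 0)) / n,
    )
    return edge
```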

Why No Weights?

In a traditional neural network, knowledge is smeared across millions of weights. Learning task B overwrites the weights tuned for task A — catastrophic forgetting.

In Rich Learning, each task's knowledge lives in its own region of the graph. Task B adds new nodes and edges; task A's nodes remain untouched. Cluster assignment and frontier scoring adapt to the full graph, enabling automatic knowledge transfer without interference.
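
A toy illustration of the no-interference claim: learning task B only appends to the graph, so a snapshot of task A's nodes is bit-for-bit unchanged afterwards. The landmark contents below are made up for the example.

```python
import copy

# Task A's region of the graph (contents invented for illustration)
graph = {"a1": {"ClusterId": 0, "ValueEstimate": 0.7}}
snapshot = copy.deepcopy(graph)

# Task B adds its own landmarks; nothing existing is overwritten
graph["b1"] = {"ClusterId": 1, "ValueEstimate": 0.2}
graph["b2"] = {"ClusterId": 1, "ValueEstimate": 0.4}

assert all(graph[k] == v for k, v in snapshot.items())   # task A intact
```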

Exploration Strategies

Exploration behaviour is fully configurable through four strategy interfaces.

Default implementations are provided, but you can inject custom strategies for domain-specific exploration.
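
The four interfaces are not listed here, so the sketch below uses a hypothetical ExplorationStrategy protocol purely to show the injection pattern: the default scores by novelty, and a custom strategy can swap in any other criterion.

```python
from typing import Protocol

# Hypothetical protocol name; the real four strategy interfaces are not
# specified here, so this only illustrates the injection pattern.
class ExplorationStrategy(Protocol):
    def score(self, landmark: dict) -> float: ...

class NoveltyFirst:
    """Prefer rarely visited landmarks (a plausible default)."""
    def score(self, landmark):
        return landmark["NoveltyScore"]

class ValueGreedy:
    """Custom strategy: exploit high-value landmarks instead."""
    def score(self, landmark):
        return landmark["ValueEstimate"]

def select_frontier(landmarks, strategy: ExplorationStrategy):
    # The injected strategy decides which landmark to pursue next
    return max(landmarks, key=strategy.score)
```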