Cognitive Architecture Overview
A layered approach to cognitive architecture design, separating oversight, intelligence, reasoning, and execution into composable components with specialized engines.
The Core Problem
Most AI systems are built as monoliths. A single model handles everything: understanding intent, reasoning about problems, taking actions, and evaluating results. This works for demos but breaks down in production systems that need reliability, debuggability, and graceful degradation.
The alternative is layered architecture with specialized engines: decomposing cognitive functions into distinct layers, where each layer contains purpose-built components for specific tasks. This pattern has a long history in cognitive science—from Newell's unified theories of cognition through modern architectures like ACT-R, Soar, CLARION, and LIDA.
The Layer Model
Drawing from decades of cognitive architecture research, a practical system separates concerns across four primary layers:
┌─────────────────────────────────────────────────────────────────┐
│                         OVERSIGHT LAYER                         │
│  Monitoring, safety evaluation, metacognition, self-correction  │
├─────────────────────────────────────────────────────────────────┤
│                       INTELLIGENCE LAYER                        │
│    Planning, learning, multi-model coordination, uncertainty    │
├─────────────────────────────────────────────────────────────────┤
│                         REASONING LAYER                         │
│          Generation, routing, validation, aggregation           │
├─────────────────────────────────────────────────────────────────┤
│                         EXECUTION LAYER         	              │
│         Tool registry, orchestration, sandboxed runtime         │
└─────────────────────────────────────────────────────────────────┘
Each layer has distinct responsibilities. Information flows both up (results, observations, state) and down (goals, instructions, context). This mirrors the hierarchical control structures found in biological cognition and has proven effective in systems like Soar and LIDA.
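The bidirectional flow can be made concrete with a small message type. This is an illustrative Python sketch; the `Layer` ordering and field names are invented for this example, not part of any particular framework:

```python
from dataclasses import dataclass
from enum import IntEnum

class Layer(IntEnum):
    """Layers ordered bottom-up, so numeric comparison gives direction."""
    EXECUTION = 0
    REASONING = 1
    INTELLIGENCE = 2
    OVERSIGHT = 3

@dataclass
class Message:
    source: Layer
    target: Layer
    payload: dict

    @property
    def direction(self) -> str:
        # Goals and instructions flow down; results and observations flow up.
        return "down" if self.target < self.source else "up"

# Example: oversight pushes a constraint down to the reasoning layer.
msg = Message(Layer.OVERSIGHT, Layer.REASONING, {"constraint": "max_depth=3"})
```

Typing the direction explicitly, rather than inferring it from call sites, makes the flow inspectable in logs and telemetry.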
Oversight Layer
The oversight layer provides meta-level monitoring and evaluation. Drawing from global workspace theory (as implemented in LIDA) and metacognitive research, this layer doesn't generate outputs directly—it monitors, evaluates, and guides lower layers.
Key Functions
- Safety evaluation — Checking outputs against constraints before delivery
- Quality monitoring — Detecting errors, inconsistencies, and degraded performance
- Confidence calibration — Estimating uncertainty and knowing when to abstain
- Self-correction — Triggering revision when problems are detected
- Resource governance — Preventing runaway computation or infinite loops
Opinion: Oversight should be structurally separate from generation. When the same component both generates and evaluates outputs, it tends to approve its own work. This separation creates the kind of healthy tension that catches errors before they propagate.
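One way to make that separation structural is a gate that only releases output a distinct evaluator has approved, feeding rejection reasons back for revision. A minimal sketch, assuming the `generate` and `evaluate` callables and the `Verdict` shape are supplied by the surrounding system (all names here are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    approved: bool
    reason: str

def oversight_gate(generate: Callable[[str], str],
                   evaluate: Callable[[str], Verdict],
                   prompt: str,
                   max_revisions: int = 2) -> str:
    """Generate, then evaluate with a *separate* component; revise on rejection."""
    output = generate(prompt)
    for _ in range(max_revisions):
        verdict = evaluate(output)
        if verdict.approved:
            return output
        # Self-correction: feed the rejection reason back to the generator.
        output = generate(f"{prompt}\n[revise: {verdict.reason}]")
    # Abstention beats shipping an unapproved output.
    raise RuntimeError("oversight rejected all candidates; abstaining")
```

The bounded revision loop doubles as resource governance: the gate cannot spin forever approving its own work, because it never approves anything.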
Intelligence Layer
The intelligence layer handles complex reasoning that goes beyond simple generation. This maps to what ACT-R calls the "procedural module" and what Soar implements through its decision cycle and chunking mechanisms.
Key Functions
- Multi-model coordination — Multiple models (or invocations) deliberating, critiquing, and synthesizing. Similar to ensemble methods but with structured interaction.
- Planning — Goal decomposition, dependency analysis, plan generation. Breaking complex objectives into achievable steps.
- Learning — Runtime adaptation from experience. Not weight updates, but structured learning that persists across sessions (similar to Soar's chunking).
- Uncertainty quantification — Calibrated confidence estimates. Distinguishing epistemic uncertainty (lack of knowledge) from aleatoric uncertainty (inherent randomness).
Opinion: Multi-model deliberation is underrated. Single-model inference produces confident-sounding outputs even when wrong. Structured disagreement—models arguing positions and critiquing each other—surfaces errors that single inference misses.
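A toy version of structured deliberation, purely illustrative: models are plain callables here, and a real system would exchange critiques and arguments rather than just revised answers.

```python
from collections import Counter
from typing import Callable, Sequence

def deliberate(models: Sequence[Callable[[str], str]],
               question: str,
               rounds: int = 2) -> str:
    """Each model answers; in later rounds each sees the other positions
    and may revise. The final answer is the majority position."""
    answers = [m(question) for m in models]
    for _ in range(rounds - 1):
        context = f"{question}\nOther positions: {sorted(set(answers))}"
        answers = [m(context) for m in models]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner
```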
Reasoning Layer
The reasoning layer is a pipeline: generate candidates, route to appropriate handlers, validate outputs, aggregate results. This corresponds to what FORR calls the "advisor" pattern—multiple specialized reasoners contributing to decisions.
Pipeline Components
- Generator — Produces candidate outputs. May invoke multiple models or strategies in parallel.
- Router — Directs requests to appropriate specialists. Pattern-based with learned routing preferences (similar to mixture-of-experts).
- Validator — Checks outputs against constraints. Schema validation, safety checks, consistency verification.
- Aggregator — Combines multiple outputs into coherent responses. Handles conflict resolution and synthesis.
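The four components above can be sketched as a single pass. This is a schematic Python sketch, not a reference implementation; the callable signatures are invented for the example:

```python
from typing import Callable

def reasoning_pipeline(request: str,
                       generators: dict[str, Callable[[str], list[str]]],
                       route: Callable[[str], str],
                       validate: Callable[[str], bool],
                       aggregate: Callable[[list[str]], str]) -> str:
    """Route the request to a specialist generator, drop candidates that
    fail validation, and aggregate the survivors into one response."""
    specialist = generators[route(request)]
    candidates = [c for c in specialist(request) if validate(c)]
    if not candidates:
        raise ValueError("no candidate passed validation")
    return aggregate(candidates)
```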
Opinion: The reasoning layer should be as simple as possible while being as sophisticated as necessary. Most requests don't need multi-model deliberation or complex planning—they need fast, validated generation. The reasoning layer is the common path; upper layers are invoked selectively based on complexity and risk.
Execution Layer
Where the system touches the world. The execution layer is intentionally constrained—it can only do what it's explicitly allowed to do. This draws from principles of least privilege and defense in depth.
Components
- Tool Registry — Catalog of available capabilities. Tools are registered with schemas, permissions, and constraints.
- Orchestrator — Coordinates multi-tool operations. Handles sequencing, parallelism, and error propagation.
- Sandboxed Runtime — Isolated execution environment. Strict resource limits, controlled access, ephemeral state.
Opinion: The execution layer should be maximally paranoid. Whitelists beat blacklists. If a capability isn't explicitly registered and approved, it doesn't exist. Default-deny is essential for systems that can take real-world actions.
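Default-deny can be encoded directly in the registry: registration and approval are separate steps, and an unapproved tool is indistinguishable from a nonexistent one at invocation time. An illustrative sketch:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    fn: Callable[..., Any]
    allowed: bool = False  # default-deny: tools start unapproved

class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = Tool(fn)      # registered but NOT yet approved

    def approve(self, name: str) -> None:
        self._tools[name].allowed = True  # the explicit whitelist step

    def invoke(self, name: str, *args: Any) -> Any:
        tool = self._tools.get(name)
        if tool is None or not tool.allowed:
            # Unknown and unapproved look the same from the outside.
            raise PermissionError(f"tool {name!r} is not approved")
        return tool.fn(*args)
```

A production registry would also carry schemas, resource limits, and per-caller permissions; the point here is only the two-step register/approve flow.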
Cross-Cutting Concerns
Memory
Memory isn't a layer—it's a capability that every layer needs. This aligns with how LIDA and ACT-R treat memory as multiple interacting systems (working memory, declarative memory, procedural memory) rather than a single store.
Opinion: Context windows are caches, not databases. Critical state—user preferences, task progress, accumulated knowledge—should live in durable storage with explicit read/write operations. Relying solely on context leads to forgotten information and inconsistent behavior.
Event-Driven Coordination
Layers communicate through events, not just function calls. This enables loose coupling while maintaining coordination—similar to how LIDA's "codelets" broadcast to a global workspace.
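A minimal publish/subscribe bus is enough to show the coupling property: layers react to events they care about without holding direct references to each other. Illustrative only; a real system would add topic hierarchies, delivery guarantees, and error isolation:

```python
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    """Broadcast events to whoever subscribed; publishers know no subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> None:
        for handler in self._subscribers[topic]:
            handler(payload)
```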
Lifecycle Management
Complex systems need explicit initialization and shutdown sequences. State needs to be persisted. In-flight operations need to complete or be cleanly cancelled. Resources need to be released in the right order.
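The ordering requirement is the crux: start components in dependency order, stop them in reverse, so nothing outlives something it depends on. A sketch (the `start`/`stop` protocol is assumed, not prescribed):

```python
class Lifecycle:
    """Start components in dependency order; stop them in reverse order."""

    def __init__(self) -> None:
        self._started: list = []

    def start(self, components) -> None:
        for c in components:
            c.start()
            self._started.append(c)  # record order for reverse shutdown

    def shutdown(self) -> None:
        for c in reversed(self._started):
            c.stop()  # in-flight work completes or is cancelled here
        self._started.clear()
```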
Practical Trade-offs
Complexity vs. Capability
More layers and components means more coordination overhead. The architecture needs to justify its complexity through improved reliability, debuggability, or capability. Each component should exist because its absence caused problems.
Latency vs. Thoroughness
Full-stack processing—oversight evaluation, multi-model deliberation, validated generation, sandboxed execution—takes longer than a single model call. The architecture needs escape valves: fast paths for simple requests, tiered processing based on complexity and risk.
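The escape valve amounts to a cheap scoring step in front of the stack: low-complexity, low-risk requests skip straight to validated generation. A sketch, with the scoring function and both paths left as hypothetical callables:

```python
from typing import Callable

def tiered_dispatch(request: str,
                    score_risk: Callable[[str], float],
                    fast_path: Callable[[str], str],
                    full_stack: Callable[[str], str],
                    threshold: float = 0.5) -> str:
    """Route by estimated complexity/risk: cheap path below the threshold,
    full oversight-deliberation-validation stack above it."""
    if score_risk(request) < threshold:
        return fast_path(request)
    return full_stack(request)
```

The scorer itself must be cheap, or the fast path stops being fast; in practice it is often a small classifier or a handful of heuristics.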
Explicitness vs. Emergence
This approach explicitly engineers cognitive functions rather than hoping they emerge from training. This is a bet: that explicit, inspectable systems are more reliable than emergent ones. The bet may be wrong, but at least it's testable.
What This Enables
- Debuggability — When something goes wrong, you can trace through the layers to locate the failing component.
- Testability — Components can be tested in isolation.
- Gradual improvement — Upgrade one component without touching others.
- Graceful degradation — If a component fails, the system can continue with reduced capability rather than complete failure.
- Observability — Each component can emit telemetry. You can see how the system processes requests, not just what it outputs.
Open Questions
- What's the right granularity? — When does splitting into more components help versus hurt? Too few and you lose benefits; too many and coordination dominates.
- Should architecture be learned? — Current architectures are hand-designed. Could the structure itself be learned or evolved?
- How do we validate oversight? — If the oversight layer approves an output, how do we know the approval is correct?
- What's the right interface between layers? — Function calls, events, shared memory? Each has trade-offs.
- How does this scale? — Does this architecture work for systems with hundreds of specialists, or do fundamentally different approaches become necessary?
Conclusion
Cognitive architecture is the art of decomposition: breaking complex behavior into pieces that can be built, tested, and improved independently. The specific architecture presented here—four layers with specialized components—is one approach drawing on decades of cognitive science research.
What matters more than the specific structure is the commitment to explicit design. If you can't explain how your system makes decisions, you can't trust those decisions. If you can't test components in isolation, you can't improve them reliably. Architecture is the foundation that makes everything else possible.
Further Reading
Foundational Works
- Newell, A. (1990). Unified Theories of Cognition — The foundational framing of cognitive architectures
- Anderson, J. R. (1983). The Architecture of Cognition — Precursor to ACT-R; understanding production systems
- Laird, J. E. (2012). The Soar Cognitive Architecture — Definitive book on goal-driven, rule-based architecture
Modern Overviews
- Kotseruba, I. & Tsotsos, J. K. (2025). The Computational Evolution of Cognitive Architectures — Comprehensive modern survey
- Ferreira, M. I. A. (ed.). Cognitive Architectures (Springer) — Focus on natural and artificial cognition
Key Survey Papers
- Kotseruba & Tsotsos (2020). "40 Years of Cognitive Architectures: Core Cognitive Abilities and Practical Applications" — Artificial Intelligence Review
- Duch, Oentaryo & Pasquier (2008). "Cognitive Architectures: Where Do We Go From Here?" — AGI Conference
Specific Architectures
- ACT-R — Production-rule based architecture modeling human cognition
- Soar — Goal-driven architecture with learning via chunking
- LIDA — Biologically inspired with global workspace emphasis
- CLARION — Hybrid symbolic/subsymbolic with explicit/implicit learning
- FORR — Weighted advisor systems and decision heuristics