Lab · open notebook

The physics of intelligence

The lab is where I question architectural defaults that became defaults by accident. Some lines are public, some are internal, and all of them are live. Parallax is the main line; the rest orbit it.

L-001

State

Active · primary research line · v3

Tags

CWS · Attractor dynamics · DEQ-flavoured · Energy · Continual

Parallax

A Central Workspace (CWS) built as a continuous-state dynamical system that settles into a metastable regime in an attractor landscape, perturbed by a heterogeneous pool of expert probes. Reasoning is settling. The K-step reasoning loop is decoupled from the token loop. Parallax is a candidate replacement for the residual stack itself.

Why

The whole project is a slow drift away from the assumption that cognition is the same thing as token-clocked autoregressive prediction. The CWS does K reasoning steps per context update, not per token. It does not see raw inputs; language is just one of several sensory channels. It is trained with a DEQ-style contract (K-1 frozen iterations, one live final step), so reasoning depth costs no extra memory at training time. The bet is that you can get the dynamical-system flavour of cognition (settling, pondering, content-addressable basins) without giving up the throughput of modern transformers. It took three full iterations to figure out what the project actually is; v3 is where it started to feel honest.
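
The training contract can be illustrated with a minimal numpy sketch. Everything concrete here is an assumption for illustration: the workspace update is taken to be a single map z <- tanh(W z + U x) with hypothetical weights `W`, `U` and a context encoding `x` (the real CWS, its expert probes and its energy terms are not shown), and the loss is a toy squared error on the settled state. The first K-1 iterations run forward only and are treated as constants; the gradient flows through the last step alone (a Jacobian-free, one-step approximation), so nothing from earlier iterations has to be stored and memory does not grow with K.

```python
import numpy as np


def settle(W, U, x, z0, k_steps):
    """Settle the (hypothetical) workspace state for one context update.

    Assumed update rule: z <- tanh(W @ z + U @ x). The first k_steps - 1
    iterations are "frozen": we never differentiate through them, and only
    the latest state is kept, so memory does not depend on k_steps.
    """
    z = z0
    for _ in range(k_steps - 1):
        z = np.tanh(W @ z + U @ x)
    z_prev = z  # treated as a constant input to the live step
    z_last = np.tanh(W @ z_prev + U @ x)  # the single "live" step
    return z_prev, z_last


def one_step_grads(W, U, x, z0, target, k_steps):
    """Toy loss 0.5 * ||z_K - target||^2 and its gradient through the live step only."""
    z_prev, z_last = settle(W, U, x, z0, k_steps)
    err = z_last - target
    loss = 0.5 * float(err @ err)
    delta = err * (1.0 - z_last ** 2)  # dLoss / d(pre-activation) of the live step
    grad_W = np.outer(delta, z_prev)  # no term flows back into the frozen steps
    grad_U = np.outer(delta, x)
    return loss, grad_W, grad_U


# Illustrative usage with made-up sizes: one gradient step on W and U.
rng = np.random.default_rng(0)
d, m, lr = 8, 4, 0.1
W = 0.1 * rng.standard_normal((d, d)) / np.sqrt(d)  # small weights: very likely contractive
U = rng.standard_normal((d, m)) / np.sqrt(m)
x = rng.standard_normal(m)  # stand-in for an encoded context update
target = 0.5 * np.tanh(rng.standard_normal(d))
loss, grad_W, grad_U = one_step_grads(W, U, x, np.zeros(d), target, k_steps=16)
W -= lr * grad_W
U -= lr * grad_U
```

In an autograd framework the same contract is usually written by running the first K-1 steps with gradient tracking disabled and only the final step with it enabled; the sketch spells out the last-step gradient by hand so it stays dependency-free.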

Open questions

How sharp does the attractor regime need to be? Strange-attractor dynamics emerge from the K budget, not the architecture, so what is the right K policy?
Is K-DOF a graded reasoning-depth signal at language scale, or only a binary structural-recognition signal? Synthetic recall says graded; TinyStories at 6k steps says binary.
Can the 'replaces ResNet' framing survive 100M+ parameters and 1B+ tokens, or is the constant-memory-in-K finding only useful at small scale?
How do you write modality-specific motor decoders without sneaking cognition into them?

L-002

State

Active · own track

Tags

MLP · Structural plasticity · Competition

Dendritic Unit

A research aside. Take the activation function out of the MLP block and let the unit itself be nonlinear, the way biological dendrites are: through learned competition and coactivation between branches.

Why

An exploration of structural plasticity at the unit level. Real cortical neurons compute much of their nonlinearity inside the dendrites; deep learning bolts a separate ReLU on at the end. I wanted to see what happens if you trade the bolted-on nonlinearity for a learnable selection rule between affine branches. It started as a daydream and kept going because it keeps producing surprises. It is not part of Parallax; it lives on its own track.
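
A minimal numpy sketch of one possible selection rule, to make the idea concrete. The design is an assumption, not the author's unit: each unit has `n_branches` affine branches plus a separate learned gate score per branch, and its output is the gate-weighted mix of its branches, with no activation function anywhere. The hypothetical `temperature` parameter plays the soft/hard role raised in the open questions: large values let branches coactivate, and the `hard=True` path (the limit as temperature goes to 0) lets exactly one branch win. Only the forward pass is shown; training is left out.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


class DendriticLayer:
    """Hypothetical layer of units with affine branches and no activation function.

    Each unit has `n_branches` affine branches (candidate outputs) and a separate
    learned gate score per branch (the competition). The unit outputs the
    gate-weighted mix of its branches; `temperature` controls how soft that mix is.
    """

    def __init__(self, d_in, n_units, n_branches, temperature=1.0, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_in)
        self.Wb = scale * rng.standard_normal((n_units, n_branches, d_in))  # branch weights
        self.bb = np.zeros((n_units, n_branches))                           # branch biases
        self.Wg = scale * rng.standard_normal((n_units, n_branches, d_in))  # gate weights
        self.bg = np.zeros((n_units, n_branches))                           # gate biases
        self.temperature = temperature

    def __call__(self, x, hard=False):
        # x: (batch, d_in); branches and scores: (batch, n_units, n_branches)
        branches = np.einsum("ubd,nd->nub", self.Wb, x) + self.bb
        scores = np.einsum("ubd,nd->nub", self.Wg, x) + self.bg
        if hard:
            # Winner-take-all: a one-hot weight on each unit's highest-scoring branch.
            weights = np.zeros_like(scores)
            np.put_along_axis(weights, scores.argmax(axis=-1)[..., None], 1.0, axis=-1)
        else:
            # Soft competition: branches can coactivate; lower temperature sharpens it.
            weights = softmax(scores / self.temperature, axis=-1)
        return (weights * branches).sum(axis=-1)  # (batch, n_units)


layer = DendriticLayer(d_in=3, n_units=4, n_branches=2, temperature=0.5)
y = layer(np.ones((5, 3)))  # shape (5, 4)
```

If the gate scores were tied to the branch values themselves, the hard limit would reduce to maxout; keeping the gates separate lets the competition be learned independently of what each branch computes.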

Open questions

Does selection between branches give richer or more compositional structure than ReLU + linear at fixed parameter count?
How does the temperature on the selection rule (soft vs. hard) affect specialization versus coactivation in trained units?
Can branch specialization be read out as an interpretable factoring of the input space?

L-003

State

Internal · in-flight

Tags

Attention · Long context · Architecture

Pi

A focused experiment exploring what kind of inductive bias could replace attention for a narrow class of long-context, structure-rich tasks.

Why

Attention pays a quadratic cost in sequence length everywhere, but most long-context tasks have structure that pure attention does not exploit. Pi is a small, focused experiment, not a moonshot, asking what the right operator looks like when the structure is known.

Open questions

Where does attention overpay, and what is the actual information geometry it is paying for?
What inductive bias replaces it without giving up the generality that made it useful?

L-004

State

Public · maintained

Tags

Evolution · Behavior · Real-time

On GitHub ↗

Neural Evolution

Self-driving agents in Unity, evolved from scratch. A custom neural network, population-based training, and everything configurable from the Unity Inspector. A playground for reading the geometry of behavior.

Why

Evolution is a low-bandwidth signal, which is exactly what makes it interesting. It forces architectural decisions to do real work, because there is no gradient to bail you out.
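
The population-based loop can be sketched in a few lines. This is a generic Python illustration, not code from the Unity project: the fixed-topology MLP, truncation selection with elitism, Gaussian mutation and the toy XOR fitness (standing in for a driving score) are all assumptions chosen for brevity. What it shows is the point above: the only feedback is one scalar fitness per genome, with no gradient anywhere.

```python
import numpy as np

# Toy task standing in for a driving score (assumption): XOR on two inputs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([0.0, 1.0, 1.0, 0.0])

N_IN, N_HID = 2, 4
SIZES = [N_IN * N_HID, N_HID, N_HID, 1]  # W1, b1, w2, b2 flattened into one genome
N_PARAMS = sum(SIZES)


def forward(genome, x):
    """Fixed-topology 2-4-1 MLP whose weights are read from a flat genome."""
    W1, b1, w2, b2 = np.split(genome, np.cumsum(SIZES)[:-1])
    h = np.tanh(x @ W1.reshape(N_IN, N_HID) + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2[0])))


def fitness(genome):
    # Higher is better; this scalar is the only feedback evolution ever sees.
    return -float(np.mean((forward(genome, X) - Y) ** 2))


def evolve(pop_size=64, n_elite=8, sigma=0.3, generations=200, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.standard_normal((pop_size, N_PARAMS))
    history = []
    for _ in range(generations):
        scores = np.array([fitness(g) for g in pop])
        order = np.argsort(scores)[::-1]  # best first
        history.append(float(scores[order[0]]))
        elites = pop[order[:n_elite]]
        # Children are mutated copies of random elites: no crossover, no gradients.
        parents = elites[rng.integers(0, n_elite, size=pop_size - n_elite)]
        children = parents + sigma * rng.standard_normal(parents.shape)
        pop = np.vstack([elites, children])  # elitism: the best survive unchanged
    return pop[0], history


best, history = evolve()
```

Because the elites are copied into the next generation unchanged and fitness is deterministic, the best score in `history` never decreases.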

Open questions

How does network topology shape what behaviors are reachable?
What does a "fitness landscape" look like when the search space is architecture itself?

L-005

State

Public · supporting tooling

Tags

Agents · Tooling · Infrastructure

On GitHub ↗

conv-proxy

A conversational proxy with minimal intelligence by design. It sits on top of agent frameworks like Openclaw without becoming the bottleneck.

Why

Agent stacks tend to grow more "intelligence" in places that should stay dumb. A thin, predictable proxy at the edges is more useful than a smart one.
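
One way to read "thin and predictable" in code. The sketch is hypothetical: `ThinProxy`, its fields and the `backend` callable are illustrative names, and no Openclaw API is used or implied. The proxy makes no decisions of its own; it applies a fixed, declared list of text transforms on the way in and out, delegates everything else to the backend, and records every stage of every exchange, which keeps the layer inspectable if the framework underneath is swapped.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ThinProxy:
    """Hypothetical thin conversational proxy.

    `backend` stands in for whatever agent framework sits underneath;
    the proxy itself only runs fixed, pure text transforms and logs.
    """
    backend: Callable[[str], str]
    inbound: List[Callable[[str], str]] = field(default_factory=list)
    outbound: List[Callable[[str], str]] = field(default_factory=list)
    transcript: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def send(self, user_text: str) -> str:
        request = user_text
        for transform in self.inbound:
            request = transform(request)
        raw_reply = self.backend(request)  # all the "intelligence" lives here
        reply = raw_reply
        for transform in self.outbound:
            reply = transform(reply)
        # Record every stage so each exchange can be inspected later.
        self.transcript.append((user_text, request, raw_reply, reply))
        return reply


proxy = ThinProxy(backend=lambda s: f"echo: {s}", inbound=[str.strip])
print(proxy.send("  hello  "))  # prints: echo: hello
```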

Open questions

How do you keep a conversational layer inspectable while the agent framework changes underneath?

Want to argue with any of this?

aiman@shabib.net
