
Dendritic Unit

A research aside. What if you took the activation function out of the MLP and let the unit itself be nonlinear, the way a biological neuron is, by virtue of how its dendrites compete and combine?

GitHub ↗

PyTorch · CUDA · Transformers

Where this came from

This started as a daydream. I had been reading bits of cellular neuroscience - the actual computation that goes on inside a cortical neuron, before any signal reaches the soma - and noticed the picture did not match anything we do in deep learning. Real neurons are nonlinear because of their dendrites. Branches compete, branches coactivate, and the structure of those interactions is what makes the unit expressive. There is no tiny ReLU bolted on at the end.

I am not making any biological-plausibility claims. ML stops caring about biology the moment biology stops being useful. But the existence of a working alternative was enough to make me wonder what would happen if you tried it. That is the whole motivation. It is exploration, not a thesis.

What the unit actually does

Each dendritic unit replaces an MLP block. Inside, instead of a single nonlinearity sandwiched between two linear maps, there is a small set of branches. Each branch is its own affine map. The branches compete via a learnable selection rule: soft enough that several branches can coactivate when their evidence agrees, sharp enough that one branch can dominate when the input clearly belongs to its territory.

The output is a weighted combination of the branches, with weights that themselves depend on the input. That is where the nonlinearity lives - in the structure of the selection, not in a separate function applied at the end.
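A minimal sketch of that idea, in plain Python rather than PyTorch so it runs anywhere. Everything named here is an assumption made for illustration, not the project's actual code: the `DendriticUnit` class, the choice of a softmax over one linear score per branch as the selection rule, and the hand-picked weights. The toy case uses two branches, y = x and y = -x, with gate scores x and -x.

```python
import math


def affine(weights, bias, x):
    """Apply one affine map W x + b, with no activation function."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores, temperature=1.0):
    """Softmax with a temperature; a lower temperature gives a sharper selection."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


class DendriticUnit:
    """Hypothetical sketch: affine branches mixed by an input-dependent gate."""

    def __init__(self, branch_w, branch_b, gate_w, gate_b, temperature=1.0):
        self.branch_w = branch_w  # one weight matrix per branch
        self.branch_b = branch_b  # one bias vector per branch
        self.gate_w = gate_w      # assumed selection rule: one score row per branch
        self.gate_b = gate_b
        self.temperature = temperature

    def gate(self, x):
        """Selection weights over branches; they depend on the input x."""
        return softmax(affine(self.gate_w, self.gate_b, x), self.temperature)

    def forward(self, x):
        weights = self.gate(x)
        outputs = [affine(w, b, x) for w, b in zip(self.branch_w, self.branch_b)]
        # Weighted combination of the branch outputs, coordinate by coordinate.
        return [sum(g * out[j] for g, out in zip(weights, outputs))
                for j in range(len(outputs[0]))]


# Toy unit: branch 0 computes x, branch 1 computes -x; gate scores are x and -x.
unit = DendriticUnit(
    branch_w=[[[1.0]], [[-1.0]]],
    branch_b=[[0.0], [0.0]],
    gate_w=[[1.0], [-1.0]],
    gate_b=[0.0, 0.0],
)
y_pos = unit.forward([1.0])[0]
y_neg = unit.forward([-1.0])[0]
y_two = unit.forward([2.0])[0]
```

In this toy case the gate weights differ by tanh(x), so the unit computes x · tanh(x): the same output for 1.0 and -1.0, and an output for 2.0 that is not twice the output for 1.0. Neither branch contains a nonlinearity; the curvature comes entirely from the selection.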

[ Figure: Dendritic vs MLP ]

Fig 01 - Where the nonlinearity lives. ReLU sits between two linears; the dendritic unit moves the nonlinearity into the selection rule itself.

[ Figure: Dendritic Unit, activation-free. Legend: soma (output), dendrite branch, synapse (input), active branch. ]

Fig 02 - One unit, mid-step. Inputs arrive at the synapse tips, propagate along branches, compete; the active branch (gold) is the one currently winning the selection.

What I am calling structural plasticity

Because the selection rule is differentiable, the unit can learn how sharp or how soft to be, per layer or even per channel. What I have been seeing in small runs is uneven branch usage that looks a lot like specialization. Some branches end up handling clean clusters of inputs; others fade away; a few pick up edge cases. The unit has, in effect, a learnable internal shape, not just learnable weights.
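One way to make "how sharp or how soft" and "uneven branch usage" concrete is sketched below, under stated assumptions. It treats sharpness as a softmax temperature, which in a real model would be a learnable parameter rather than a constant. It measures usage as the average gate weight per branch and summarizes it as an effective branch count (the exponential of the usage entropy). The helper names `branch_usage` and `effective_branches` are hypothetical, and the gate values are made up for illustration, not results from any run.

```python
import math


def softmax(scores, temperature=1.0):
    """Softmax with a temperature; a lower temperature gives a sharper selection."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def branch_usage(gates):
    """Average gate weight per branch over a batch of gate vectors."""
    n = len(gates)
    return [sum(g[k] for g in gates) / n for k in range(len(gates[0]))]


def effective_branches(usage):
    """exp(entropy) of the usage: K when uniform over K branches, 1 when one branch takes all."""
    entropy = -sum(p * math.log(p) for p in usage if p > 0.0)
    return math.exp(entropy)


scores = [2.0, 1.0, 0.0]
soft = softmax(scores, temperature=4.0)    # high temperature: branches coactivate
sharp = softmax(scores, temperature=0.25)  # low temperature: one branch dominates

# Illustrative gate vectors only, not measurements.
even = effective_branches(branch_usage([[0.25, 0.25, 0.25, 0.25]]))
skewed = effective_branches(branch_usage([[0.97, 0.01, 0.01, 0.01],
                                          [0.97, 0.01, 0.01, 0.01]]))
```

An effective branch count that drifts well below the number of branches during training would be one signature of the specialization and fade-out described above.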

That is the bit I find interesting. The capacity allocation is something the unit decides, not something I bake in. Whether that translates into anything useful at scale is a question I cannot answer yet - the experiments are still small. But I keep coming back to it.

What this is not

The dendritic unit is not connected to the bigger lab work I am doing on Parallax. It started before Parallax and lives on its own track. The two share an aesthetic - I am drawn to architectures where structure carries more weight than scale - but they are different bets, and one does not need the other to be interesting.

Treat this page like a research log. There is a real chance the answer is "this is a cute primitive that does not compound". I think that is fine. ML history is full of those, and most of the good ideas had a dozen of them as predecessors.

Want to argue with any of this?

aiman@shabib.net
