TL;DR
- Bet: stateful, path-dependent routing on a recurrent / SSM substrate with content-addressed memory, an original mechanism, composed with proven tool-classes.
- Why: constant per-step compute, unbounded context, event-driven efficiency that fits European compute. A new scaling axis, not a faster transformer.
- Proof today: spiking SNN 84-87% on SHD; stateful vs stateless routing 99% vs <70% [TOY]; bit-exact GPU kernels 1.7-3.4× vs compiled dense; ~11 prior-art results validated/falsified in 19 days.
- Central open question: does a routed/sparse model match dense accuracy? [NEGATIVE today] (SHD collapse), with a diagnosis + fix plan.
- Ask: €3M / 7 months; compute ~10-20% of budget; demonstrators to ~13B, no brute-force run.
The hypothesis (story spine)
- All biological intelligence is inherently recurrent. In the brain, memory, thinking, and reasoning live in one shared recurrent substrate, running at extreme sparsity, which makes it both extremely efficient and highly complex.
- In the digital realm we have different advantages to leverage. We borrow the principle, recurrence, from biology, then play to silicon's strengths: massively parallel hardware, structured sparsity that maps onto GPUs and Tensor Cores, flexible precision, fast memory, and training at scale.
- The next frontier model is recurrent. Transformers won the current S-curve by paying quadratic attention over a growing context window; cost scales with whoever burns the most H100s. State-based recurrent networks carry a fixed-size evolving state: constant compute/memory per step, unbounded context, and memory in the substrate. This targets the next S-curve, not a faster ride on the current one.
- It comes from many sides, existing research and new ideas alike. We build on a wide body of existing research and conduct our own: combining the best proven components and inventing new concepts and mechanisms as we go. Benchmarking and falsification are the discipline that validates every claim, ours and the field's.
- Our bet, concretely: the missing piece is an original mechanism we are developing (stateful, path-dependent routing on a state-based substrate with content-addressed memory), which today's stateless SSM-MoE models do not have. The open question is the research: which recurrent architecture, built around that mechanism and the best existing components, first reaches frontier-grade capability under European efficiency constraints. We answer it through original research plus rigorous benchmarking and falsification; the disciplined search is how we de-risk and reach the bet fast, not a substitute for it.
- Today's AI helps us discover the next frontier's AI. The research loop is AI-augmented end to end (custom research agents and autoresearch, rapid small-model prototyping, and bespoke training-efficiency test tooling), letting a small team read, build, test, falsify and invent quickly. The team works from Merantix AI Campus and Antler.
- Existing modules (attention/SSM/MLP) are used where they help. Our spiking/kernel work is a first probe and realizability evidence.
Momentum: research velocity (grounded)
- A small team, together for a few weeks, using an AI-augmented build → benchmark (vs torch.compile) → profile → falsify → literature-check loop, independently confirmed / rediscovered / falsified ≈11 published results across GPU kernels, SNN neuroscience, spatial connectomics, and MoE routing, 11–30 May 2026 (~19 days). ~4 of 11 were falsifications of field over-claims.
| # | What we measured | Published prior art | Status |
|---|---|---|---|
| 1 | Fine sparse connectivity ≈0× GPU speedup; only block-structure wins | Sparsity Roofline (Gale 2023) | re-derived |
| 2 | Naive sparse SNN kernels lose to compiled dense; only tiled split wins | FlashLLM, SparseRT | re-derived |
| 3 | SpectralAI RT-core "113–218×" → ≤2× vs a compiled baseline | SpectralAI preprint | [NEGATIVE] falsified |
| 4 | 3D wins SHD / 2D wins Yin-Yang (different mechanism) | SpSNN (Landsmeer 2025) | reproduced |
| 5 | Random sparse ≥ dense in SNN+SHD (+2pp); structured spatial +1pp | Random Pruning (Liu 2022) | new domain |
| 6 | Bare LIF + surrogate fails long-memory; richer cells recover it | ELM (Spieler 2023) | rediscovered |
| 7 | Router state is the critical routing axis | RMoE (Qiu 2024) | rediscovered |
| 8 | Stateful router 99% vs stateless <70% on cue-switch | RIMs (Goyal 2019), Routing-Mamba | [TOY] |
| 9 | One-shot pruning collapses SNN; needs sparse-aware retrain | Lottery Ticket, RigL | re-derived |
| 10 | Spike sets NOT temporally stable → killed a compression path | (literature silent) | [NEGATIVE] |
| 11 | LIF dynamics <1% of layer time → the matvec dominates | (implicit assumption) | [NEGATIVE] |
Technical fields
Project Title
- Candidates: State-based RNNs: the next frontier · The recurrent frontier, proven tools, assembled · RNNs that scale with state
Short Description
- Hypothesis: state-based recurrent networks are the next frontier, offering constant per-step compute (no context window), state-as-memory, and efficiency that fits European compute.
- The bet: an original mechanism we are developing (stateful, path-dependent routing on a state-based substrate with content-addressed memory), composed with proven tool-classes into a new recurrent architecture; the best design is found and validated through rigorous benchmarking and falsification on efficiency × performance × unbounded context.
- For: real-time and streaming AI, always-on edge and perception, infinite-context assistants, and frontier-scale language / code / reasoning (see Use Cases).
- Grounding: ~11 prior-art results validated/falsified in ~19 days; a working spiking prototype + GPU kernels.
Frontier Dimension
- Model architecture (in-scope: state-space / alternative architectures).
- Hypothesis: a state-based recurrent network opens a scaling axis transformers lack: per-step compute need not grow with model size, and context is unbounded (fixed evolving state, no re-attention).
- Built by combining proven components with new mechanisms we research; existing optimised modules used where they help.
Core Idea & Architecture
- Hypothesis (substrate): recurrence is the right substrate. A fixed-size state evolves per timestep → streaming-native, constant compute/memory per step, memory implicit in the state (Mamba, S4/S5, xLSTM, HiPPO).
- Hypothesis (composition): the architecture is built from classes of proven tools, extended with new mechanisms of our own, composed where each earns its place. The research space we explore:
  - Recurrent substrate: selective SSM / spiking / xLSTM-style cells.
  - Conditional computation & routing: activate few blocks per step (Sparsely-Gated MoE, Switch), building on routing-by-agreement from capsule networks (Dynamic Routing Between Capsules, Sabour, Frosst & Hinton 2017); incl. stateful / path-dependent routing as one candidate (vs stateless MoE-Mamba/BlackMamba).
  - Expressive recurrent neurons: cells richer than a bare leaky integrator (multi-timescale / expressive-neuron class).
  - Recursive / hierarchical reasoning: depth decoupled from parameters (GRAM, HRM, TRM).
  - Structured sparsity / efficiency: GPU-exploitable, block-structured sparsity.
  - Existing optimised modules: attention / MLP, used where they help (hybrid precedent: Jamba, Nemotron-H, Nemotron 3 Super (hybrid + LatentMoE)).
- Illustrative candidate architecture (one concrete point in the search space, not pre-committed): a selective-SSM / Mamba backbone carries the recurrent state; each step a stateful router reads that state (hidden + adaptation variables, not just the output) and activates a few expressive, multi-timescale blocks from a larger pool; an optional recursive-reasoning core (HRM / TRM) adds depth without adding parameters; the pool is block-structured so active-only compute maps onto dense GPU matmuls. Compute scales with active blocks per step, not model size; memory stays a fixed-size state. One instance to make the idea tangible; the research decides the final shape.
- Which components, which new mechanisms, and how they combine is what the research develops and validates (see The Research).
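The illustrative candidate above can be made concrete with a minimal sketch of one routed recurrent step. This is not the project's implementation: the leaky-integrator update stands in for a selective-SSM cell, and `routed_step`, `topk_indices`, `router_w`, and the toy blocks are hypothetical names chosen for illustration. It only shows the three properties the text describes: the state keeps a fixed size, the router reads the state rather than the token, and only `k` blocks run per step.

```python
def topk_indices(scores, k):
    """Indices of the k largest scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def routed_step(state, x, decay, router_w, blocks, k):
    """One recurrent step: update a fixed-size state, route on it, run k blocks.

    state    : list[float], recurrent state of length d
    x        : list[float], current input of length d
    decay    : float in (0, 1); leaky integrator as a stand-in for an SSM update
    router_w : list[list[float]], one row of d router weights per block
    blocks   : list of callables, each mapping the state to a list of length d
    k        : number of blocks activated this step
    """
    # 1. Evolve the state; its size is the same before and after the step.
    state = [decay * s + (1.0 - decay) * xi for s, xi in zip(state, x)]
    # 2. The router scores blocks from the accumulated state, not the raw input.
    scores = [sum(w * s for w, s in zip(row, state)) for row in router_w]
    active = topk_indices(scores, k)
    # 3. Only the selected blocks run, so work per step tracks k, not len(blocks).
    out = [0.0] * len(state)
    for i in active:
        out = [o + y for o, y in zip(out, blocks[i](state))]
    return state, out, active


# Toy usage: 8 blocks, 2 active per step, state of size 4 (all values hypothetical).
d, n_blocks, k = 4, 8, 2
blocks = [lambda s, g=float(i + 1): [g * v for v in s] for i in range(n_blocks)]
router_w = [[1.0 if j == i % d else 0.0 for j in range(d)] for i in range(n_blocks)]
state = [0.0] * d
for t in range(16):
    x = [1.0 if j == t % d else 0.0 for j in range(d)]
    state, out, active = routed_step(state, x, 0.9, router_w, blocks, k)
```

After any number of steps `state` still has length `d`, and each step touches exactly `k` of the `n_blocks` blocks; a real kernel would additionally need the block pool laid out so the active blocks map onto dense matmuls.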
Technical Novelty
- The core novelty (our bet): stateful, path-dependent routing on a state-based substrate. The active sub-network is chosen from the accumulated recurrent state, not the current token, so the model switches active blocks mid-sequence as context changes, and the routed state doubles as a content-addressed memory. The new rebinding: the multi-timescale primitive that ELM/S4/Mamba use inside a unit becomes the gating signal between blocks. Toy evidence: 99% (stateful) vs <70% (stateless) on a cue-switch task [TOY].
- What current recurrent models miss: hybrids (Nemotron-H, Jamba) still keep attention and route stateless / token-wise (MoE-Mamba, BlackMamba, Routing-Mamba, Swimba); pure SSMs carry a documented recall gap (StateX: 64K-needle 26%→42%). Our delta is to route on the state and address memory by content, aimed squarely at that recall gap.
- Why this is reachable, not reproduction: architecture is already out-scaling parameters (GRAM/TRM at ~7–10M params beat a 671B model on reasoning, and Nemotron-H beats Qwen-2.5-72B), and our Momentum (11 prior-art results validated/falsified in ~19 days) shows we can search the design space and invent fast. The innovation is the new mechanism found via disciplined search + invention, not the assembly itself.
- Efficiency as a paradigm, not a tweak: event-driven recurrence + GPU-exploitable structured sparsity define a fundamentally different compute model.
- Open hypothesis (to prove/falsify): sub-quadratic compute-vs-sequence-length for the routed/sparse recurrent model, empirical only, no formal claim.
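Why token-wise routing struggles on a cue-switch task can be shown with a hand-built toy. This sketch is an assumption-laden illustration, not the project's 13-step unit test and not a trained model: the task layout (`make_sequence`, cue tokens `A`/`B`, filler token `x`, a second cue at `switch_at`) is hypothetical. A cue selects the correct expert for all following filler tokens until the next cue arrives.

```python
import itertools

CUES = ("A", "B")                  # cue tokens; "x" is a content-free filler
EXPERT_FOR_CUE = {"A": 0, "B": 1}  # which expert each cue should activate


def make_sequence(first_cue, second_cue, length=13, switch_at=6):
    """Tokens plus per-step target expert; the target follows the latest cue."""
    tokens = ["x"] * length
    tokens[0] = first_cue
    tokens[switch_at] = second_cue
    targets, current = [], None
    for tok in tokens:
        current = EXPERT_FOR_CUE.get(tok, current)
        targets.append(current)
    return tokens, targets


def stateless_router(tokens, filler_choice=0):
    # Sees only the current token, so every filler maps to one fixed expert.
    return [EXPERT_FOR_CUE.get(tok, filler_choice) for tok in tokens]


def stateful_router(tokens):
    # Carries one piece of state: the expert selected by the most recent cue.
    state, picks = None, []
    for tok in tokens:
        state = EXPERT_FOR_CUE.get(tok, state)
        picks.append(state)
    return picks


def accuracy(router):
    """Fraction of steps routed correctly over all four cue combinations."""
    hits = total = 0
    for c1, c2 in itertools.product(CUES, repeat=2):
        tokens, targets = make_sequence(c1, c2)
        picks = router(tokens)
        hits += sum(p == t for p, t in zip(picks, targets))
        total += len(targets)
    return hits / total


stateful_acc = accuracy(stateful_router)
stateless_acc = accuracy(stateless_router)
```

In this toy the stateful router is exact by construction, while the token-only router is right on filler steps only when the governing cue happens to match its fixed choice. The figures it produces are properties of this hand-built task and should not be read as the measured 99% vs <70% result.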
Technical Novelty Citation
- Recurrent/SSM: Mamba, Mamba-2, S4, S5, xLSTM, HiPPO, Active Tuning/Otte.
- Conditional computation & routing-by-agreement: Dynamic Routing Between Capsules, Sabour, Frosst & Hinton 2017, Sparsely-Gated MoE, Switch, MoE-Mamba, BlackMamba, Routing-Mamba, Swimba, RMoE, RIMs, σ-MoE/Csordás.
- Expressive neurons: ELM, Scaling-Laws-Recurrent-Expressive-Neurons.
- Recursive reasoning / test-time memory: GRAM, HRM, TRM, Titans.
- Spiking basis / neuromorphic delineation: LSNN, ALIF/Yin, e-prop, SpikingBrain; Loihi2, SpiNNaker2, Tianjic.
- Hybrid at scale: Jamba, Nemotron-H, Nemotron 3.
Capability Gap Addressed
- Infinite / no context window: a fixed evolving state with no re-attention and no quadratic blow-up; a stream can run "always-on", addressing a gap that remains open at any transformer scale.
- Real-time / low-latency & edge: constant per-step compute + energy fits streaming audio/sensor/video and European edge deployment.
- Hypothesised capability (to test): path-dependent computation, switching the active sub-network mid-sequence as context changes ([TOY] evidence: 99% vs <70%).
- Reasoning under tiny budgets: recursive cores (GRAM: ~10M params beat a 671B model on constraint reasoning) → capability emerging from architecture itself, at tiny parameter counts.
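The memory side of the "no context window" argument reduces to simple arithmetic, sketched below. The layer counts and dimensions are hypothetical and describe no specific model; the recurrent-state formula is a simplification that ignores convolution buffers and activations. The point is only the shape of the two curves: a KV-cache grows linearly with sequence length, while a fixed recurrent state does not grow at all.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Keys and values (factor 2) are kept for every past token in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def recurrent_state_bytes(n_layers, d_inner, d_state, bytes_per_elem=2):
    # One (d_inner x d_state) state per layer, independent of sequence length.
    return n_layers * d_inner * d_state * bytes_per_elem


# Hypothetical configuration, 16-bit elements.
cfg_attn = dict(n_layers=32, n_kv_heads=8, head_dim=128)
cfg_rnn = dict(n_layers=32, d_inner=8192, d_state=16)

state_mib = recurrent_state_bytes(**cfg_rnn) / 2**20
for seq_len in (1_024, 65_536, 1_048_576):
    kv_mib = kv_cache_bytes(seq_len, **cfg_attn) / 2**20
    print(f"{seq_len:>9} tokens: KV-cache {kv_mib:>9.0f} MiB, recurrent state {state_mib:.0f} MiB")
```

Under these assumed dimensions the KV-cache doubles whenever the context doubles, whereas the recurrent state stays the same size for a stream of any length.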
Use Cases: where these models, and only these models, fit
[PRIOR-ART] external deployments, grouped by the structural reason recurrence is required. Full landscape (117 use cases, 84 companies): the Use Cases & Applications page.
- Five structural reasons a transformer cannot follow here: a hard real-time latency floor (sub-millisecond control), a microwatt always-on energy budget, unbounded context that must run on bounded state, sparse event streams where compute should track activity, and continuous on-device adaptation. Each disqualifies quadratic attention and a growing KV-cache by construction, not by tuning.
- Frontier scale, language / code / reasoning: Nemotron-H (NVIDIA, ~92% attention replaced by Mamba-2) beats Qwen-2.5-72B on 9 of 17 tasks at up to 3× throughput; Jamba 1.5 (AI21) serves 256K context at ~9 GB KV-cache, an order of magnitude below a dense Transformer; Codestral Mamba (Mistral) gives 256K linear-time code context; RWKV-7, Falcon-Mamba (TII), and xLSTM-7B (NXAI, EU) show the same at 7B.
- Long / infinite context: the KV-cache wall is now an explicit pricing signal (Gemini 2× above 200K tokens); Titans (Google) reaches >2M effective context via test-time memory with no quadratic attention.
- Local / on-device AI: Jamba Reasoning 3B runs 250K context on a MacBook; RWKV ships on Android/iOS at constant memory; SSM decode is matrix-vector, NPU-friendly (XAMBA, up to 4.8× on Intel/Qualcomm/AMD). On-device inference is the structural answer to GDPR and the EU AI Act, data never leaves the device.
- Always-on edge: wake-word and sensing at 280 µW (Syntiant), Loihi predictive maintenance at 0.0032 J/inference vs 11.3 J on x86 (arXiv); EU silicon: Innatera, SynSense, BrainChip Akida.
- Real-time control and the perception layer: sub-millisecond servo control (EdgeDRNN), legged-robot policies where xLSTM beats Transformers on latency (LRAM, ICML 2025), event-camera vision at microsecond resolution (Prophesee IMX636 with Sony, iniVation, EU).
- Streaming and the daily data deluge: packet-stream intrusion detection 60× faster than a Transformer baseline (NetMamba); triggerless particle-physics readout at 40 MHz; onboard satellite triage under 1 W.
- Stateful human-AI interaction: persistent-memory assistants (Microsoft Copilot Memory GA 2025, Mem0, Letta/MemGPT); real-time duplex voice (Gemini Flash Live); the relationship, not a re-injected transcript, is the state.
- Bio and health, a standout [PRIOR-ART] cluster: FDA- and CE-approved adaptive deep brain stimulation (Medtronic, 2025); 14-day ECG patches at >1.1M patients (iRhythm); BCI motor decoders (Neuralink, Synchron); EEG seizure detection at <375 µW; a Mamba EEG model on a RISC-V MCU at 27× fewer FLOPs (FEMBA). DARPA INTACT ($120M) funds Intel, IBM, SynSense for exactly this SWaP-constrained niche.
- Why now: inference is 80 to 90% of AI energy and AI data-center power is set to roughly double to ~945 TWh by 2030 (IEA); EU electricity runs ~2× the US, taxing every quadratic token (CNBC). Europe holds a deep neuromorphic bench (Innatera, SynSense, Prophesee, iniVation, SpiNNcloud, NXAI), and Horizon Europe put €1.5B into neuromorphic research in 2025.
Existing Artifacts
- Spiking SNN on SHD [PROVEN]: 84–87% test accuracy at 20 epochs (ref ~90% at 150 epochs).
- Stateful-routing toy [TOY]: 99% (stateful) vs <70% (stateless) on a 13-step cue-switch unit test; end-to-end not yet run.
- GPU kernels (all graded vs torch.compile dense; speed and accuracy reported separately):
  - Accuracy-preserving (bit-exact, identical outputs) [PROVEN: speed at equal accuracy]: spike-pool ("activation pool") kernel, non-routed decode 1.73–1.84× per-step, 2.86–3.10× sustained; multi-timestep tensor-core kernel 3.43× (peak 3.67×). Both parity-tested vs the dense reference (max_abs = 0.0; 33/33 and 25/25 tests pass), so the speedup changes nothing numerically.
  - Speed-only, accuracy not yet established [PROVEN speed / accuracy open]: block-sparse variant 26.8× at 99% sparsity is a kernel result gated on a sparsity-aware retrain (the current SHD checkpoint tolerates only ~50% unstructured pruning); accuracy at that sparsity is unproven. Plus a trainable stateful-routing kernel and a pre-inference matrix-shuffle pruning kernel.
- ~26× spiking inference speedup [PROVEN, binary-specific] via a look-up-table over (active block × spike pattern); the LUT reproduces the spiking matvec exactly, so it is a speedup at identical outputs.
- The ~11-item validated-vs-prior-art table above, evidence of R&D velocity across the field.
- Live artifact dashboard [PROVEN]: interactive benchmarks and visualisations of the results above, at routing-snn-dashboard.pages.dev, with the kernel benchmarks (per-kernel speed vs compiled dense) and the spatial-SNN ablation viewer (accuracy/sparsity across configs and seeds). Backed by versioned benchmark JSONs in the repo.
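The look-up-table idea and the parity check (max_abs = 0.0) can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the project's kernel: the table here is indexed by (column chunk × spike pattern) rather than by active block, `build_lut` and `lut_matvec` are hypothetical names, and the weights are small integers so that equality is exact regardless of summation order (with floating-point weights, bit-exactness would also depend on matching the accumulation order).

```python
import random


def dense_matvec(W, spikes):
    # Reference: y[r] = sum over columns c with spikes[c] == 1 of W[r][c].
    return [sum(w for w, s in zip(row, spikes) if s) for row in W]


def build_lut(W, chunk):
    """For each group of `chunk` input columns, precompute the partial output
    vector for every one of the 2**chunk binary spike patterns."""
    n_cols = len(W[0])
    assert n_cols % chunk == 0, "columns must split evenly into chunks"
    lut = []
    for start in range(0, n_cols, chunk):
        table = []
        for pattern in range(1 << chunk):
            cols = [start + b for b in range(chunk) if (pattern >> b) & 1]
            table.append([sum(row[c] for c in cols) for row in W])
        lut.append(table)
    return lut


def lut_matvec(lut, spikes, chunk):
    """Replace per-column multiply-adds by one table lookup per chunk."""
    out = [0] * len(lut[0][0])
    for ci, table in enumerate(lut):
        bits = spikes[ci * chunk:(ci + 1) * chunk]
        pattern = sum(b << i for i, b in enumerate(bits))
        out = [o + p for o, p in zip(out, table[pattern])]
    return out


# Parity test against the dense reference on random binary spike vectors.
rng = random.Random(0)
W = [[rng.randint(-8, 8) for _ in range(16)] for _ in range(6)]
lut = build_lut(W, chunk=4)
max_abs = 0
for _ in range(100):
    spikes = [rng.randint(0, 1) for _ in range(16)]
    ref, fast = dense_matvec(W, spikes), lut_matvec(lut, spikes, chunk=4)
    max_abs = max(max_abs, max(abs(a - b) for a, b in zip(ref, fast)))
```

The trick depends on the inputs being binary: a chunk of `chunk` spikes has only `2**chunk` possible patterns, which is why the same table does not carry over directly to continuous activations.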
Open Research Questions / Risks: these are the research
- Can a routed/sparse recurrent net match a dense model's accuracy? The central question. [NEGATIVE today]: a routed SNN collapsed on SHD (V1 38.6% / B1 51.8% vs dense 85.1%). Likely cause: the router decides on binary spike output, so its gradient crosses a surrogate-gradient (Heaviside) boundary and trains poorly. Fix hypotheses we will test: (i) route on the continuous membrane potential + adaptation state (real gradients, anticipates spikes), (ii) richer block cells than bare LIF (ELM / TC-LIF / PMSN), known to recover long-memory, (iii) key/value separation (Csordás). Scaling-Laws for Recurrent Expressive Neurons trains an analogous SNN/router bridge on SHD-Adding, evidence the bridge is crossable.
- Does the inference efficiency transfer to training and to continuous (non-spiking) substrates? Open: training-time wall-clock speedup unproven [NEGATIVE today]; the 26× LUT is binary-specific, [PROJECTED] for continuous SSMs.
- Is the compute genuinely sub-quadratic in sequence length? Open, empirical only, no formal proof.
- Does small-scale behaviour survive scale-up? Open, to be checked with scaling curves.
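The suspected failure mode and fix hypothesis (i) can be illustrated with a small sketch. It is not the project's router: the fast-sigmoid surrogate is one common choice and an assumption here, and `route_on_spikes` / `route_on_membrane` are hypothetical names. It shows that the true derivative of the spike function carries no signal, that a surrogate supplies an approximate one, and that a router fed with binary spikes cannot tell apart two sub-threshold states that a router fed with membrane potentials can.

```python
import math


def heaviside(v, threshold=1.0):
    # Spike function: 1 if the membrane potential reaches the threshold.
    return 1.0 if v >= threshold else 0.0


def heaviside_grad(v, threshold=1.0):
    # True derivative of the step: zero away from the threshold, so no signal.
    return 0.0


def surrogate_grad(v, threshold=1.0, beta=10.0):
    # Fast-sigmoid surrogate (assumed form): 1 / (1 + beta * |v - threshold|)^2.
    return 1.0 / (1.0 + beta * abs(v - threshold)) ** 2


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def route_on_spikes(membrane, w):
    # Router input is binary: sub-threshold differences are invisible.
    return softmax([wi * heaviside(v) for wi, v in zip(w, membrane)])


def route_on_membrane(membrane, w):
    # Router input is continuous: scores move smoothly with the potential.
    return softmax([wi * v for wi, v in zip(w, membrane)])


w = [1.0, 1.0]
state_a = [0.2, 0.9]   # block 1 is close to spiking
state_b = [0.8, 0.3]   # block 0 is close to spiking
same_on_spikes = route_on_spikes(state_a, w) == route_on_spikes(state_b, w)
differ_on_membrane = route_on_membrane(state_a, w) != route_on_membrane(state_b, w)
```

Both states produce no spikes, so the spike-based router returns the same uniform scores for each, while the membrane-based router already prefers the block that is about to fire. Whether this closes the accuracy gap on SHD remains the open question stated above.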
TRL Assessment
- Experimental research stage. Validated artifacts exist (GPU kernels parity-tested + benchmarked vs compiled dense; spiking SNN trained on SHD; routing toy), see Existing Artifacts. Formal overall/sub-component TRL mapping: team to assign (next pass), not estimated here.
Compute Requirements
- Stage-1 estimate [PROJECTED]: validation plus demonstrators to ~13B ≈ 100–180k H100-hours, about 32–64 H100 running continuously over the 7 months.
- Cost ≈ €0.3–0.6M at competitive providers, only ~10–20% of the €3M envelope: compute is not the bottleneck, personnel and engineering are. Budget a ~2–3× reserve for failed runs.
- No brute-force run: a 70B-from-scratch run (~€1–1.5M alone) is excluded, on cost and because SPRIND explicitly rules out brute-force scaling. The credible path to frontier is shown via scaling curves to ~13B.
- GPU type / source: H100-class capacity on competitive cloud / EU providers (exact source to confirm).
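The budget figures above can be cross-checked with a short calculation. This is a back-of-envelope sketch, not a quote: it assumes 730 hours per month (8,760 / 12), pairs the low ends of each range together and the high ends together, and derives an implied hourly price and an implied average utilisation that the proposal itself does not state.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year divided by 12


def gpu_hours(n_gpus, months, utilisation=1.0):
    """GPU-hours delivered by n_gpus over a period at a given utilisation."""
    return n_gpus * months * HOURS_PER_MONTH * utilisation


def implied_utilisation(budget_hours, n_gpus, months):
    """Average utilisation at which n_gpus deliver exactly budget_hours."""
    return budget_hours / gpu_hours(n_gpus, months)


def implied_price(cost_eur, budget_hours):
    """Price per GPU-hour implied by a total cost and an hour budget."""
    return cost_eur / budget_hours


ENVELOPE_EUR, MONTHS = 3_000_000, 7
cases = {
    "low": dict(hours=100_000, gpus=32, cost=300_000),
    "high": dict(hours=180_000, gpus=64, cost=600_000),
}
summary = {}
for label, c in cases.items():
    summary[label] = dict(
        utilisation=implied_utilisation(c["hours"], c["gpus"], MONTHS),
        eur_per_hour=implied_price(c["cost"], c["hours"]),
        share_of_envelope=c["cost"] / ENVELOPE_EUR,
    )
    print(label, {k: round(v, 2) for k, v in summary[label].items()})
```

Under these assumptions the stated ranges correspond to roughly €3.0–3.3 per H100-hour and a 10–20% share of the envelope, and a 32–64 GPU fleet would cover the 100–180k H100-hour budget at roughly 55–61% average utilisation over the 7 months.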
KPIs / Benchmarks: measurement axes for the search
- Accuracy parity routed/sparse vs dense at matched FLOPs.
- Latency + throughput (tokens/s decode) vs compiled-dense Transformer & SSM baselines.
- Energy / token; max context length / streaming stability (constant memory over long streams).
- Active-block count / sparsity; compute-vs-sequence-length curve (test sub-quadratic hypothesis).
- (Specific benchmark suite = chosen during the research, on standard long-context / associative-recall / streaming tasks.)
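One way to read the compute-vs-sequence-length curve is to fit its exponent on a log-log scale: an exponent near 2 indicates quadratic cost, an exponent near 1 linear cost. The sketch below uses ordinary least squares on logarithms; `loglog_slope` is a hypothetical helper and the two cost series are synthetic stand-ins, not measurements from any model.

```python
import math


def loglog_slope(lengths, costs):
    """Least-squares slope of log(cost) against log(length).

    For cost ~ c * n**p the slope estimates the exponent p.
    """
    xs = [math.log(n) for n in lengths]
    ys = [math.log(c) for c in costs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic cost models (assumptions for illustration only).
lengths = [1024, 2048, 4096, 8192, 16384]
quadratic_cost = [n * n for n in lengths]   # attention-like growth
linear_cost = [3.0 * n for n in lengths]    # constant work per step

p_quadratic = loglog_slope(lengths, quadratic_cost)
p_linear = loglog_slope(lengths, linear_cost)
```

On real measurements the fit would be run on wall-clock time or FLOPs per sequence across several lengths and seeds; a fitted exponent clearly below 2 would support the sub-quadratic hypothesis empirically, without constituting a formal proof.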
The Research: what we will actually do
- A research programme spanning three intertwined modes:
  - Build on the field: study and adapt a wide body of existing research (SSMs, conditional computation, routing / capsule networks, expressive neurons, recursive reasoning, sparsity) as starting components.
  - Original research: develop new mechanisms and concepts the field doesn't yet have (state-/path-dependent computation, new recurrent cells, memory and routing schemes, efficient sparse compute) where existing tools fall short.
  - Rigorous empirical science: validate every candidate, existing or new, by adversarial benchmarking and falsification: implement → benchmark on the triad vs a compiled baseline → profile the real bottleneck → try to falsify → keep only what survives, record the rest as negative results.
- All accelerated by custom-built AI research tooling (custom research agents / autoresearch, automated experimentation and benchmarking, rapid small-model prototyping, and training-efficiency test tools): the AI-augmented loop that produced the ~11 validated results in Momentum.
- Selection criterion = efficiency × performance × unbounded context. The resulting architecture, the new methods we publish, and the ruled-out dead ends are the outputs (Stage-1 technical report / preprint + experimental codebase), kept open until the evidence decides.
Team
- Why we are best suited: we already pursue this thesis (sparsity + recurrence for efficient scale) in published research and shipped production systems, and the core covers research, GPU / kernel engineering, systems, IP, and commercialisation, with a real entrepreneurial track record and fast execution. Networks: MIT, Numenta, Mercedes-Benz, Merantix AI Campus, Uni Lübeck, Bitkom, Antler.
- Core team:
- Geoffrey Kasenbacher — spiking / sparse nets and neuromorphic ML at Mercedes-Benz (~1000× energy and ~70× runtime in a shipped S-Class prototype); 21 granted patents; peer-reviewed (WARP-LCA). (GitHub)
- Tebjan Halm — 20 years of real-time systems; built a compiler and runtime from scratch; optimised NVIDIA SANA to ~250 ms/image, 2–3× faster than NVIDIA's own SANA-Sprint at ~6× less compute; state-centric, non-transformer architectures. (tebjan.de, GitHub)
- Jana Lehner (joining core, to confirm) — physics PhD; Director of IP and ex-CBO at a quantum deep-tech scale-up; 19 years at IBM; Bitkom board; owns the IP moat and commercial / partnerships.
- Mirko Klukas (in discussion) — math PhD; sparse-coding and sequence-memory research at MIT (Quest for Intelligence / Probabilistic Computing) and Numenta. (blog)
- Advisor:
- Sebastian Otte — professor at Uni Lübeck and Geoffrey's PhD supervisor; researches state-based / recurrent models (Active Tuning, recurrent spiking control); strengthens the team's research credibility.
- Research support:
- Johann Machemer — DNN pruning research (Calprune plus an open-source pruning framework); 2 peer-reviewed papers and FLAIRS Best Student Paper; at Uni Lübeck, a direct bridge to Otte. Limited availability for ~2-3 weeks pending a thesis defense. (GitHub)
- Pending (decision in ~2-3 weeks, post-defense):
- Christian-Hauke Poensgen — lawyer, product engineer and founder (raised €1.2M; 6+ years applied-AI B2B SaaS); fundraising, management, operations, EU / legal depth. (LinkedIn)
- What is missing, and how we close it: the core is deliberately lean; we deepen hands-on coverage of specific methods and models with targeted senior research and engineering hires as funding grows, drawing on the networks above. CVs attached for key people.
Financial Cost Estimate
- Main cost drivers (draft): personnel and engineering are the bulk; compute ~10–20% (~€0.3–0.6M, see Compute Requirements); plus a reserve for failed runs (~2–3× on the compute line). Detailed line items, salaries, and any subcontracts to finalise in the financial pass.
References
Recurrent / SSM substrate: Mamba (Gu & Dao 2023) · Mamba-2 (2024) · S4 (Gu 2021) · S5 (Smith 2023) · xLSTM (Beck 2024) · HiPPO (Gu 2020) · Active Tuning (Otte 2020) · StateX (2025)
Conditional computation / routing: Dynamic Routing Between Capsules, Sabour, Frosst & Hinton (2017) · Sparsely-Gated MoE (2017) · Switch (2021) · MoE-Mamba (2024) · BlackMamba (2024) · Routing-Mamba (2025) · Swimba (2026) · RMoE (Qiu 2024) · RIMs (Goyal 2019) · σ-MoE (Csordás 2023) · SwitchHead (Csordás 2023)
Expressive / multi-timescale neurons: ELM (Spieler 2023) · Scaling Laws for Recurrent Expressive Neurons (2026)
Recursive reasoning / test-time memory: GRAM (Baek 2026) · HRM (2025) · TRM (2025) · Titans (Behrouz 2025)
Spiking basis + neuromorphic: LSNN (2018) · ALIF/Yin (2021) · e-prop (2019) · SpikingBrain (2025) · SHD dataset (2019) · Recurrent spiking robot control (Traub & Otte 2021) · Loihi 2 (2021) · SpiNNaker2 (2021) · Tianjic (2019)
Hybrid at scale: Jamba (2024) · Jamba-1.5 (2024) · Nemotron-H (2025) · Nemotron 3 (2025) · Nemotron 3 Super, LatentMoE (2025) · Nemotron Nano 2 (2025)
GPU sparse kernels / efficiency limits: FlashLLM (2023) · SparseRT (2020) · FlashSparse (2024) · SparStencil (2025) · Sparsity Roofline (Gale 2023)
Pruning / sparse training: Lottery Ticket (2018) · RigL (2020) · Random Pruning (Liu 2022)
Spatial connectivity (probe): SpSNN (Landsmeer 2025)
Capability probe: MQAR / Zoology (Arora 2023)
Changelog
2026-05-31 15:51 CEST — Leaner pages, argument-first use cases
- Re-sorted [use cases]: the use-cases page is now organised by the 6 structural arguments (latency floor, µW always-on, unbounded context, sparse events, on-device adaptation, efficiency at scale) + a bio cross-cutting section, with an at-a-glance table.
- Removed [interpretation]: the editorial layer (superlatives, metaphors, "how this maps", mechanism narration); kept all use cases, products, models, numbers, and structural arguments, each with a source link.
- Added [proposal]: a one-screen TL;DR at the top (bet / why / proof / open question / ask).
2026-05-31 14:34 CEST — Team constellation update
- Updated [team]: core team is now Geoffrey Kasenbacher, Tebjan Halm, and Jana Lehner (joining, to confirm), with Mirko Klukas in discussion.
- Added [team]: Sebastian Otte as Advisor (professor at Uni Lübeck, Geoffrey's PhD supervisor, state-based / recurrent-models researcher).
- Reframed [team]: Johann Machemer moves to Research support (Uni Lübeck, direct bridge to Otte; limited availability ~2-3 weeks pending a thesis defense).
- Pending [team]: Christian-Hauke Poensgen not yet committed (decision in ~2-3 weeks, post-defense).
- Removed [team]: Paula Wohlgemuth (role overlap with Jana now in the core).
2026-05-31 12:51 CEST — Addressed Johann's review
- Pivoted [vision / novelty]: the proposal now leads with the concrete bet (stateful, path-dependent routing on a state-based substrate + content-addressed memory); the search / benchmark / falsify loop is reframed as the method to reach it, not the product. Counters the "this reads as reproduction" risk.
- Added [novelty]: a "why this is reachable, not reproduction" signal set, architecture-beats-scale (GRAM/TRM beat 671B; Nemotron-H beats Qwen-2.5-72B) and our 19-day Momentum.
- Added [comparison]: a "what current recurrent models miss" point, hybrids keep attention and route stateless / token-wise, pure SSMs have a recall gap (StateX); our delta is routing on the state + content-addressing.
- Added [architecture]: an illustrative candidate architecture (SSM backbone + state-reading router + expressive blocks + optional recursive core + block sparsity), marked non-committal, to make the idea tangible.
- Evidence added [SHD collapse]: the routed-SNN collapse (V1 38.6% / B1 51.8% vs dense 85.1%) now carries a likely cause (binary-spike routing across a surrogate-gradient boundary) and concrete fix hypotheses (membrane-potential routing, richer cells, key/value separation).
- Sharpened [artifacts]: kernel claims split into bit-exact speed-at-equal-accuracy (spike-pool, tensor-core; parity max_abs = 0.0) vs speed-only (block-sparse 26.8×, retrain-gated, accuracy unproven). Answers "speedup is meaningless without accuracy".
- Surfaced [use cases]: a "For" line near the top points to the Use Cases section.
Tags: Added · Removed · Sharpened · Pivoted · Evidence added · Reframed · Surfaced.