SPRIND · Next Frontier AI Challenge

Submission Form

Recurrent Intelligence: SPRIND Next Frontier AI Challenge, working submission document.

Color coding (tag each field/answer):


1. Your Personal Information ⚪

Field | Value
Salutation (Ms. / Mr. / not specified) | Mr.
Title (Dr. / Prof. / Prof. Dr.) | not specified
Last name | Halm
First name | Tebjan
Email | tebjan@gmail.com
Phone number | [NEEDS: your phone]
Your position (e.g. project manager, CEO) | Co-founder
Institution (dropdown) | No Institution involved
Institution (other) | -
Legal Form (None / Ltd. / GmbH / UG / GbR / KG / AG / Individual Enterprise / Other) | None (European legal entity, UG minimum, to be formed in Stage 1)
Legal Form (other) | -
Address line 1 | [NEEDS: street + number]
Address line 2 | -
Postal Code | [NEEDS]
City | [NEEDS: e.g. Berlin]
Country | Germany
Country (other) | -
Did you know about SPRIND before learning about the Next Frontier AI Initiative? (Yes/No) | [NEEDS: Yes/No]
Where did you first learn that SPRIND exists? (dropdown) | [NEEDS]
Please specify where you heard about us. | [NEEDS]
What made you decide to actually apply to this Challenge? (dropdown) | [NEEDS]
Please specify why you decided to actually apply to this Challenge. | [NEEDS]

2. Your Solution

Project Title (text, 50 chars) 🔵

Recurrent Intelligence

Short Description (textarea, 500 chars) 🔵

A model that never stops reading: it predicts the next token over an in-principle unbounded context at constant cost per step, on a device, not a datacenter. We build recurrent networks (RNNs) that hold their whole history in a fixed state, so compute and memory per step stay constant. State-space models are the strongest substrate today; we work across the RNN design space and keep what performs. They scale from tiny sensor streams to frontier LLMs, opening systems as a new modality.

Frontier Dimension (textarea, 500 chars) 🔵

Recurrent networks as the successor to attention-based transformers. Today's models add compute and context, hitting energy and memory walls, hardest under European constraints. A recurrent network carries its history in a fixed state, so it scales to effectively unbounded context and runs on the edge, where transformers cannot. They already win on streaming and long-context tasks. We bet the next S-curve is recurrent; the question is which architecture reaches frontier scale first.

Core Idea & Architecture (textarea, 3000 chars) 🔵

Biological intelligence is recurrent, and the world is continuous: living systems make sense of a stream of events through an evolving internal state, not a fixed block of context. We bring that principle to silicon. The payoff is models that keep predicting over endless streams at a per-step cost that never grows, on the edge rather than in a datacenter.

We develop and optimize recurrent architectures into models that run in the real world and compete in demanding environments. State-space models are the strongest substrate today. The work is to take SOTA recurrent networks and engineer them, with the right methods and software around them, into fast, efficient, deployable architectures on standard GPUs, not only on neuromorphic chips.

Several methods drive this. One, our novel invention, is state-conditioned structured sparsity: at each timestep, a stateful router selects a small subset of the model's compute blocks as a function of the accumulated recurrent state and current input, rather than running the full dense model. Another is activation-centric GPU kernels that execute only the active work and map it onto dense matmuls and Tensor Cores, where fine-grained sparsity normally gives no speedup. A third is static structured sparsity, for example structured pruning carried into pretraining, a lever we are developing toward large training speedups, with a target of up to roughly 10x. The block pool adds multi-timescale cells and an optional recursive-reasoning module that adds depth without parameters, with attention or MLP used where they help.
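The state-conditioned routing described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the sizes, the linear router, the tanh block update, and the names `routed_step`, `W_BLOCK` and `W_ROUTE` are all hypothetical. Each block owns one slice of the state; the router scores blocks from the concatenated state and input, and only the top-k blocks do any work.

```python
import math
import random

rng = random.Random(0)
N_BLOCKS, BLOCK_DIM, IN_DIM, K = 8, 4, 3, 2   # illustrative sizes, not the authors'
STATE_DIM = N_BLOCKS * BLOCK_DIM

def rand_matrix(rows, cols, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * a for w, a in zip(row, v)) for row in m]

# Hypothetical parameters: one small weight matrix per block, plus a linear router.
W_BLOCK = [rand_matrix(BLOCK_DIM, STATE_DIM + IN_DIM, 0.3) for _ in range(N_BLOCKS)]
W_ROUTE = rand_matrix(N_BLOCKS, STATE_DIM + IN_DIM, 1.0)

def routed_step(state, x, k=K):
    """Update only the k blocks the router scores highest; the rest carry over."""
    features = state + x                      # list concatenation: state and input
    scores = matvec(W_ROUTE, features)        # one score per block
    ranked = sorted(range(N_BLOCKS), key=scores.__getitem__)
    active = sorted(ranked[-k:])              # indices of the k highest scores
    new_state = list(state)
    for b in active:                          # inactive blocks cost nothing this step
        lo = b * BLOCK_DIM
        update = matvec(W_BLOCK[b], features)
        new_state[lo:lo + BLOCK_DIM] = [math.tanh(v) for v in update]
    return new_state, active

state = [0.0] * STATE_DIM
for _ in range(5):
    x = [rng.gauss(0.0, 1.0) for _ in range(IN_DIM)]
    state, active = routed_step(state, x)
    print(active)
```

Per-step work in this sketch scales with k rather than with the number of blocks, which is the property the kernels then have to turn into wall-clock speed.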

A worked example: watching a dynamic system to predict failures. A power grid, a patient, or a production line emits many sensor streams at once. The model folds them into its running state step by step, learns the system's normal dynamics, and detects the hidden patterns that precede a fault, raising a pre-warning before a catastrophic failure is obvious so an operator or clinician can act. The same operator can query the live state at any moment. This shape carries across power grids, health monitoring, or manufacturing surveillance.
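The shape of this worked example can be shown with a deliberately simple stand-in. The sketch below does not use a learned recurrent model: it swaps in an exponentially weighted running mean and variance as the fixed-size state, and raises a warning when a reading deviates strongly from what the state expects. The class name `StreamMonitor`, the parameters `alpha`, `threshold` and `warmup`, and the sine-wave test signal are all illustrative assumptions.

```python
import math

class StreamMonitor:
    """Constant-size running state over one sensor stream (stand-in model)."""

    def __init__(self, alpha=0.05, threshold=5.0, warmup=20):
        self.alpha = alpha          # how fast the state forgets old readings
        self.threshold = threshold  # alarm level, in running standard deviations
        self.warmup = warmup        # steps to observe before any alarm is raised
        self.mean = 0.0
        self.var = 0.0
        self.steps = 0

    def step(self, reading):
        """Fold one reading into the state; return True if it looks abnormal."""
        self.steps += 1
        if self.steps == 1:
            self.mean = reading
            return False
        deviation = reading - self.mean
        z = abs(deviation) / math.sqrt(self.var + 1e-12)
        alarm = self.steps > self.warmup and z > self.threshold
        # Exponentially weighted updates: memory use never grows with the stream.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return alarm

monitor = StreamMonitor()
normal = [monitor.step(math.sin(0.3 * t)) for t in range(200)]  # synthetic "healthy" signal
fault = monitor.step(10.0)                                      # synthetic outlier
print(any(normal), fault)
```

The state here is two floats per stream regardless of how long the stream runs; a learned recurrent model replaces the two statistics with a richer state but keeps the same step-by-step interface.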

The model and router are trained together so the predicted activations stay accurate and stable. We have already built and measured the hard parts: bit-exact GPU kernels that run this sparse compute about 3.4 times faster than a dense torch.compile() baseline, and a structured-sparse recurrent model that matches dense accuracy. Benchmarks are listed under Existing Artifacts. We did all of this in a few days at the Merantix AI Campus in Berlin, with AI research assistance, and we are excited to see what else we can find.

The architecture is not fixed in advance. We run controlled ablations over candidate blocks and their combinations against dense and SSM baselines, profile where the real cost is, and keep the constellations that win; the best mix may be task-specific. Where labelled data for a target system is scarce, we build targeted datasets with design partners.

Technical Novelty (textarea, 2000 chars) 🔵

Our novelty is the set of new mechanisms we develop to solve the distinct limitations of RNNs, composed with proven tool-classes. One is state-conditioned routing with content-addressed memory, where the recurrent state selects the active compute and recalls past content, aimed at the memory limitation. Another is activation-centric algorithms that exploit the network's sparsity, executing only the active work so sparse recurrent compute runs fast on standard GPUs. Further mechanisms target training cost, for example structured pruning carried into pretraining. These are examples; we keep researching where existing architectures fall short.

The combination draws on state-space and recurrent dynamics (Mamba, Mamba-2, xLSTM, StateX), expressive units (ELM), recursive or hierarchical reasoning (HRM, TRM, GRAM, Titans), conditional computation and routing (Capsules, Sparsely-Gated MoE, RIMs, MoE-Mamba, BlackMamba, Routing-Mamba, Swimba), hybrid recurrent-attention precedent (Jamba, Nemotron-H), and neuromorphic roots (Loihi 2, SpiNNaker2, SpikingBrain).

What current recurrent models miss: hybrids such as Nemotron-H and Jamba lean on attention, the routed ones (MoE-Mamba, BlackMamba, Swimba) route statelessly and token-wise, and pure SSMs show a recall gap. Closing these gaps is achievable: architecture already beats scale, with small recursive models (GRAM, TRM) beating far larger ones at reasoning, and Nemotron-H beating a 72B transformer.

Measurable advantage today: bit-exact GPU kernels run 3.4x faster than torch.compile() dense at identical accuracy, and structured sparsity matches dense accuracy on a recurrent benchmark. We target order-of-magnitude lower energy per token and large training speedups (estimates). The capability gap is efficient long-horizon inference with reliable memory at bounded, predictable compute, which transformers cannot reach at any scale.

Technical Novelty Citation (textarea, 1000 chars) 🔵

Mamba 2023 https://arxiv.org/abs/2312.00752
Mamba-2 2024 https://arxiv.org/abs/2405.21060
xLSTM 2024 https://arxiv.org/abs/2405.04517
StateX 2025 https://arxiv.org/abs/2509.22630
Capsules 2017 https://arxiv.org/abs/1710.09829
Sparsely-Gated MoE 2017 https://arxiv.org/abs/1701.06538
RIMs 2019 https://arxiv.org/abs/1909.10893
MoE-Mamba 2024 https://arxiv.org/abs/2401.04081
BlackMamba 2024 https://arxiv.org/abs/2402.01771
Routing-Mamba 2025 https://arxiv.org/abs/2506.18145
Swimba 2026 https://arxiv.org/abs/2603.06938
ELM 2023 https://arxiv.org/abs/2306.16922
HRM 2025 https://arxiv.org/abs/2506.21734
TRM 2025 https://arxiv.org/abs/2510.04871
GRAM 2026 https://arxiv.org/abs/2605.19376
Titans 2025 https://arxiv.org/abs/2501.00663
Jamba 2024 https://arxiv.org/abs/2403.19887
Nemotron-H 2025 https://arxiv.org/abs/2504.03624
Loihi 2 2021 https://arxiv.org/abs/2111.03746
SpiNNaker2 2021 https://arxiv.org/abs/2103.08392
SpikingBrain 2025 https://arxiv.org/abs/2509.05276

Capability Gap Addressed (textarea, 1000 chars) 🔵

Always-on streaming at constant per-step compute and memory, where attention and its growing cache cannot follow: live patient monitoring, power-grid and factory telemetry, network and sensor streams. New methods leverage structured and dynamic sparsity on the GPU. Reasoning at small parameter counts, where recursive depth substitutes for size, which suits on-device and regulated settings where data must stay local, such as clinics, banks and law firms. Systems as a new modality: continuous understanding and prediction of a dynamic system, a regime with no fixed start, end, or bounded context. The same architecture can also serve frontier language models and infinite-context assistants.
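To make the contrast between a growing attention cache and a fixed recurrent state concrete, the following back-of-the-envelope sketch compares the two memory footprints as the stream gets longer. The model dimensions in `ATTN` and `SSM` are hypothetical values chosen for illustration, and the formulas are the standard first-order estimates (one key and one value vector per token, head and layer; one state matrix per layer), not measurements from the project.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Attention cache: one key and one value vector per token, head and layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

def recurrent_state_bytes(n_layers, d_inner, d_state, bytes_per_value=2):
    """Fixed recurrent state: independent of how many tokens have been read."""
    return n_layers * d_inner * d_state * bytes_per_value

# Hypothetical model sizes, chosen only for illustration.
ATTN = dict(n_layers=32, n_kv_heads=8, head_dim=128)
SSM = dict(n_layers=32, d_inner=8192, d_state=16)

for seq_len in (1_000, 100_000, 10_000_000):
    cache_mb = kv_cache_bytes(seq_len, **ATTN) / 1e6
    state_mb = recurrent_state_bytes(**SSM) / 1e6
    print(f"{seq_len:>10} tokens  cache {cache_mb:>12.1f} MB  state {state_mb:.1f} MB")
```

The cache column grows linearly with the number of tokens read, while the state column stays the same on every row, which is the property an always-on stream depends on.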

Existing Artifacts (textarea, 2000 chars) 🔵

These artifacts are the validated tools and R&D machinery we use to build the target system, not the product itself, which is a novel frontier-grade recurrent model showcased by systems as a modality. The items below are proven components, evidence and tooling.

Live dashboard: routed-SSM, kernels, ablations. Code and versioned benchmark JSONs are in the project repository (available to the jury on request); a technical report or preprint is in preparation for Stage-1 M1.

Technology Readiness Level (TRL) Assessment (textarea, 1000 chars) 🔵

Overall system TRL 2–3. The concept is formulated and backed by component-level precedents, but the integrated architecture is experimental and needs validation under controlled benchmarks. Sub-components:

Open Research Risks (textarea, 1000 chars) 🔵

The central question is whether we can solve the current limitations of recurrent networks, above all their memory. They compress history into a fixed state and lose precise recall. Starting from architectures like Nemotron-H and Mamba, can we add reliable, content-addressable memory without a growing context window? That is the question that matters most.

Tied to it: can a routed or sparse model match a dense one? Today it does not. A routed spiking net collapsed on SHD (38.6 to 51.8% versus 85.1% dense); we test fixes: routing on the continuous state, richer cells (ELM, TC-LIF), and key/value separation.

Further risks: sparsity may not become a real GPU speedup if routing is irregular; the training speedup from pruning is a target, not proven; small-scale wins may not survive scale-up toward 13B; and labelled data for real systems is scarce.

We test each against dense and stateless baselines, with scaling curves and design-partner data, keeping what survives.

Compute Requirements (textarea, 1000 chars) 🔵🟢

Stage 1 funds validation, ablation and larger demonstrators. We budget about 150,000 to 200,000 H100-hours, roughly 50 to 65 H100 GPUs over the seven-month phase, about 0.4 to 0.5M EUR at competitive cloud or European pricing (an estimate), with a re-run reserve (factor 2 to 3) for failed runs and sweeps. That is about 15 percent of the €3M Stage-1 budget; personnel and engineering are the larger investment.
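The GPU-hour figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the 60 percent utilisation is an assumption that is not stated in the text (it is the value at which the GPU-hour budget lines up with the quoted 50 to 65 GPUs), and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year divided by 12

def gpus_needed(gpu_hours, months, utilisation):
    """GPUs required to burn a GPU-hour budget within a phase."""
    return gpu_hours / (months * HOURS_PER_MONTH * utilisation)

def price_per_gpu_hour(total_eur, gpu_hours):
    """Implied price of one GPU-hour for a given spend."""
    return total_eur / gpu_hours

MONTHS = 7
UTILISATION = 0.6  # assumption, not from the text

# Pair the low and high ends of the quoted hour and cost ranges.
for gpu_hours, cost_eur in ((150_000, 400_000), (200_000, 500_000)):
    gpus = gpus_needed(gpu_hours, MONTHS, UTILISATION)
    price = price_per_gpu_hour(cost_eur, gpu_hours)
    print(f"{gpu_hours:,} H100-hours: {gpus:.0f} GPUs, {price:.2f} EUR per GPU-hour")
```

At full utilisation the same hour budget would need fewer GPUs, so the quoted GPU count implicitly leaves headroom for idle time, queueing and restarts.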

We establish controlled scaling curves across four to five model families up to about 13B parameters, benchmarked against transformer, SSM, Mamba and hybrid baselines, to prove architectural leverage. Full frontier scale needs an order of magnitude more compute, secured in Stages 2 and 3 via long-term procurement and SPRIND bulk allocation.

Hardware is H100-class via cloud or European providers, plus a one-time local workstation GPU spend of about 90k EUR for fast per-researcher ablation.

KPIs, Benchmarks and Potential Impact (textarea, 1000 chars) 🔵🟢

KPIs follow efficiency, performance and long-horizon operation: accuracy parity/improvement of routed-sparse vs dense at matched FLOPs; decode latency and throughput (tokens/s) vs compiled dense transformer and SSM baselines; energy per token; max stable stream length at bounded memory; realised sparsity and active-block count; and compute vs sequence length (sub-quadratic test).
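One way the sub-quadratic test could be scored is by fitting the scaling exponent of compute against sequence length on a log-log scale: an exponent near 1 indicates linear cost, near 2 quadratic. This is a minimal sketch of that idea, not the project's benchmark harness; the function name `scaling_exponent` is hypothetical and the cost values are synthetic, generated only to show the method.

```python
import math

def scaling_exponent(seq_lens, costs):
    """Least-squares slope of log(cost) against log(sequence length)."""
    xs = [math.log(n) for n in seq_lens]
    ys = [math.log(c) for c in costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

lengths = [1_024, 4_096, 16_384, 65_536]
linear_cost = [3.0 * n for n in lengths]            # synthetic: constant cost per step
quadratic_cost = [0.001 * n * n for n in lengths]   # synthetic: attention-like growth

print(scaling_exponent(lengths, linear_cost))
print(scaling_exponent(lengths, quadratic_cost))
```

In practice the costs would be measured FLOPs, latency or energy per run, and the fit would be reported with its residuals so a change of regime at long lengths is not averaged away.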

Benchmarks include long-context retrieval, associative recall, cue-switching, streaming classification and real-time prediction; the final suite is chosen in Stage 1.

The lead use case and highest impact is systems as a modality: the continuous understanding and prediction of dynamic systems (power-grid, factory, medical), running always-on and in-house on small servers. Further use cases are infinite-context assistants and frontier-scale language, code and reasoning under European energy and compute constraints. Impact is assessed by matching benchmarked strengths to use cases, then validating with a design partner.

Work Plan (textarea, 4000 chars) 🟢

The project covers research and commercialisation across all three stages.

Stage 1 (7 months, €3M): validate the training and kernel approach, deliver a proof with a first model, and identify potential use cases plus a co-development partner.
Stage 2 (8 months, €8M): scale to larger recurrent models, with integration tests run inside co-development projects.
Stage 3 (9 months, €15.5M): application and use case, covering real-time AI on the edge (systems and sensor streams), benchmarks against Transformer SOTA and neuromorphic hardware, and a demo of the use case with a partner.

Approach (research method, all stages). Three intertwined modes. First, build on proven tool-classes such as state-space models, routing, expressive neurons, recursive reasoning and sparsity. Second, do original research to develop new mechanisms, such as state- and path-dependent computation, new recurrent cells, routing schemes and efficient sparse compute. A specific bet is to attack the memory and recall problem with feedback cycles (iterative, top-down recurrence) and inference-time selective weight adjustments (test-time weight updates and fast weights). Third, run rigorous empirical science: implement, benchmark on the triad of efficiency, performance and unbounded context against a compiled baseline, profile the real bottleneck, try to falsify, and keep only what survives. This is accelerated by custom AI research tooling for automated experimentation, benchmarking and rapid small-model prototyping.

Implementation and commercialisation. The lead use case is systems as a modality, the continuous understanding, prediction and observation of dynamic systems (power-grid, factory, medical). Each validated approach is benchmarked, matched to use-case requirements, and taken to co-development partners to build one real, validated use case rather than a generic LLM replacement.

Hardware path. GPU-efficient models are the Stage-1 priority. Neuromorphic-chip partnerships are a longer-term lever, and chip design itself is out of Stage-1 scope.

Team and resourcing. Stage 1 runs on a small, focused team of 5 FTE: the core team full-time, plus experts who support specific technical topics, business operations and go-to-market as part-time or temporary hires, subcontractors or advisors. For Stage 2 we grow the team and extend technical, operational and go-to-market skills, which scales R&D with parallel commercialisation.

Business operations from day one. We form a European legal entity (UG minimum) early in Stage 1, budget at least half an FTE for business operations (finance, controlling, HR), use external providers for HR, accounting and IP-law, and include an IP-protection budget and a post-grant follow-up and financing plan. (Operations and Economic-Viability prose: Jana.)

Collaboration and subcontracting. Research advisory (Uni Lübeck, S. Otte), co-development partners for use-case integration in Stages 2 and 3, compute procurement, and legal and IP. Specific subcontract work packages are defined in the Stage-2 roadmap.

Stage 1 milestones.

Financial Cost Estimate for Stage 1 (numeric, max 3,000,000 EUR) 🟢

Submitted figure: 1,345,000 EUR (an estimate, within the 3,000,000 cap; non-dilutive PCP). Direct-cost breakdown below; full version in the cost-overview PDF.

Category | Stage-1 (EUR)
1. Direct personnel & experts (~5 FTE, fully loaded, incl. advisors) | 470,000
2. Direct project operations & legal (UG, IP, recruiting, SaaS, travel, office, team) | 140,000
3. Compute & hardware (cloud GPU ~€450k + local rigs €90k) | 540,000
4. Indirect overhead (lean) | 45,000
5. Contingency & escalation (~15%) | 150,000
Total | 1,345,000
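The breakdown can be checked mechanically. The short sketch below re-adds the five categories from the table, confirms they match the submitted figure and stay under the Stage-1 cap, and prints each category's share of the total. The variable names are illustrative; the amounts are copied from the table.

```python
stage1_costs_eur = {
    "Direct personnel & experts": 470_000,
    "Direct project operations & legal": 140_000,
    "Compute & hardware": 540_000,
    "Indirect overhead": 45_000,
    "Contingency & escalation": 150_000,
}
SUBMITTED_TOTAL_EUR = 1_345_000
STAGE1_CAP_EUR = 3_000_000

total = sum(stage1_costs_eur.values())
assert total == SUBMITTED_TOTAL_EUR, "category amounts do not add up to the submitted figure"
assert total <= STAGE1_CAP_EUR, "submitted figure exceeds the Stage-1 cap"

# Share of each category in the submitted total.
for category, amount in stage1_costs_eur.items():
    print(f"{category:<36} {amount:>9,} EUR  {amount / total:6.1%}")
```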

Team (textarea, 2000 chars) 🟢

Why best suited. The two founders already pursue this thesis, sparsity and recurrence for efficient scale, in published research and shipped production systems. They have a proven research track record and high momentum: in only a few days, many results were proven or rediscovered on a single laptop GPU. Together, they cover research, GPU and kernel engineering, and real-time systems. Networks include MIT, Mercedes-Benz, Merantix AI Campus, Uni Lübeck, Bitkom and Antler.

Founders / core team:

Business and Strategic Advisor: Jana Lehner. Physics PhD; Director of IP and former CBO at a quantum deep-tech scale-up; 19 years at IBM Quantum; Bitkom board; managed two €10M grants; covers IP, commercial and partnerships.
Research Advisor: Sebastian Otte. Professor at Uni Lübeck, Geoffrey's PhD supervisor and second examiner of Johann's bachelor thesis; researches state-based recurrent models.
Researcher (in talks): Mirko Klukas. Math PhD; sparse-coding and sequence-memory research at MIT and Numenta. In talks and agreed to join as a hire; not 100% confirmed before the deadline.
Additional researcher (possible): Johann Machemer. DNN pruning research (Calprune); 2 peer-reviewed papers and a FLAIRS Best Student Paper.

What is missing and how we close it. The core is deliberately lean; we add targeted senior research and engineering hires as funding grows and outsource HR, accounting and IP-law.

# | Name, role, % FTE (250 chars) | Track-record links (250 chars)
Team Member 1 | Tebjan Halm, Co-founder (research, GPU/kernels), ~100% FTE | https://tebjan.de , https://github.com/tebjan
Team Member 2 | Geoffrey Kasenbacher, Co-founder (principal researcher), ~100% FTE | https://scholar.google.com/citations?user=xmYxqsIAAAAJ&hl=en , https://github.com/kgeoffrey
Team Member 3 | Mirko Klukas, senior ML researcher (in talks; MIT / Numenta), ~6 months | https://blog.mirkoklukas.com
Team Member 4 | Johann Machemer, Researcher (pruning / sparsity; possible), ~6 months | https://github.com/johannmachemer
Team Member 5 | ML / data engineer (to hire), ~6 months | -
Team Member 6 | Business operations (to hire), ~0.75 FTE | -
Team Member 7 | Jana Lehner, Business & Strategic Advisor (Honorarbasis), part-time | [NEEDS: LinkedIn / Google Scholar]
Team Member 8 | Sebastian Otte, Research Advisor (Honorarbasis), part-time | [NEEDS: Uni Lübeck profile]

3. Attachments (PDF only) ⚪


4. Legal & Declarations ⚪


5. Spam Protection ⚪

CAPTCHA: "What letter is the second last letter of the alphabet?" answer Y


6. Submit

→ Send SPRIND Challenge submission


Changelog

2026-06-01: Compute/budget reconciled

2026-06-01: Clickable citations, de-duplicated fields, story spine, framing fix

2026-06-01: Continuous-AI bet, momentum proofs, memory research bet

2026-06-01: Efficiency-first reframe, plain prose, compute scaled up

2026-06-01: Reframe groundwork + Google-Doc sync + artifact reframe

2026-05-31 22:30 CEST: Prioritise SSM/Mamba artifacts

2026-05-31 22:05 CEST: Integrate Jana's field texts

2026-05-31 21:50 CEST: Submission form goes live as main entry point

2026-05-31: Form drafted