Color coding (tag each field/answer):
- 🔵 TECH: technical content
- 🟢 COMMERCIAL: commercial / business / org content
- ⚪ ADMIN: administrative / legal / personal
1. Your Personal Information ⚪
| Field | Value |
|---|---|
| Salutation (Ms. / Mr. / not specified) | |
| Title (Dr. / Prof. / Prof. Dr.) | |
| Last name | |
| First name | |
| Phone number | |
| Your position (e.g. project manager, CEO) | |
| Institution (dropdown) | |
| Institution (other) | |
| Legal Form (None / Ltd. / GmbH / UG / GbR / KG / AG / Individual Enterprise / Other) | |
| Legal Form (other) | |
| Address line 1 | |
| Address line 2 | |
| Postal Code | |
| City | |
| Country | |
| Country (other) | |
| Did you know about SPRIND before learning about the Next Frontier AI Initiative? (Yes/No) | |
| Where did you first learn that SPRIND exists? (dropdown) | |
| Please specify where you heard about us. | |
| What made you decide to actually apply to this Challenge? (dropdown) | |
| Please specify why you decided to actually apply to this Challenge. | |
2. Your Solution
Project Title (text, 50 chars) 🔵
Recurrent Intelligence
Short Description (textarea, 500 chars) 🔵
Recurrent networks are the efficient path to frontier AI. We make recurrent models frontier-grade, with constant per-step compute, no growing context window and memory in the state, at order-of-magnitude better efficiency. We already show structured sparsity beating dense on RNNs, and predict active neuron blocks to run up to 26x faster on GPUs. We target frontier-scale LLMs (Nemotron-class) and systems as a modality, always-on prediction of dynamic systems, as in-house SOTA on small servers.
Frontier Dimension (textarea, 500 chars) 🔵
Our bet is that the world is continuous, so it needs continuous AI, and that no single method wins; the answer is a combination of proven components and new mechanisms. We want to make recurrent networks frontier-grade and efficient, with history in a fixed state, not a growing context window, for bounded memory, predictable per-step compute and unbounded streaming. The capability this unlocks is systems as a modality, the always-on understanding and prediction of dynamic systems.
Core Idea & Architecture (textarea, 3000 chars) 🔵
The system is a recurrent, state-based architecture for efficient long-horizon inference under always-on, low-SWaP constraints. Its showcase capability is systems as a modality, the continuous understanding and prediction of a dynamic system from its multi-sensor stream. The central substrate is a fixed-size latent state updated at each timestep. Unlike transformer inference, history is not represented by an expanding context window or repeated attention over stored tokens. Information is compressed into the evolving state, which enables streaming-native execution with bounded per-step memory and compute, independent of the processed sequence length.
The data flow runs from input stream, to recurrent state update, to a state-conditioned router, to the selected computation blocks, to the updated state and a task-specific output. At each timestep the model receives the current input and the previous latent state. A selective recurrent core, based on SSM/Mamba, S4/S5, HiPPO, spiking or xLSTM-style dynamics, updates the persistent state. A router then selects a bounded, hardware-compatible subset of computation blocks as a function of both current content and accumulated state. This is state-conditioned structured sparsity: only the active sub-network runs, and the selected path can change within a sequence as the inferred context changes.
The computation blocks can include expressive recurrent neurons, multi-timescale cells, recursive or hierarchical reasoning modules, structured sparse operators, and, where empirically justified, optimised attention or MLP blocks. These modules are not a fixed catalogue. Each is kept only if controlled ablations show measurable gains in latency, accuracy, stability, long-horizon retention or energy efficiency.
The key interaction is that recurrence does not only store information, it also controls computation. The evolving state is both compressed memory and the control variable for routing, so prior trajectory affects which blocks activate later. This yields path-dependent computation and a content-addressed memory interface.
The output layer maps the updated state and the selected block activations to task-specific predictions, such as detection, classification, correction or control signals. Because labelled system data is scarce, datasets are produced to defined requirements, using small targeted sets and design partners, rather than assumed to exist. The architecture targets dynamic edge and mobile deployment, where real-time inference, bounded memory, predictable compute and energy efficiency are required. Core claims are evaluated by benchmarking against dense transformers, SSMs, Mamba, MoE-Mamba, BlackMamba-style sparse recurrent models and hybrid baselines. Falsification tests target long-context retention, routing utility, realised sparsity, compute scaling, latency, energy per token and failure-prevention performance.
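The data flow described above (input, recurrent state update, state-conditioned router, selected block, updated state) can be sketched in a few lines. This is a toy illustration under stated assumptions, not our implementation: the class name `RoutedRecurrentCell`, the fixed decay of 0.9, the top-1 router and the random weights are all hypothetical stand-ins for learned, selective dynamics.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.3):
    """Random weight matrix as a list of rows (stand-in for learned weights)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class RoutedRecurrentCell:
    """Toy cell: fixed-size state, top-1 routing conditioned on state and input."""

    def __init__(self, input_dim, state_dim, num_blocks, seed=0):
        rng = random.Random(seed)
        self.decay = 0.9  # hypothetical fixed decay; a selective SSM would learn this
        self.w_in = make_matrix(state_dim, input_dim, rng)
        # The router sees [state, input], so routing depends on accumulated history.
        self.w_route = make_matrix(num_blocks, state_dim + input_dim, rng)
        # One small block per routing choice; only the selected block runs.
        self.blocks = [make_matrix(state_dim, state_dim, rng) for _ in range(num_blocks)]

    def step(self, state, x):
        # 1) Recurrent update of the fixed-size state.
        drive = matvec(self.w_in, x)
        state = [self.decay * s + (1.0 - self.decay) * d for s, d in zip(state, drive)]
        # 2) State-conditioned router: pick the highest-scoring block.
        scores = matvec(self.w_route, state + x)
        active = max(range(len(scores)), key=scores.__getitem__)
        # 3) Run only the selected block and fold its output into the state.
        update = matvec(self.blocks[active], state)
        state = [math.tanh(s + u) for s, u in zip(state, update)]
        return state, active


def run_stream(cell, stream, state_dim):
    """Process a stream of any length; memory stays at one state vector."""
    state = [0.0] * state_dim
    path = []
    for x in stream:
        state, active = cell.step(state, x)
        path.append(active)
    return state, path
```

The state returned by `run_stream` has the same size after 10 steps as after 10,000, and `path` records which block was active at each step, which is the quantity a routing-utility ablation would inspect.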
Technical Novelty (textarea, 2000 chars) 🔵
The novelty is a frontier-grade recurrent network. We want to solve the long-standing problems of recurrent models, such as recall, stability and efficiency at scale, so a fixed-state recurrent model reaches SOTA quality at order-of-magnitude better efficiency than dense transformers, with constant per-step compute, bounded memory and unbounded streaming. The capability this unlocks is systems as a modality, the continuous understanding and prediction of a dynamic system from its endless multi-sensor stream, which no transformer can serve structurally.
We aim to reach this by composing proven components with new mechanisms, not by any single trick, and we use attention or MLP blocks where they measurably help rather than avoiding them. The substrate is selective state-space and recurrent dynamics (SSM, S4/S5, Mamba, Mamba-2, xLSTM, StateX), which carry history in a fixed evolving state. Onto this we add expressive and multi-timescale units (ELM-style), recursive or hierarchical reasoning (GRAM, HRM, TRM, Titans), and conditional computation and routing (MoE, Switch, capsule routing, MoE-Mamba, BlackMamba, Routing-Mamba).
One new mechanism with early evidence is state-conditioned, path-dependent routing, where the active sub-network is selected from the accumulated recurrent state, so the computation path can change mid-sequence and the state doubles as a content-addressed memory. Cue-switch: 99% stateful versus below 70% stateless. It is one tool in the kit, not the thesis.
The capability gap is efficient long-horizon inference at bounded memory and predictable compute. Transformers keep growing context windows, KV-caches and repeated attention; pure SSMs stream efficiently but compress history and show recall gaps. By combining recurrent state, content-addressed memory and accelerator-friendly sparse activation, we target lower latency, lower energy per token and sub-quadratic compute versus length, for always-on, in-house performance on small servers.
Technical Novelty Citation (textarea, 1000 chars) 🔵
Representative prior art. Selective and state-space recurrence: Mamba, Mamba-2, xLSTM, StateX [1–4]. Routing and conditional computation: Capsules, Sparsely-Gated MoE, RIMs, MoE-Mamba, BlackMamba, Routing-Mamba, Swimba [5–11]. Expressive recurrent units: ELM [12]. Recursive and memory-augmented reasoning: HRM, TRM, GRAM, Titans [13–16]. Hybrid recurrent-attention systems: Jamba and Nemotron-H [17–18]. Neuromorphic and event-driven systems: Loihi 2, SpiNNaker2, SpikingBrain [19–21]. Our delta is state-conditioned routing, where the accumulated recurrent state, rather than a token-local embedding alone, selects the active computation path and the memory address.
[1] Mamba; [2] Mamba-2; [3] xLSTM; [4] StateX. [5] Capsules; [6] Sparsely-Gated MoE; [7] RIMs; [8] MoE-Mamba; [9] BlackMamba; [10] Routing-Mamba; [11] Swimba. [12] ELM; [13] HRM; [14] TRM; [15] GRAM; [16] Titans. [17] Jamba; [18] Nemotron-H; [19] Loihi 2; [20] SpiNNaker2; [21] SpikingBrain.
Capability Gap Addressed (textarea, 1000 chars) 🔵
The new functionality comes from efficiency, not from copying transformers. A fixed-state recurrent model gives always-on streaming at bounded per-step compute and memory, which is far leaner than an attention or KV-cache cost that grows with sequence length [16]. SSM and Mamba models stream cheaply but compress history and show recall gaps [1,4], which we want to fix. This unlocks systems as a modality, the continuous understanding and prediction of dynamic systems, as in-house performance on small servers for security and critical infrastructure, which transformers cannot serve structurally. Path-dependent computation lets the active sub-network switch mid-sequence as the inferred state changes (cue-switch: 99% stateful versus below 70% stateless). Recursive models also show that reasoning can arise from architectural depth at small parameter counts [13–15].
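The memory argument above can be made concrete with a back-of-envelope calculation. The sketch below is illustrative only: the configuration (32 layers, 8 KV heads, head dimension 128, 16,384 state values per layer, 2 bytes per value) is hypothetical and does not describe any model named in this form.

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Attention inference: keys and values are kept for every processed token."""
    return 2 * seq_len * num_layers * num_kv_heads * head_dim * bytes_per_value


def fixed_state_bytes(num_layers, state_values_per_layer, bytes_per_value=2):
    """Recurrent inference: one state per layer, independent of sequence length."""
    return num_layers * state_values_per_layer * bytes_per_value


if __name__ == "__main__":
    # Hypothetical configuration, chosen only to show the scaling behaviour.
    for seq_len in (1_000, 100_000, 10_000_000):
        kv = kv_cache_bytes(seq_len, num_layers=32, num_kv_heads=8, head_dim=128)
        st = fixed_state_bytes(num_layers=32, state_values_per_layer=16_384)
        print(f"{seq_len:>10} tokens: KV cache {kv / 2**20:>12.1f} MiB, state {st / 2**20:.1f} MiB")
```

Under these assumptions the cache grows by 128 KiB per token while the recurrent state stays at 1 MiB, which is the structural difference the paragraph refers to. It says nothing about recall quality, which is the open problem.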
Existing Artifacts (textarea, 2000 chars) 🔵
These artifacts are the validated tools and R&D machinery we use to build the target system, not the product. The product is a novel frontier-grade recurrent model, showcased by systems as a modality. The items below are proven components, evidence and tooling we compose with new mechanisms (stateful routing among them).
- The core mechanism already works on an SSM substrate. On a continuous diagonal-SSM (Mamba-style) substrate, a stateful router that activates 1 of 8 blocks (12.5% active) reaches 96.7–99.9% on a cue-switch task, versus 47% stateless, 28% random and 99.3% dense. On harder MQAR recall, dense reaches 49% and routed 25–27%, which is the recall gap we want to close.
- We already have methods for dynamic and structured sparsity with real GPU gains. We established which sparsity is hardware-exploitable (block-structured, not fine or random). By predicting which blocks of neurons are active, the model runs extremely fast: a spiking prototype reaches about 26x inference speedup via a bit-exact look-up table, and spike-pool kernels run 1.73–3.43x versus compiled dense, parity-tested (max_abs = 0). These kernels go directly into the recurrent model where the gains are needed.
- R&D velocity. We validated, rediscovered or falsified about 11 published results in roughly 19 days, about 4 of them falsifications of field over-claims, for example that router state is the critical axis (RMoE), that only block-structured sparsity is GPU-exploitable, and that structured sparsity can match or beat a dense baseline in accuracy on a recurrent (SNN) benchmark.
- Honest caveat. Routed versus compiled-dense SSM decode at batch 1 is 0.87–0.96x, so a continuous-SSM kernel speedup is still a research target.
Live dashboard: routed-SSM, kernels, ablations. Versioned JSONs in the repo. [GAP: GitHub repo and preprint links.]
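To illustrate what a parity test with max_abs = 0 checks, here is a minimal CPU sketch, assuming a plain matrix-vector product with 8 blocks of 4 columns and one active block. It is not our GPU kernel or look-up table; the function names and sizes are hypothetical.

```python
import random


def dense_masked_matvec(weights, x, active_blocks, block_size):
    """Reference path: zero the inactive blocks, then run the full product."""
    masked = [v if (i // block_size) in active_blocks else 0.0
              for i, v in enumerate(x)]
    out = []
    for row in weights:
        acc = 0.0
        for w, v in zip(row, masked):
            acc += w * v
        out.append(acc)
    return out


def block_sparse_matvec(weights, x, active_blocks, block_size):
    """Sparse path: touch only the columns that belong to active blocks."""
    cols = [b * block_size + j
            for b in sorted(active_blocks)
            for j in range(block_size)]
    out = []
    for row in weights:
        acc = 0.0
        for c in cols:
            acc += row[c] * x[c]
        out.append(acc)
    return out


def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


def parity_demo(num_blocks=8, block_size=4, rows=16, seed=0):
    """Return (max_abs difference, fraction of columns actually computed)."""
    rng = random.Random(seed)
    n = num_blocks * block_size
    weights = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(rows)]
    x = [rng.uniform(-1, 1) for _ in range(n)]
    active = {rng.randrange(num_blocks)}  # 1 of 8 blocks, i.e. 12.5% active
    dense = dense_masked_matvec(weights, x, active, block_size)
    sparse = block_sparse_matvec(weights, x, active, block_size)
    return max_abs_diff(dense, sparse), len(active) * block_size / n
```

Both paths accumulate the active columns in the same order, and the dense path only adds exact zeros in between, so the difference is exactly zero while the sparse path does one eighth of the multiply-adds. Whether that saving becomes wall-clock speedup depends on the kernel, which is the caveat stated above.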
Technology Readiness Level (TRL) Assessment (textarea, 1000 chars) 🔵
Overall system TRL 2–3. The concept is formulated and backed by component-level precedents, but the integrated architecture is experimental and needs validation under controlled benchmarks. Sub-components:
- Recurrent substrate (SSM, Mamba, S4/S5, xLSTM): TRL 4–5. Demonstrated at lab and model scale, with partial evidence of scalable training and inference.
- Conditional routing and sparse MoE: TRL 4–5. Established, but integration with recurrent state-conditioned control is lower maturity.
- State-conditioned routing and content-addressed memory: TRL 2–3. A key research mechanism (one of several), supported by toy-task evidence, not yet validated at scale.
- Structured sparsity and accelerator execution: TRL 3–4. Implementable and hardware-relevant, but dependent on kernel design, batching and routing regularity.
- Hybrid attention and MLP components: TRL 5–6. Mature, reused where ablations show benefit.
Open Research Risks (textarea, 1000 chars) 🔵
The research-grade risks centre on the routing mechanism, the memory interface and integrated-system scaling. Stateful routing may not beat stateless MoE or dense recurrent baselines beyond toy tasks. The routed state may be too compressed or unstable to act as a reliable content-addressed memory. Training may suffer routing collapse, gradient instability or state drift.
At systems level, sparse state-conditioned execution may not yield real latency or energy gains if routing irregularity causes poor GPU utilisation, and the sub-quadratic compute versus length hypothesis is empirical and may fail under realistic hardware.
On benchmarks and data, gains on cue-switch or synthetic long-context may not transfer to real streaming or critical-system workloads, and labelled system data is scarce. We mitigate with defined data requirements, small targeted datasets and design partners, and test everything with ablations, dense and stateless baselines, scaling curves and hardware profiling.
Compute Requirements (textarea, 1000 chars) 🔵🟢
Frontier-grade capability needs substantial compute, scaled across the three stages. Stage 1 funds validation, ablation and progressively larger demonstrators. We budget roughly 300,000 to 500,000 H100-hours, about 96 to 160 H100 GPUs over the seven-month phase, which is about €1.0 to 1.6M at competitive cloud or European pricing. This is a major share of the €3M envelope, alongside personnel and engineering, with a 2 to 3x reserve for failed runs and sweeps.
We avoid a single brute-force run. Instead we establish controlled scaling curves and several model families up to about 13B parameters, benchmarked against transformer, SSM, Mamba and hybrid baselines, to prove architectural leverage. Full frontier scale needs an order of magnitude more compute, secured in Stages 2 and 3 through long-term procurement and SPRIND bulk allocations.
Hardware is H100-class via cloud, European providers or a bulk SPRIND deal, plus a one-time local workstation GPU spend of about €30 to 100k.
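A quick consistency check on the compute figures in this field, assuming an average month of 730 hours (an assumption, not a number stated in the form). The helper names are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year divided by 12


def gpus_at_full_load(gpu_hours, months):
    """GPUs needed if every GPU ran around the clock for the whole phase."""
    return gpu_hours / (months * HOURS_PER_MONTH)


def implied_utilisation(gpu_hours, gpus, months):
    """Share of wall-clock time each GPU must be busy to deliver gpu_hours."""
    return gpu_hours / (gpus * months * HOURS_PER_MONTH)


def price_per_gpu_hour(cost_eur, gpu_hours):
    return cost_eur / gpu_hours


if __name__ == "__main__":
    for gpu_hours, gpus, cost in ((300_000, 96, 1_000_000), (500_000, 160, 1_600_000)):
        print(
            f"{gpu_hours:,} H100-hours: "
            f"{gpus_at_full_load(gpu_hours, 7):.0f} GPUs at full load, "
            f"{implied_utilisation(gpu_hours, gpus, 7):.0%} utilisation on {gpus} GPUs, "
            f"EUR {price_per_gpu_hour(cost, gpu_hours):.2f} per GPU-hour"
        )
```

Under this assumption, both ends of the range work out to 3,125 hours per GPU, or roughly 61% utilisation across seven months, at an implied price of about €3.20 to €3.33 per H100-hour.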
KPIs, Benchmarks and Potential Impact (textarea, 1000 chars) 🔵🟢
KPIs follow efficiency, performance and long-horizon operation: accuracy parity/improvement of routed-sparse vs dense at matched FLOPs; decode latency and throughput (tokens/s) vs compiled dense transformer and SSM baselines; energy per token; max stable stream length at bounded memory; realised sparsity and active-block count; and compute vs sequence length (sub-quadratic test).
Benchmarks include long-context retrieval, associative recall, cue-switching, streaming classification and real-time prediction, final suite chosen in Stage 1.
The lead use case and highest impact is systems as a modality, the continuous understanding and prediction of dynamic systems (power-grid, factory, medical), as always-on, in-house performance on small servers, plus infinite-context assistants and frontier-scale language, code and reasoning under European energy and compute constraints. Impact is assessed by matching benchmarked strengths to use cases, then validating with a design partner.
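One way to run the sub-quadratic test named in the KPIs is to fit the slope of compute against sequence length on a log-log scale. The sketch below assumes measured costs per sequence length are available as plain lists; the function names and the 0.1 margin are hypothetical choices, not part of the benchmark suite.

```python
import math


def loglog_slope(lengths, costs):
    """Least-squares slope of log(cost) against log(length).

    A slope near 1 means linear scaling, near 2 means quadratic scaling.
    """
    xs = [math.log(n) for n in lengths]
    ys = [math.log(c) for c in costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def is_subquadratic(lengths, costs, margin=0.1):
    """True if the fitted exponent is clearly below 2 (margin is a hypothetical choice)."""
    return loglog_slope(lengths, costs) < 2.0 - margin
```

In practice the costs would be measured wall-clock time or energy on the target hardware rather than FLOP counts, since the hypothesis may hold on paper and still fail under realistic hardware.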
Work Plan (textarea, 4000 chars) 🟢
The project covers research and commercialisation across all three stages.
Stage 1 (7 months, €3M): validate the training and kernel approach. Proof with a first model; potential use cases and a co-development partner identified. Stage 2 (8 months, €8M): scale to larger recurrent models. Integration tests run inside co-development projects. Stage 3 (9 months, €15.5M): application and use case. Real-time AI on the edge (audio and sensor streams), benchmarks against Transformer SOTA and neuromorphic hardware, and a demo of the use case with a partner.
Approach (research method, all stages). Three intertwined modes. First, build on the field by adapting SSMs, conditional computation, routing and capsule networks, expressive neurons, recursive reasoning and sparsity. Second, do original research to develop new mechanisms, such as state and path-dependent computation, new recurrent cells, routing schemes and efficient sparse compute. A specific bet is to attack the memory and recall problem with feedback cycles (iterative, top-down recurrence) and inference-time selective weight adjustments (test-time weight updates and fast weights). Third, run rigorous empirical science: implement, benchmark on the triad of efficiency, performance and unbounded context against a compiled baseline, profile the real bottleneck, try to falsify, and keep only what survives. This is accelerated by custom AI research tooling for automated experimentation, benchmarking and rapid small-model prototyping.
Implementation and commercialisation. The lead use case is systems as a modality, the continuous understanding, prediction and observation of dynamic systems (power-grid, factory, medical). Each validated approach is benchmarked, matched to use-case requirements, and taken to co-development partners to build one real, validated use case rather than a generic LLM replacement.
Hardware path. GPU-efficient models are the Stage-1 priority. Neuromorphic-chip partnerships are a longer-term lever, and chip-design itself is out of Stage-1 scope.
Team and resourcing. Stage 1 runs on a small, focused team of 5 FTE, the core team full-time plus experts who support specific technical topics, business operations and go-to-market as part-time or temporary hires, subcontractors or advisors. For Stage 2 we grow the team and extend technical, operational and go-to-market skills, which scales R&D with parallel commercialisation.
Business operations from day one. We form a European legal entity (UG minimum) early in Stage 1, budget at least half an FTE for business operations (finance, controlling, HR), use external providers for HR, accounting and IP-law, and include an IP-protection budget and a post-grant follow-up and financing plan. (Operations and Economic-Viability prose: Jana.)
Collaboration and subcontracting. Research advisory (Uni Lübeck, S. Otte), co-development partners for use-case integration in Stages 2 and 3, compute procurement, and legal and IP. Specific subcontract work packages are defined in the Stage-2 roadmap.
Stage 1 milestones.
- M1, end of month 5. Technical report or paper preprint published. Main scope: resolve the central risk, whether a routed or sparse recurrent net can match dense accuracy, by testing the fix hypotheses (route on the continuous membrane and adaptation state, richer block cells such as ELM, TC-LIF or PMSN, and key-value separation).
- M2, end of month 6. Artefacts produced (model families, experimental codebase, open-source contributions), and a potential scaling dimension or new emergent phenomenon identified, with scaling curves toward about 13B (no brute-force run).
- M3, end of month 7 (deliverable). Updated Stage-2 roadmap that compares the original hypothesis against new technological and operational insights, with an operational roadmap to scale R&D, a growth roadmap for partnerships and talent, and a detailed financial plan covering capital allocation, spending, control mechanisms and cash flow.
Financial Cost Estimate for Stage 1 (numeric, max 3,000,000 EUR) 🟢
Rough estimate (main cost drivers):
| Cost driver | EUR |
|---|---|
| Compute (HW infrastructure & operation) | ~1.0–1.6M |
| Personnel (5 FTE) | 500K + |
| Overhead (100%) | 500K + |
| Total | ~2.0–2.6M (within 3,000,000 cap) |
Team (textarea, 2000 chars) 🟢
Why best suited. The two founders already pursue this thesis, sparsity and recurrence for efficient scale, in published research and shipped production systems. Together they cover research, GPU and kernel engineering, and real-time systems, backed by business, strategic and research advisors covering IP, commercialisation and state-based-model science. They have a real entrepreneurial track record and fast execution. Networks include MIT, Numenta, Mercedes-Benz, Merantix AI Campus, Uni Lübeck, Bitkom and Antler.
Founders / core team:
- Tebjan Halm. 20 years of real-time systems; built a compiler and runtime from scratch; optimised NVIDIA SANA to about 250 ms per image (2 to 3x faster than SANA-Sprint at about 6x less compute); state-centric, non-transformer architectures.
- Geoffrey Kasenbacher. Spiking and sparse nets and neuromorphic ML at Mercedes-Benz (about 1000x energy and 70x runtime in a shipped S-Class prototype); 21 granted patents; peer-reviewed (WARP-LCA).
Business and Strategic Advisor: Jana Lehner. Physics PhD; Director of IP and former CBO at a quantum deep-tech scale-up; 19 years at IBM; Bitkom board; managed two €10M grants; covers IP, commercial and partnerships. Research Advisor: Sebastian Otte. Professor at Uni Lübeck and Geoffrey's PhD supervisor; works on state-based and recurrent models (Active Tuning, recurrent spiking control). Additional researcher (possible): Johann Machemer. DNN pruning research (Calprune); 2 peer-reviewed papers and a FLAIRS Best Student Paper; Uni Lübeck.
What is missing and how we close it. The core is deliberately lean. We deepen hands-on coverage of specific methods and models with targeted senior research and engineering hires as funding grows, and we outsource HR, accounting and IP-law, drawing on the networks above.
| # | Name, role, % FTE (250 chars) | Track-record links (250 chars) |
|---|---|---|
| Team Member 1 | Tebjan Halm, Founder, [% FTE GAP] | tebjan.de, GitHub |
| Team Member 2 | Geoffrey Kasenbacher, Founder, [% FTE GAP] | GitHub |
| Team Member 3 | Jana Lehner, Business & Strategic Advisor, [% FTE GAP] | [GAP] |
3. Attachments (PDF only) ⚪
- [ ] CVs of key personnel. Jana sends hers tomorrow morning; collect Tebjan and Geoffrey, plus advisors.
- [ ] Detailed cost overview (PDF). Is a SPRIND template provided? If not, Jana can send one tomorrow morning.
- [ ] Declaration on RUS Sanctions form (obligatory). Must be signed by the legal representative.
4. Legal & Declarations ⚪
- [ ] Privacy Notice Agreement. I/we agree to the processing of my/our data in accordance with the SPRIND GmbH (Next Frontier AI Challenge) privacy notice, dated April 2026.
- [ ] Reasons for Exclusion, Infringement Declaration (Yes/No). Not sentenced to more than 3 months custodial, or more than 90 daily rates, or a fine of at least €2,500, with an existing non-redeemable Central Trade Register entry (for example § 21 MiLoG or § 21 AEntG). Agree to require this from subcontractors.
- [ ] Declaration on Reasons for Exclusion (Yes/No). No mandatory grounds under § 123 GWB, no optional grounds under § 124 GWB. If No, proof of self-cleaning per § 125 GWB enclosed.
- [ ] Declaration of Infringements (Yes/No). If No, proof of self-cleaning per § 125 GWB must be enclosed.
5. Spam Protection ⚪
CAPTCHA: "What letter is the second last letter of the alphabet?" answer Y
6. Submit
→ Send SPRIND Challenge submission
Changelog
2026-06-01 - Continuous-AI bet, momentum proofs, memory research bet
- Added [frontier]: the bet that the world is continuous and needs continuous AI, served by a combination of methods rather than one.
- Sharpened [artifacts]: momentum now states two concrete proofs, structured sparsity matching or beating dense accuracy on a recurrent (SNN) benchmark, and predicting which blocks of neurons are active to run extremely fast.
- Added [work plan]: a research bet to solve the memory and recall problem via feedback cycles (iterative, top-down recurrence) and inference-time selective weight adjustments (test-time weight updates, fast weights).
2026-06-01 - Efficiency-first reframe, plain prose, compute scaled up
- Reframed [solution]: thesis is now "we want to solve the long-standing problems of recurrent networks" for frontier-grade, order-of-magnitude efficiency. Systems-as-a-modality dialed back to the showcase use case. Stateful routing kept as one mechanism among several. Attention framed as a tool used where it helps.
- Style [all §2]: removed em-dashes and AI-writing patterns; converted claims to goal framing.
- Scaled up [compute / financial]: Stage-1 compute raised to ~300–500k H100-hours (~€1.0–1.6M); financial table total now ~€2.0–2.6M, using most of the €3M envelope.
- Trimmed: all fields now within their character limits.
2026-06-01 - Reframe groundwork + Google-Doc sync + artifact reframe
- Synced [solution]: §2 prose to the team Google Doc; artifacts framed as tools/evidence for the novel recurrent model, not the product. Lead use case set to systems understanding / prediction / observation.
2026-05-31 22:30 CEST - Prioritise SSM/Mamba artifacts
- Reordered [artifacts]: rapid-rediscovery and the SSM/Mamba routing result (cue-switch 96.7–99.9% at 1-of-8 blocks); honest decode caveat.
2026-05-31 22:05 CEST - Integrate Jana's field texts
- Added [solution]: Jana's drafts (Short Description, Capability Gap, Work Plan structure, Financial figures).
2026-05-31 21:50 CEST - Submission form goes live as main entry point
- Added [site]: submission working-doc rendered to HTML as the site's main entry point; build-time markdown to HTML via markdown-it-py.
2026-05-31 - Form drafted
- Added [solution]: technical fields drafted from the Technical Synthesis page and artifact dashboard; founders and advisors set.