cactus-compute/needle

26M-parameter function-calling model that runs on incredibly small devices

Python · 2502 stars · LLM & Prompt Tooling · GitHub
Quality: solid 16/24
PAI: integrate 0.5

Overview

Verdict

Rating Summary
Quality: solid (16/24). Excellent documentation and explosive early adoption, undercut by no releases, no manifest, and no test infrastructure: a promising research artifact, not yet a hardened library.
PAI Relevance: integrate (0.50). Fills PAI's offline inference gap for tool routing; integrable via a subprocess call to the CLI with clean JSON output, though Python setup friction and mobile-first framing limit practical lift.

Quality Assessment

16/24 — maintained / well-documented / early-or-minimal

Health: 5/8 (maintained)

Failed:

Passed:

Documentation: 7/8 (well-documented)

Failed:

Passed:

Engineering Signals: 4/8 (early-or-minimal)

Failed:

Passed:

PAI Relevance

Scores by dimension:

Harvest Value: 1. The encoder-decoder SAN architecture (no FFN in encoder, cross-attention routing, gated residuals, tied embeddings) is a novel approach to minimal function dispatch worth studying as a design reference for PAI's Delegation skill, which currently routes to heavyweight cloud agents with no lightweight local fallback. Not directly portable to TypeScript, but the structural ideas are extractable.

Integration Readiness: 1. Python-only, but the needle run --query "..." --tools '[...]' CLI emits clean structured JSON and is trivially subprocess-callable from a PAI skill. Requires Python environment bootstrap alongside Bun, which is moderate adapter work but not a rewrite.

Overlap Risk: 1. PAI's 27-agent roster (Claude-family, Forge/GPT-5.4, Anvil/Kimi-K2.6) already handles tool-calling via the Agents and Delegation skills. Needle overlaps on function dispatch but is differentiated by offline/local execution with no API dependency, so the overlap is partial rather than full.

Gap Fill: 1. PAI has no on-device or offline inference capability; all agents are cloud-dependent. A locally hosted sub-30M routing model that selects tools before hitting a heavier cloud agent could reduce latency and cloud token spend for high-frequency dispatch decisions, a functional area with limited current coverage.
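
The subprocess integration and the "local router first, cloud agent second" idea described above can be sketched roughly as follows. This is a minimal sketch, not Needle's or PAI's actual code: only the needle run --query / --tools invocation comes from the appraisal. The tool schema, the shape of the JSON output, and the names build_needle_argv, call_needle, dispatch, and cloud_fallback are hypothetical, and the sketch assumes the CLI prints a single JSON document to stdout and exits non-zero on failure.

```python
import json
import subprocess
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical type: anything that runs an argv and returns (exit_code, stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def _subprocess_runner(argv: Sequence[str]) -> Tuple[int, str]:
    """Run the command for real; a missing binary or a timeout counts as failure."""
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return 1, ""
    return proc.returncode, proc.stdout


def build_needle_argv(query: str, tools: List[Dict[str, Any]]) -> List[str]:
    """Build the argv for `needle run --query ... --tools ...` (flags per the appraisal)."""
    return ["needle", "run", "--query", query, "--tools", json.dumps(tools)]


def call_needle(
    query: str,
    tools: List[Dict[str, Any]],
    runner: Runner = _subprocess_runner,
) -> Optional[Any]:
    """Return the parsed JSON output, or None if the call failed or was not JSON."""
    code, out = runner(build_needle_argv(query, tools))
    if code != 0:
        return None
    try:
        return json.loads(out)  # assumption: stdout is one JSON document
    except json.JSONDecodeError:
        return None


def dispatch(
    query: str,
    tools: List[Dict[str, Any]],
    cloud_fallback: Callable[[str, List[Dict[str, Any]]], Any],
    runner: Runner = _subprocess_runner,
) -> Dict[str, Any]:
    """Try the local router first; hand off to the cloud agent only if it fails."""
    local = call_needle(query, tools, runner)
    if local is not None:
        return {"source": "local", "result": local}
    return {"source": "cloud", "result": cloud_fallback(query, tools)}
```

Passing the runner as a parameter keeps the routing logic testable without Needle installed; a real PAI skill running under Bun would do the equivalent with its own process-spawning API.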

Composite: 0.50
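
The arithmetic behind the two headline numbers can be checked in a few lines. The quality total follows directly from the three sub-scores listed above. The composite formula is an assumption: the appraisal does not state the scoring scale, and the sketch below assumes each PAI dimension is scored from 0 to 2 and that the composite is the sum divided by the maximum possible sum, which happens to reproduce 0.50.

```python
# Quality sub-scores as listed in the appraisal, each out of 8.
quality = {"Health": 5, "Documentation": 7, "Engineering Signals": 4}
quality_total = sum(quality.values())
quality_max = 8 * len(quality)

# PAI relevance scores as listed in the appraisal.
pai = {
    "Harvest Value": 1,
    "Integration Readiness": 1,
    "Overlap Risk": 1,
    "Gap Fill": 1,
}


def composite(scores, max_per_dimension=2):
    """Assumed formula: sum of scores over the maximum attainable sum.

    The 0-2 scale per dimension is a guess, not stated in the source.
    """
    return sum(scores.values()) / (max_per_dimension * len(scores))


print(f"Quality: {quality_total}/{quality_max}")
print(f"Composite: {composite(pai):.2f}")
```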

What Next

Landscape Position

Category: LLM & Prompt Tooling

In this category: mattpocock--evalite (decent, 15/24, skip)

Standing: First model-as-artifact entry in this category; evalite addresses evaluation tooling around existing LLMs while Needle is the LLM itself plus its finetuning/deployment toolchain — functionally non-overlapping within the category.

Evidence Base

Density: 8/10 — README (full 8KB, high signal), repo metadata (stars, forks, issues, dates, license, archived status, topics all present), landscape context and prior appraisals available. Missing: dependency manifest (explicitly absent), CI configuration, test infrastructure details, release history, and actual model benchmark numbers beyond the qualitative comparisons in the README.

Notes

The repository is three months old and has no formal release, yet it commands 2502 stars; adoption is driven primarily by the Cactus ecosystem and by the README's claim of beating models 10× its size on function calling. The SAN architecture (an encoder over the tools with cross-attention into the decoder) is the genuine research contribution, and the finetuning playground is unusually polished for a research prototype of this age. The citation block in the README lists eight authors, which signals a team effort with a publication trajectory and improves the chance of continued maintenance. The missing dependency manifest and the missing test suite are the most significant engineering red flags at this stage.