Helps you tune LLM hyperparameters
No formal releases despite ~19 months of development. The README references extremely current model identifiers (GPT-5.2, Claude Opus 4.5, Gemini 3 Pro) indicating the codebase has been actively updated to track provider API churn, which is a positive signal for live-maintenance status. Zero open issues could reflect a clean, well-triaged project or simply low external usage generating few tickets. No tags or versioned releases make it hard to pin a stability baseline — call it late beta.
README is thorough and well-structured: feature list, per-provider model and parameter tables, step-by-step installation, a full CLI option reference table with examples, JSON export schema, visualization descriptions with embedded screenshots, and a clear "How It Works" section explaining the three-dimensional scoring model. Absence of a dependency manifest (requirements.txt not surfaced) and no docs site beyond the README keep it from a 5.
Dependency choices are appropriate: sentence-transformers (all-MiniLM-L6-v2) for semantic scoring, scikit-learn/nltk for NLP utilities, matplotlib/seaborn/pandas for visualization — no exotic or high-churn choices. A Makefile with setup and test targets confirms test infrastructure exists. No CI configuration is visible in the surfaced data, no dependency manifest to audit pinning, and no type annotations or linting configuration mentioned.
Last commit 2026-02-27, approximately 2.5 months before appraisal date — not stale but not rapid either. Model lists include very recent frontier models, confirming someone is tracking upstream API releases. Single-maintainer project with no org backing introduces bus-factor risk. No PR merge cadence data visible.
100 stars and 21 forks over ~19 months is modest for a utility touching four major LLM providers. No downstream dependents visible, no topics set (reduces discoverability), no release artifacts. The forks-to-stars ratio (21 of 100, or 21%) is reasonable and suggests some users are actively experimenting with it.
Overall: 2.9/5
Category: LLM Hyperparameter Optimization
Known alternatives in vault: None. No existing vault entry addresses LLM hyperparameter search or empirical prompt/response scoring.
Differentiation: HyperTune's primary differentiator is its semantic scoring pipeline, which uses sentence embeddings to measure coherence, relevance, and complexity rather than simple heuristics or human review. Multi-provider abstraction (OpenAI, Anthropic, Google, OpenRouter) from a single CLI is also uncommon. General-purpose HPO libraries like Optuna or Ray Tune exist but are not LLM-aware and require significant glue code. The degenerate-output quality penalty is a practical touch absent from most comparable tools.
Gap or crowd: Clear gap. No existing vault category covers this problem space; adding this repo introduces a new category with no crowding pressure.
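To make the degenerate-output penalty concrete, here is a minimal standard-library sketch of a repetition/entropy gate. It is not HyperTune's code: the function names, the trigram window, and every threshold (`max_repetition`, `min_entropy`, `min_words`) are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that duplicate another n-gram (0.0 = no repeats)."""
    words = text.lower().split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)


def token_entropy(text: str) -> float:
    """Shannon entropy, in bits, of the word frequency distribution."""
    words = text.lower().split()
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_degenerate(
    text: str,
    max_repetition: float = 0.5,  # hypothetical threshold
    min_entropy: float = 2.0,     # hypothetical threshold, in bits
    min_words: int = 8,           # skip the entropy test on very short answers
) -> bool:
    """Flag empty, heavily repeated, or very low-entropy output."""
    if not text.strip():
        return True
    if repetition_ratio(text) > max_repetition:
        return True
    long_enough = len(text.split()) >= min_words
    return long_enough and token_entropy(text) < min_entropy
```

The entropy test is skipped for short outputs so that a legitimate one-word answer is not penalized. Any real thresholds would need calibrating against sample outputs per task type.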
Score: 3/5
Harvestable: The three-dimension semantic scorer (coherence 40% / relevance 40% / complexity 20% via all-MiniLM-L6-v2) is the highest-value extractable component, directly useful for evaluating any LLM output quality inside PAI skill pipelines. The degenerate-output detection logic (repetition/entropy checks) is a lightweight quality gate worth lifting. The multi-provider abstraction layer could serve as a reference pattern for a PAI LLM router.
Integration path: Near-term: use as a CLI probe when calibrating hyperparameters for specific PAI task types (summarization, reasoning, creative). Medium-term: extract the scoring module as a standalone evaluator callable from PAI skill hooks. The JSON export format is already structured for downstream ingestion.
Overlap with existing: No overlap with current vault entries. VoltAgent (AI Agent Engineering) and garrytan--gbrain (Personal AI Memory) both involve LLM calls but neither addresses hyperparameter optimization or output quality scoring.
Adoption cost: Moderate. The CLI is usable immediately for manual tuning sessions. Embedding the scorer into PAI infrastructure requires refactoring the scoring classes out of the CLI entrypoint, adding an API or function interface, and managing the sentence-transformer model load at PAI boot time (non-trivial memory cost).
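The 40/40/20 weighting can be sketched as a plain weighted sum over per-dimension scores. This is an illustration rather than HyperTune's implementation: `DimensionScores`, `composite_score`, and `rank_candidates` are hypothetical names, and the sketch assumes each dimension has already been computed (for example from embedding similarity) and normalized to the range 0 to 1, which the README summary does not confirm.

```python
from dataclasses import dataclass

# Weights as reported in the README: coherence 40%, relevance 40%, complexity 20%.
WEIGHTS = {"coherence": 0.4, "relevance": 0.4, "complexity": 0.2}


@dataclass(frozen=True)
class DimensionScores:
    """Per-dimension scores, assumed (hypothetically) to lie in [0, 1]."""
    coherence: float
    relevance: float
    complexity: float


def composite_score(scores: DimensionScores, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the three dimensions."""
    for name in weights:
        value = getattr(scores, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(weights[name] * getattr(scores, name) for name in weights)


def rank_candidates(candidates: dict) -> list:
    """Sort (label, composite score) pairs from best to worst."""
    scored = [(label, composite_score(s)) for label, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for two hyperparameter settings, purely for illustration.
    runs = {
        "temperature=0.2": DimensionScores(0.9, 0.8, 0.3),
        "temperature=1.0": DimensionScores(0.6, 0.7, 0.9),
    }
    for label, score in rank_candidates(runs):
        print(f"{label}: {score:.2f}")
```

Keeping the combiner separate from the embedding model is one way to reduce the adoption cost noted above: the weighted sum has no heavy dependencies, so only the component that produces the dimension scores would need the sentence-transformer model loaded.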
HyperTune is a focused, well-documented utility that solves a real problem: empirically finding optimal LLM hyperparameters for a given prompt type rather than relying on provider defaults or intuition. The semantic scoring pipeline is its most intellectually interesting component and the most directly harvestable for PAI use. The aggressive model-list currency (GPT-5.2, Claude Opus 4.5, Gemini 3 Pro) is a double-edged signal: it shows active maintenance but also implies the codebase may require frequent patching as provider APIs evolve. The lack of formal releases and the lack of any visible CI configuration are the primary quality gaps. At 100 stars with no topic tags, it is under-discovered relative to its utility. Worth adding to the vault as the sole entry in a new LLM Hyperparameter Optimization category; revisit if a competing tool with CI and versioned releases emerges.