🧬 EvoLLM: Self-Evolving Local LLM
A privacy-first 1B-class language model that visibly improves itself through multi-armed-bandit adapter selection and Lamarckian evolution. Runs fully on-device. No telemetry. No API calls.
⚡ This web demo runs SmolLM2-360M for speed on the free CPU tier (~5 tok/s, answers in 20-40s). The local desktop app runs the full SmolLM2-1.7B for higher quality (5-30× faster on real hardware). The evolution engine, adapter pool, and bandit work identically on both.
Auto = the bandit picks an adapter based on learned preference. Or force a specific one.
Retrieve from uploaded documents and cite sources
Try: Explain quantum entanglement. · Write a haiku about adaptive AI. · What is distillation in machine learning? · Translate to French: 'Good morning, how are you?'
The population of evolved variants
Each adapter is a distinct genome: system prompt, sampling config, LoRA setup. The bandit learns which one wins for your usage.
| Name | Gen | Eval Bank | Bandit Mean | Trials | Status |
|---|---|---|---|---|---|
| Empathetic | 1 | 0.68 | 0.629 | 5 | |
Evolution: watch fitness climb across generations
No measured run loaded. Run python scripts/run_evolution_sweep.py and commit space/data/evolution_run.json to populate this with real scores.
📥 Import a trained adapter
Drop the two files produced by the Colab training notebook (*.gguf and *.json) to add a user-trained adapter to the pool.
Document knowledge: the second dimension of evolution
Upload PDFs, Word docs, Markdown, or paste text. EvoLLM chunks, embeds, and indexes them locally with a multilingual embedder. In the Chat tab, toggle Knowledge mode and the model retrieves relevant chunks before answering, citing sources.
⚠️ On the HF Space, uploads are session-only: they're processed inside the Space container and disappear on rebuild. Use the local desktop app for true privacy and persistence (data/knowledge.sqlite).
Upload files
Or paste text directly
Indexed documents
No documents indexed yet.
| Name | Format | Size | Chunks | Uploaded |
|---|---|---|---|---|
| - | - | - | - | - |
🧬 Train an adapter from these documents
Bake the document content into a real LoRA adapter via QLoRA on Colab. EvoLLM generates a configured notebook with your data inline; you run it on a free T4 GPU; then import the resulting .gguf + manifest back here.
Lineage of mutations, promotions, and feedback events
07:00:54 fitness - Eval bank baseline pending - run scripts/run_evolution_sweep.py to populate real scores.
07:00:54 🧬 init - EvoLLM initialised - 5 seed adapters loaded into pool.
Recent feedback
No feedback yet. Rate a response with 👍 or 👎.
EvoLLM: what's actually here
Hardware-adaptive architecture
EvoLLM scales the base model to the user's hardware while keeping the evolution engine identical across all tiers:
| Tier | Base | Use | Speed |
|---|---|---|---|
| Phone / IoT | SmolLM2-135M | embedded edge | ~50 tok/s on phone NPU |
| Web demo (this Space) | SmolLM2-360M | free public preview | ~5 tok/s on 2 vCPUs |
| Local desktop app | SmolLM2-1.7B | privacy-first daily driver | ~30 tok/s on a 4090 |
| Workstation | Qwen 2.5 7B | power user | ~100 tok/s on A100 |
| Datacenter | Llama 3.1 8B+ | hosted serving | ~300 tok/s on A100 |
The genome schema, adapter pool, Thompson bandit, eval bank, and mutation operators are byte-for-byte the same across every tier. Only the base weights change. That's the deployment story.
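A minimal sketch of what "same genome, different base" could look like. The `Genome` field names, the clamping ranges, the `mutate` helper, and the `deploy` helper are all assumptions made for illustration; they are not taken from the EvoLLM source. Only the tier-to-base mapping comes from the table above.

```python
import random
from dataclasses import dataclass, replace

# Tier -> base model, copied from the table above (the keys are hypothetical).
TIER_BASES = {
    "phone": "SmolLM2-135M",
    "web_demo": "SmolLM2-360M",
    "desktop": "SmolLM2-1.7B",
    "workstation": "Qwen 2.5 7B",
    "datacenter": "Llama 3.1 8B+",
}


@dataclass(frozen=True)
class Genome:
    """Illustrative genome; field names are assumptions, not EvoLLM's schema."""
    name: str
    system_prompt: str
    temperature: float = 0.7
    top_p: float = 0.9
    lora_rank: int = 8
    generation: int = 1


def mutate(parent: Genome, rng: random.Random) -> Genome:
    """Return a child genome with exactly one numeric field perturbed."""
    key = rng.choice(["temperature", "top_p", "lora_rank"])
    if key == "temperature":
        value = min(1.5, max(0.1, parent.temperature + rng.uniform(-0.1, 0.1)))
    elif key == "top_p":
        value = min(1.0, max(0.5, parent.top_p + rng.uniform(-0.05, 0.05)))
    else:
        value = rng.choice([4, 8, 16, 32])
    return replace(parent, **{key: value}, generation=parent.generation + 1)


def deploy(genome: Genome, tier: str) -> dict:
    """Pair an unchanged genome with the base weights of the given tier."""
    return {"base_model": TIER_BASES[tier], "genome": genome}
```

In this sketch the genome object passed to `deploy` is identical for every tier; only the `base_model` string differs, which mirrors the claim that only the base weights change.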
The evolution layer
EvoLLM wraps each base model with:
- Base swap: every tier runs a different base. The smallest variant is 135M for embedded, the largest is 8B+ for datacenter
- Adapter pool: 5 hand-curated genome variants, with the architecture in place to ingest real distilled LoRA weights (Phase 2, Colab notebook in repo)
- Bandit: Thompson sampling over Beta(α, β) reward distributions per adapter. Live thumbs feedback updates posteriors in real time.
- Eval bank: 40 fixed prompts across reasoning, factual, code, writing, instruction-following, safety, calibration, and edge cases. Deterministic rule-based scoring, with no LLM-as-judge dependency.
- Mutation operators: LoRA rank, target modules, memory token, sampling config, system prompt
- Fitness: 50/50 blend of eval-bank score and live feedback win-rate
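The bandit and fitness bullets above can be sketched as follows. This is an illustrative Beta-Bernoulli Thompson sampler, not EvoLLM's actual code: the class name, the uniform Beta(1, 1) prior, and the treatment of each thumbs-up or thumbs-down as one Bernoulli reward are assumptions. The `fitness` function simply applies the 50/50 blend stated above and assumes both inputs are already scaled to [0, 1].

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of adapters."""

    def __init__(self, adapters, seed=None):
        self.rng = random.Random(seed)
        # Assumed prior: Beta(1, 1), i.e. uniform, for every adapter.
        self.alpha = {name: 1.0 for name in adapters}
        self.beta = {name: 1.0 for name in adapters}

    def select(self):
        """Draw one sample per posterior and return the adapter with the highest draw."""
        draws = {
            name: self.rng.betavariate(self.alpha[name], self.beta[name])
            for name in self.alpha
        }
        return max(draws, key=draws.get)

    def update(self, adapter, thumbs_up):
        """Thumbs-up increments alpha, thumbs-down increments beta."""
        if thumbs_up:
            self.alpha[adapter] += 1.0
        else:
            self.beta[adapter] += 1.0

    def mean(self, adapter):
        """Posterior mean of the adapter's reward distribution."""
        return self.alpha[adapter] / (self.alpha[adapter] + self.beta[adapter])


def fitness(eval_score, win_rate):
    """50/50 blend of eval-bank score and live feedback win-rate."""
    return 0.5 * eval_score + 0.5 * win_rate
```

Sampling from the posterior, rather than always picking the highest mean, keeps some exploration alive: an adapter with few trials has a wide Beta distribution and will still win the draw occasionally.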
Why this matters
Other local LLMs (Ollama, LM Studio, GPT4All) ship one frozen model. EvoLLM ships a population, and that population evolves on the user's machine, in response to that specific user. The same hardware runs a better model after a week of use than it did on day 1.
Two dimensions of evolution
EvoLLM evolves on two orthogonal axes:
- Behaviour: the adapter pool. Each adapter is a genome (system prompt, sampling config, LoRA setup). The Thompson-sampling bandit learns which adapter wins for the user from live thumbs feedback.
- Knowledge: uploaded documents. Embedded with a multilingual model and stored in a local vector DB. When Knowledge mode is on, queries retrieve the top-3 relevant chunks and inject them as grounded context with citations.
Both dimensions feed the same evolution log. Both live on the user's hardware. Both are visible in the UI.
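The Knowledge-mode retrieval step described above can be sketched roughly like this. The similarity metric (cosine), the `(source, text, vector)` index layout, and the prompt wording are assumptions for illustration; the real embedder and vector store are not shown, and the vectors in the usage note are toy values.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, index, k=3):
    """Return the k entries of index most similar to query_vec.

    index is a list of (source, text, vector) tuples (hypothetical layout).
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Inject retrieved chunks as numbered context so the answer can cite them."""
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text, _) in enumerate(hits, 1)
    )
    return (
        "Answer using the context and cite sources as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

For example, with a toy index of four chunks, `top_k_chunks([1.0, 0.0], index)` returns the three chunks whose vectors point closest to the query, and `build_prompt` labels them `[1]` to `[3]` with their source file names.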
Roadmap
| Phase | Status | What |
|---|---|---|
| 0: Inference foundation | ✅ Done | FastAPI + llama.cpp + GGUF |
| 1: Adapter loading + memory token | ✅ Done | The 5-personality adapter pool |
| 2: Distillation seed adapters | 🚧 In progress | Colab notebook produces real LoRA files |
| 3: Desktop installer | Planned | Tauri/Electron bundle for Windows |
| 4a: Knowledge layer (RAG) | ✅ Done | This tab: multilingual embed + cite |
| 4b: LoRA-on-upload | 🚧 In progress | "Train adapter from documents" Colab flow |
| 5: Background evolution worker | Planned | Periodic QLoRA retrain on feedback |
| 6: Cloud-mediated adapter delivery | Planned | Opt-in anonymized feedback → updates |
Source
GitHub: drhemanm/EvoTransformerV11
Built on EvoTransformer (Mohabeer, 2025).