EvoLLM

Adapter selection

Auto lets the bandit pick an adapter based on learned preference. Alternatively, force a specific adapter.

🔍 Knowledge mode

Retrieve from uploaded documents and cite sources.

Chatbot

Try: Explain quantum entanglement. · Write a haiku about adaptive AI. · What is distillation in machine learning? · Translate to French: 'Good morning, how are you?'

🧬 Active Genome

The population of evolved variants

Each adapter is a distinct genome — system prompt, sampling config, LoRA setup. The bandit learns which one wins for your usage.
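A genome like the one described above can be sketched as a small immutable record plus a mutation function. This is a minimal illustration only: the field names, default values, and clamping bounds are assumptions, not the project's actual schema. It covers just two of the listed mutation operators (sampling config and LoRA rank); the 4 to 64 rank range is taken from the training slider on this page.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Genome:
    # All field names and defaults here are hypothetical.
    name: str
    generation: int
    system_prompt: str
    temperature: float = 0.7
    top_p: float = 0.9
    lora_rank: int = 8
    target_modules: tuple = ("q_proj", "v_proj")


def mutate(parent: Genome, rng: random.Random) -> Genome:
    """Return a child genome with one field perturbed and generation + 1."""
    choice = rng.choice(["temperature", "top_p", "lora_rank"])
    if choice == "temperature":
        new_t = min(1.5, max(0.1, parent.temperature + rng.uniform(-0.2, 0.2)))
        changes = {"temperature": round(new_t, 2)}
    elif choice == "top_p":
        new_p = min(1.0, max(0.5, parent.top_p + rng.uniform(-0.1, 0.1)))
        changes = {"top_p": round(new_p, 2)}
    else:
        changes = {"lora_rank": rng.choice([4, 8, 16, 32, 64])}
    # dataclasses.replace builds a new frozen instance; the parent is untouched.
    return replace(parent, generation=parent.generation + 1, **changes)


if __name__ == "__main__":
    seed = Genome(name="Technical", generation=1, system_prompt="Be precise.")
    print(mutate(seed, random.Random(0)))
```

Keeping the genome frozen means every mutation yields a new record, which makes it straightforward to log a lineage of parents and children.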

Name	Gen	Eval Bank	Bandit Mean	Trials	Status
Concise	1	0.68	0.629	5	✅
Creative	1	0.55	0.536	5	✅
Empathetic	1	0.59	0.564	5	✅
Technical	1	0.71	0.65	5	✅
Default	0	0.62	0.586	5	✅
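One observation about the table: the Bandit Mean values are numerically consistent with a Beta(1, 1) prior updated by 5 pseudo-trials weighted by the Eval Bank score, giving a mean of (1 + 5·s) / 7. The source does not state this, so treat it as an inference from the numbers rather than a description of the implementation. The check below reproduces each displayed value.

```python
# Values copied from the table above.
eval_bank = {
    "Concise": 0.68,
    "Creative": 0.55,
    "Empathetic": 0.59,
    "Technical": 0.71,
    "Default": 0.62,
}
shown_mean = {
    "Concise": 0.629,
    "Creative": 0.536,
    "Empathetic": 0.564,
    "Technical": 0.65,
    "Default": 0.586,
}
TRIALS = 5


def posterior_mean(score: float, trials: int = TRIALS) -> float:
    """Mean of Beta(alpha, beta) under the ASSUMED seeding rule."""
    alpha = 1 + trials * score        # uniform prior + score-weighted successes
    beta = 1 + trials * (1 - score)   # uniform prior + score-weighted failures
    return alpha / (alpha + beta)


for name, score in eval_bank.items():
    print(f"{name:<11} computed={posterior_mean(score):.3f} shown={shown_mean[name]}")
```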

📈 Evolution — watch fitness climb across generations

No measured run loaded. Run python scripts/run_evolution_sweep.py and commit space/data/evolution_run.json to populate this with real scores.

📥 Import a trained adapter

Drop the two files produced by the Colab training notebook (*.gguf and *.json) to add a user-trained adapter to the pool.

LoRA adapter (.gguf)

Manifest (.json)

Document knowledge — the second dimension of evolution

Upload PDFs, Word docs, Markdown, or paste text. EvoLLM chunks, embeds, and indexes them locally with a multilingual embedder. In the Chat tab, toggle 🔍 Knowledge mode and the model retrieves relevant chunks before answering, citing sources.
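The chunking step can be illustrated with a simple overlapping word-window splitter. This is a sketch under assumptions: the function name, the 200-word window, and the 40-word overlap are hypothetical, and the app's real chunker may split on tokens, sentences, or document structure instead.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows that overlap by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window has reached the end of the text, so no
        # trailing chunk that only repeats overlap words is emitted.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_text(sample, max_words=4, overlap=1):
        print(chunk)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of a slightly larger index.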

⚠️ On the HF Space, uploads are session-only: they are processed inside the Space container and disappear on rebuild. For true privacy and persistence, use the local desktop app (data/knowledge.sqlite).

Upload files

Drop PDF / TXT / MD / DOCX (multi-select OK)

Or paste text directly

Source name

Text content

Indexed documents

No documents indexed yet.

Name	Format	Size	Chunks	Uploaded
—	—	—	—	—

🧬 Train an adapter from these documents

Bake the document content into a real LoRA adapter via QLoRA on Colab. EvoLLM generates a configured notebook with your data inline; you run it on a free T4 GPU; then import the resulting .gguf + manifest back here.

Documents to train on

Select one or more indexed documents.

Adapter name

LoRA rank

Higher = more capacity, slower training

Range: 4 to 64

Training epochs

Range: 1 to 10

📒 Download notebook

Lineage of mutations, promotions, and feedback events

05:24:59 📊 fitness — Eval bank baseline pending — run scripts/run_evolution_sweep.py to populate real scores.

05:24:59 🧬 init — EvoLLM initialised — 5 seed adapters loaded into pool.

Recent feedback

No feedback yet — rate a response with 👍 or 👎.

EvoLLM — what's actually here

Hardware-adaptive architecture

EvoLLM scales the base model to the user's hardware while keeping the evolution engine identical across all tiers:

Tier	Base model	Use case	Approx. speed
Phone / IoT	SmolLM2-135M	embedded edge	~50 tok/s on phone NPU
Web demo (this Space)	SmolLM2-360M	free public preview	~5 tok/s on 2 vCPUs
Local desktop app	SmolLM2-1.7B	privacy-first daily driver	~30 tok/s on a 4090
Workstation	Qwen 2.5 7B	power user	~100 tok/s on A100
Datacenter	Llama 3.1 8B+	hosted serving	~300 tok/s on A100

The genome schema, adapter pool, Thompson bandit, eval bank, and mutation operators are byte-for-byte the same across every tier. Only the base weights change. That's the deployment story.
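One way to picture "only the base weights change" is a per-tier lookup for the base model next to a single shared evolution config. The structure and names below are hypothetical; the model names, pool size, eval-bank size, and 50/50 weighting are taken from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvolutionConfig:
    # Shared by every tier; field names are assumptions.
    pool_size: int = 5            # seed adapters in the pool
    eval_prompts: int = 40        # fixed prompts in the eval bank
    feedback_weight: float = 0.5  # share of fitness from live feedback


BASE_MODELS = {
    "phone": "SmolLM2-135M",
    "web_demo": "SmolLM2-360M",
    "desktop": "SmolLM2-1.7B",
    "workstation": "Qwen 2.5 7B",
    "datacenter": "Llama 3.1 8B+",
}

SHARED_EVOLUTION = EvolutionConfig()


def runtime_for(tier: str) -> dict:
    """Pair a tier-specific base model with the shared evolution config."""
    return {"base_model": BASE_MODELS[tier], "evolution": SHARED_EVOLUTION}


if __name__ == "__main__":
    for tier in BASE_MODELS:
        print(tier, runtime_for(tier)["base_model"])
```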

The evolution layer

EvoLLM wraps each base model with:

Base swap: every tier runs a different base — the smallest variant is 135M for embedded, the largest is 8B+ for datacenter
Adapter pool: 5 hand-curated genome variants, with the architecture in place to ingest real distilled LoRA weights (Phase 2 — Colab notebook in repo)
Bandit: Thompson sampling over Beta(α, β) reward distributions per adapter. Live thumbs feedback updates posteriors in real time.
Eval bank: 40 fixed prompts across reasoning, factual, code, writing, instruction-following, safety, calibration, and edge cases. Deterministic rule-based scoring — no LLM-as-judge dependency.
Mutation operators: LoRA rank, target modules, memory token, sampling config, system prompt
Fitness: 50/50 blend of eval-bank score and live feedback win-rate
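The bandit and fitness items above can be sketched in a few lines. This is a minimal illustration of Thompson sampling over Beta posteriors, not the project's code: the class and method names are hypothetical, the uniform Beta(1, 1) prior is an assumption, and `win_rate` is assumed to be a value in [0, 1] derived from thumbs feedback.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of adapters."""

    def __init__(self, arms, seed=None):
        # One [alpha, beta] pair per adapter, starting from a uniform prior.
        self.params = {arm: [1.0, 1.0] for arm in arms}
        self.rng = random.Random(seed)

    def select(self) -> str:
        # Draw one sample from each posterior and play the largest draw.
        draws = {
            arm: self.rng.betavariate(a, b) for arm, (a, b) in self.params.items()
        }
        return max(draws, key=draws.get)

    def update(self, arm: str, thumbs_up: bool) -> None:
        # Thumbs up increments alpha; thumbs down increments beta.
        self.params[arm][0 if thumbs_up else 1] += 1.0

    def mean(self, arm: str) -> float:
        a, b = self.params[arm]
        return a / (a + b)


def fitness(eval_score: float, win_rate: float) -> float:
    """50/50 blend of eval-bank score and live feedback win-rate."""
    return 0.5 * eval_score + 0.5 * win_rate


if __name__ == "__main__":
    bandit = ThompsonBandit(["Concise", "Creative", "Technical"], seed=0)
    for _ in range(20):
        arm = bandit.select()
        bandit.update(arm, thumbs_up=(arm == "Technical"))  # simulated user
    print({arm: round(bandit.mean(arm), 3) for arm in bandit.params})
```

Sampling from the posterior, rather than always picking the highest mean, keeps some exploration alive: an adapter with few trials has a wide posterior and still gets chosen occasionally.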

Why this matters

Other local LLMs (Ollama, LM Studio, GPT4All) ship one frozen model. EvoLLM ships a population — and that population evolves on the user's machine, in response to that specific user. The same hardware runs a better model after a week of use than it did on day 1.

Two dimensions of evolution

EvoLLM evolves on two orthogonal axes:

Behaviour — the adapter pool. Each adapter is a genome (system prompt, sampling config, LoRA setup). The Thompson-sampling bandit learns which adapter wins for the user from live thumbs feedback.
Knowledge — uploaded documents. Embedded with a multilingual model and stored in a local vector DB. When Knowledge mode is on, queries retrieve the top-3 relevant chunks and inject them as grounded context with citations.

Both dimensions feed the same evolution log. Both live on the user's hardware. Both are visible in the UI.
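The Knowledge-mode retrieval step can be sketched as a cosine-similarity search that returns the top 3 chunks and formats them as numbered, citable context. The function names, the in-memory list standing in for the vector DB, the prompt wording, and the two-dimensional toy vectors are all assumptions; a real deployment would use embeddings from the multilingual model.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, index, k: int = 3):
    """index: list of (source, text, vector). Returns the k best (score, source, text)."""
    scored = [(cosine(query_vec, vec), source, text) for source, text, vec in index]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits) -> str:
    """Inject retrieved chunks as numbered context so answers can cite [n]."""
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (_, source, text) in enumerate(hits, 1)
    )
    return (
        "Answer using the context and cite sources as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    toy_index = [
        ("notes.md", "Thompson sampling draws from Beta posteriors.", [1.0, 0.0]),
        ("notes.md", "LoRA adds low-rank update matrices.", [0.0, 1.0]),
        ("paper.pdf", "Bandits balance exploration and exploitation.", [0.9, 0.1]),
        ("faq.txt", "Uploads on the Space are session-only.", [-1.0, 0.0]),
    ]
    hits = top_k([1.0, 0.0], toy_index)
    print(build_prompt("How does the bandit choose?", hits))
```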

Roadmap

Phase	Status	What
0 — Inference foundation	✅ Done	FastAPI + llama.cpp + GGUF
1 — Adapter loading + memory token	✅ Done	The 5-personality adapter pool
2 — Distillation seed adapters	🚧	Colab notebook produces real LoRA files
3 — Desktop installer	🗓	Tauri/Electron bundle for Windows
4a — Knowledge layer (RAG)	✅ Done	This tab — multilingual embed + cite
4b — LoRA-on-upload	🚧	"Train adapter from documents" Colab flow
5 — Background evolution worker	🗓	Periodic QLoRA retrain on feedback
6 — Cloud-mediated adapter delivery	🗓	Opt-in anonymized feedback → updates

Source

GitHub: drhemanm/EvoTransformerV11

Built on EvoTransformer (Mohabeer, 2025).

🧬 EvoLLM — Self-Evolving Local LLM