Self-evolving agents and skills on open-source models, with AI compute donated between teams

Discover Istara, a local-first AI platform for UX research. Learn how agents self-evolve, share compute, and generate verifiable insights, enhancing team collaboration and research rigor.

FastAPI Next RAG LanceDB Ollama

Overview

Istara is a local-first, open-source multi-agent platform for UX research and design that runs entirely on the user’s own hardware. It combines five specialized agents, 53 research/design skills, and a governed self-evolution architecture: when agents detect capability gaps, they can propose new specialized agents or Memento-style skills, complete with personas, protocols, routing logic, and supporting evidence, for human review. A Reasoning Bank stores successful and failed trajectories as reusable orchestration memory, while Meta-Hyperagent/DGM-H loops observe system performance and propose improvements to prompts, routing, skills, and parameters, each with lineage, a rollback plan, and an approval state. Beyond the current interface generation with Google Stitch and Figma, the next step, already in progress, is A2UI: an agent-to-interface layer through which Istara agents will move from research insights to structured agentic UI proposals, design alternatives, and implementation-ready screens while preserving traceability back to user evidence.

Under the hood, Istara uses FastAPI, Next.js, hybrid RAG with LanceDB vector search plus BM25, prompt compression, and a WebSocket compute relay for sharing idle local GPU/CPU capacity across a team. Its research outputs are grounded in an Atomic Research evidence chain that runs from source quotes to nuggets, facts, insights, and recommendations. Findings are then validated through Mixture-of-Agents: different models or techniques analyze the same material to reduce the bias of relying on a single model, and their agreement is measured with Fleiss’ Kappa to reach a consensus.
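As an illustration of the consensus step, here is a minimal sketch of Fleiss' Kappa in plain Python. It is not taken from the Istara codebase: the function name, the input layout (one list of labels per finding, one label per model), and the example labels are assumptions made for this example.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' Kappa for a fixed number of raters per item.

    ratings: list of items; each item is a list with one label per rater.
    categories: every label that raters may assign.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j] = how many raters put item i into category j
    counts = [[Counter(item)[c] for c in categories] for item in ratings]

    # Observed agreement per item, then averaged over all items
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Agreement expected by chance, from the overall category proportions
    total = n_items * n_raters
    proportions = [sum(row[j] for row in counts) / total
                   for j in range(len(categories))]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:  # every rating fell into a single category
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: three models label four candidate findings.
labels = ["supported", "unsupported"]
votes = [
    ["supported", "supported", "supported"],
    ["supported", "supported", "unsupported"],
    ["unsupported", "unsupported", "unsupported"],
    ["supported", "unsupported", "unsupported"],
]
kappa = fleiss_kappa(votes, labels)
```

A value near 1 means the models agree far more often than chance would predict; a value near 0 means their agreement is no better than chance, which is a signal to send the finding back for human review.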

The goal is to make rigorous, verifiable, privacy-preserving UX research feel native to everyday research workflows, even on laptops and modest local models.

Links

https://github.com/henrique-simoes/Istara/
Istara is a local-first, multi-agent orchestration platform for UX research. It utilizes a FastAPI backend and Next.js frontend to deploy five specialized autonomous agents—Cleo, Sentinel, Pixel, Sage, and Echo—capable of executing 53 self-improving research skills. Technically, the system features a hybrid RAG engine combining LanceDB vector search with BM25 keyword retrieval, LLMLingua-inspired prompt compression, and a distributed compute swarm via WebSocket relays. It enforces methodological rigor through an "Atomic Research" evidence chain (Nugget-Fact-Insight-Recommendation) and validates findings using Mixture-of-Agents (MoA) consensus measured by Fleiss' Kappa. The project integrates with Figma, Google Stitch, and messaging channels like WhatsApp, while supporting interoperability through Model Context Protocol (MCP) and Agent-to-Agent (A2A) standards.
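The evidence chain described above can be pictured as a set of linked records. The sketch below is an assumption about how such a chain might be modeled, not Istara's actual schema: all class names, field names, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Nugget:
    quote: str   # verbatim source quote
    source: str  # hypothetical identifier, e.g. an interview ID


@dataclass(frozen=True)
class Fact:
    statement: str
    nuggets: Tuple[Nugget, ...]

    def __post_init__(self):
        # A fact with no supporting quote would break traceability.
        if not self.nuggets:
            raise ValueError("a fact needs at least one supporting nugget")


@dataclass(frozen=True)
class Insight:
    statement: str
    facts: Tuple[Fact, ...]


@dataclass(frozen=True)
class Recommendation:
    action: str
    insights: Tuple[Insight, ...]


def trace_nuggets(recommendation: Recommendation) -> List[Nugget]:
    """Walk the chain backwards and collect every distinct source nugget."""
    found: List[Nugget] = []
    for insight in recommendation.insights:
        for fact in insight.facts:
            for nugget in fact.nuggets:
                if nugget not in found:
                    found.append(nugget)
    return found


# Hypothetical sample data
n1 = Nugget("I could not find where to export my report.", "interview-03")
n2 = Nugget("Export is hidden, I gave up.", "interview-07")
fact = Fact("Two participants failed to locate the export action.", (n1, n2))
insight = Insight("The export action has low discoverability.", (fact,))
rec = Recommendation("Move export to the primary toolbar.", (insight,))
evidence = trace_nuggets(rec)
```

Because each level only holds references to the level below it, any recommendation can be audited back to the exact quotes that motivated it.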

Tech stack

FastAPI

FastAPI is a modern, high-performance Python web framework for building APIs with automatic OpenAPI documentation.

FastAPI is a high-speed Python web framework built on Starlette (for async capabilities) and Pydantic (for data validation and serialization). Using standard Python 3.8+ type hints, it automatically generates interactive API documentation (Swagger UI/ReDoc) and enforces data validation. According to the FastAPI project's own estimates, which are based on tests with an internal development team, this reduces developer-induced errors by about 40% and speeds up feature development by up to 300%. The project also describes its performance as on par with Node.js and Go. It is production-ready and builds on the OpenAPI and JSON Schema standards for all API specifications.

https://fastapi.tiangolo.com

Next

Next.js is the full-stack React framework: it delivers high-performance web applications via hybrid rendering and powerful, Rust-based tooling.

This is the React Framework for production: Next.js enables you to build full-stack web applications with zero configuration and maximum efficiency. It supports a hybrid rendering approach (Server-Side Rendering, Static Site Generation, and Incremental Static Regeneration) for optimal speed and SEO performance. Key features include React Server Components, Server Actions for running server code directly, and the App Router for advanced routing and nested layouts. Developed by Vercel, it leverages Rust-based tools like Turbopack and the Speedy Web Compiler for the fastest possible builds and a superior developer experience.

https://nextjs.org/

RAG

RAG (Retrieval-Augmented Generation) is the GenAI framework that grounds LLMs (like GPT-4) on external, verified data, drastically reducing model hallucinations and providing verifiable sources.

RAG is a widely used GenAI architecture: it mitigates the LLM 'hallucination' problem by inserting a retrieval step before generation. A user query is vectorized, then used to query an external knowledge base (e.g., a Pinecone vector database) for relevant document chunks (chunk size varies; 512-token segments are one common choice). These retrieved facts augment the original prompt, giving the LLM (e.g., Gemini or Llama 3) the specific, current, or proprietary context it needs. This makes the final response more likely to be accurate and grounded in domain-specific data, while avoiding the high cost and latency of full model retraining.
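The retrieve-then-augment flow can be sketched in a few lines of plain Python. This is a toy illustration only: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and all function names and sample chunks are assumptions made for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, chunks, k=2):
    """Augment the user query with the retrieved chunks as numbered context."""
    context = "\n".join(
        f"[{i}] {chunk}"
        for i, chunk in enumerate(retrieve(query, chunks, k), start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical knowledge base
chunks = [
    "Participants struggled to find the export button.",
    "The onboarding flow took five minutes on average.",
    "Pricing page was rated clear by most users.",
]
prompt = build_prompt("Where is the export button?", chunks)
```

The resulting prompt is what gets sent to the LLM; numbering the context chunks is one simple way to let the model cite its sources.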

https://en.wikipedia.org/wiki/Retrieval-augmented_generation

LanceDB

LanceDB is the serverless, open-source vector database for multimodal AI: it powers fast, scalable RAG and semantic search applications.

LanceDB is your multimodal AI lakehouse, built on the high-performance Lance columnar format (Rust-based). This architecture provides a unified data store, natively handling vectors, metadata, and raw multimodal data (text, images, video) to eliminate separate databases. Leverage its disk-based indexes for low-latency vector search, full-text search, and SQL queries over petabyte-scale datasets. The platform delivers the speed and scalability required for production-ready RAG, autonomous agents, and large-scale model training workflows.
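To show how a vector ranking and a full-text ranking can be combined into one hybrid result, here is a small standard-library sketch. It does not use the LanceDB API. BM25 is implemented by hand, the vector similarity scores are assumed to come from an existing index, and reciprocal rank fusion (RRF) is one common fusion choice picked for this example rather than a documented detail of Istara.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    n_docs = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def rank(scores):
    """Document indices ordered from best score to worst."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + position) over all rankings."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [d for d, _ in sorted(fused.items(), key=lambda kv: kv[1],
                                 reverse=True)]


def hybrid_search(query, docs, vector_scores):
    """Fuse the keyword ranking with a ranking from precomputed vector scores."""
    order = rrf([rank(bm25_scores(query, docs)), rank(vector_scores)])
    return [docs[i] for i in order]


# Hypothetical documents and vector similarity scores
docs = [
    "export button hidden in menu",
    "users praised the onboarding flow",
    "export failed with large files",
]
results = hybrid_search("export button", docs, vector_scores=[0.3, 0.9, 0.1])
```

RRF only looks at rank positions, so it avoids having to normalize BM25 scores and cosine similarities onto a shared scale.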

https://lancedb.com/

Ollama

Deploy and run open-source Large Language Models (LLMs) like Llama 3 and Mistral locally on your machine: achieve private, cost-effective AI via a simple command-line interface.

Ollama is a popular tool for running LLMs locally: consider it the Docker for AI models. It packages complex models and dependencies into a single, easy-to-use application for macOS, Linux, and Windows systems. You get immediate access to models like Gemma 2 and DeepSeek-R1 via a straightforward CLI or REST API. Because inference runs on your own machine, prompts and data stay local, which supports privacy and removes cloud dependency and per-request API costs. Ollama also serves quantized model variants, which keeps execution efficient on consumer hardware, including standard desktops.
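A minimal sketch of calling the REST API from Python using only the standard library. It assumes an Ollama server is running on the default local port (11434) and that the named model has already been pulled; the helper names and the default model are choices made for this example.

```python
import json
import urllib.request

# Default local endpoint of a running Ollama server (an assumption about
# your setup; change it if the server listens elsewhere).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3", url=OLLAMA_URL):
    """Build the POST request for a single, non-streaming completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]


# Usage (requires a running server and a pulled model):
#   print(generate("Summarize the main usability issue in one sentence."))
```

Setting `"stream": False` returns the whole answer in one JSON object, which keeps the example short; streaming responses arrive as one JSON object per line instead.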

https://ollama.com
