Open Sourcing Centaur: Multiplayer, self-hosted, secure agents


Today we’re open sourcing Centaur, the self-hosted runtime from Paradigm and Tempo for multiplayer, secure AI agents. We have been using Centaur since January and it has transformed how we work across a wide spectrum of tasks including investing, engineering, design, recruiting, events, customer support and more.

Centaur is a shared agent that can use tools, run for hours or days, survive restarts, and operate with real credentials without ever seeing the raw secrets. You can talk to it in Slack or over an API. You can add a tool once and every agent can use it. You can drop in a workflow once and the whole organization gets that capability immediately. At the end of every day, it reflects on how it did and self-improves.

Beyond open sourcing the code and template repositories for extending and operating Centaur, we also take a deep dive into the architecture of the system, the interfaces between services, the security boundaries, and the execution model. Those are the pieces we think are worth copying, adapting, and reimplementing. Let’s dive in.

The problem with personal agents

Most agent stacks are still built for one user on one machine. The moment you try to build a collaborative workflow, a different set of requirements appears:

The agent has to continue working even after you close your laptop (we all know that person who walks around with an open laptop because their agent is still running).
It has to be reachable where collaboration happens, e.g. in Slack.
It has to survive crashes, deploys, disconnects, and partial failures.
It must be safe enough to trust with real systems without handing it raw API keys.
It has to be observable and auditable for security review and optimization.

Most systems address some of these requirements. Some are good coding agents. Some are good browser agents. Some are good workflow engines. Few are designed as shared infrastructure for a team. And they all cost a lot and lock you into a provider whose roadmap you’re downstream of.

We think that organizations should be empowered to own their stack, move at the speed they need, and use AI in collaborative, not isolated, settings. This is the gap Centaur is built for.

How do I use Centaur?

Think of Centaur as a virtual employee. The Slack thread is the interface. You tag Centaur like you would any other colleague, and Centaur replies. Depending on the task, Centaur will run for seconds, minutes, or hours, invoking a series of integrated SKILL.md files or tools along the way. It knows which Slack thread it was summoned in and reliably responds to the topic and request in that thread, instead of assembling random context from all over your knowledge bases.

Out of the box, Centaur can work with spreadsheets, docs, slide decks, DocSend documents, PDFs, and just about any other file attachment you can think of. It can search Slack and the web, use GitHub, generate images and charts, create or update Google Docs, Slides, and Sheets, build interactive demos, and more.

The Centaur codebase is still relatively young, and we continue to improve it. But it really is incredible to experience, and we think it has transformed how our organization functions. We strongly recommend creating a company-wide #ai-agent Slack channel and inviting everyone to join. In that channel, it’s important that:

Leaders of the firm use it, and lead by example.
People are empowered to ask questions without fear of looking dumb.
People who are more AI-fluent nudge others in existing threads with “Here’s how you could’ve used AI to get unblocked.”

How does Centaur work?

We tried to be thoughtful about two things, in particular, when designing Centaur:

One Slack thread = one isolated agent session. When you tag Centaur in a thread, the system assigns a dedicated sandbox container to that conversation. Inside the container, a full Linux environment runs the AI harness of your choice: Amp, Claude Code, Codex, or any CLI-based agent. The container has Node.js, Python, Rust, and git pre-installed, so the agent can git clone, cargo build, run tests, and write real code in a real environment.
Every session has access to organization-wide tools and skills. Agents love data, so if you give them connections and tools they’ll figure out how to do what you want them to do. A tool is a small Python class that wraps an API, e.g. Slack, GitHub, or Google Sheets. Drop the file in tools/, and every agent can call it immediately. Skills work the same way: a SKILL.md file that teaches the agent a workflow or set of instructions, available to every conversation the moment it's added.
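In code, the one-thread-one-session rule reduces to a lookup keyed on the Slack thread. The sketch below is a hypothetical illustration, not Centaur's implementation: it assumes a thread is identified by its channel ID plus Slack's thread_ts, and it uses an in-memory dict and deque as stand-ins for the Postgres thread-assignment table and the pool of pre-spawned sandbox containers. `SessionRegistry`, `session_for`, and `_spawn_container` are illustrative names.

```python
import itertools
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SessionRegistry:
    """Hypothetical stand-in for a thread-assignment table plus a warm pool."""

    warm_pool: deque = field(default_factory=deque)
    assignments: dict = field(default_factory=dict)  # (channel_id, thread_ts) -> sandbox
    _ids: itertools.count = field(default_factory=itertools.count)

    def _spawn_container(self) -> str:
        # Placeholder: a real system would start a sandbox container here
        return f"sandbox-{next(self._ids)}"

    def refill(self, target: int) -> None:
        # Pre-spawn containers so new threads don't pay cold-start latency
        while len(self.warm_pool) < target:
            self.warm_pool.append(self._spawn_container())

    def session_for(self, channel_id: str, thread_ts: str) -> str:
        key = (channel_id, thread_ts)
        if key not in self.assignments:
            # New thread: take a warm container, or spawn one if the pool is empty
            sandbox = self.warm_pool.popleft() if self.warm_pool else self._spawn_container()
            self.assignments[key] = sandbox
        # The same thread always maps to the same sandbox
        return self.assignments[key]
```

A follow-up message in the same thread lands in the same sandbox, while a new thread takes the next warm container, or spawns one if the pool is empty. Keeping this mapping in Postgres rather than in process memory is what allows the routing to survive a service restart.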

Under the hood, Centaur is a service-based architecture where all state lives in Postgres and every service is stateless. This means the system survives restarts, deploys, and crashes without losing work. Here’s how the components talk to one another:

Here’s a closer look at each service:

Slackbot: A thin Next.js webhook listener. When someone tags Centaur in Slack, the slackbot receives the event and calls the API's durable protocol: spawn a runtime, persist the message, and execute the turn.
API: The FastAPI control plane that orchestrates everything. It manages the lifecycle of agent sessions (spawn → message → execute), serves auto-generated REST endpoints for every tool plugin, runs the durable workflow engine, and streams execution events back to clients. The durable workflow engine is heavily inspired by Absurd (more on that below).
Postgres: The single source of truth. Thread assignments, execution state, workflow checkpoints, API keys, audit logs: everything durable lives here. Because every service is stateless, you can restart any service without losing context.
Sandbox: Each conversation gets its own sandboxed container on an internal-only network. The agent runs inside this container and calls back to the API for tool access over REST. Containers can be resource-limited and host filesystems are mounted read-only. A warm pool of pre-spawned containers eliminates cold-start latency.
Firewall: An iron-proxy pod sits between each sandbox and the outside world. Neither the agent nor the user ever holds real API keys. Instead, the proxy intercepts all egress traffic and, provided the traffic doesn’t violate any firewall rules, injects the correct credentials in flight, matched to the target host and source tool. A request from the Linear tool to api.linear.app gets the Linear key, while a request from the gsuite tool gets the Google OAuth credentials.
Observability: Every service writes structured JSON logs to stdout. Users choose their own observability stack and can write tools that let Centaur discover its own metrics, logs and traces. By default, Centaur ships with tools for VictoriaLogs/VictoriaMetrics.
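Writing one JSON object per line to stdout keeps services decoupled from the log pipeline: whatever collector you run can parse the stream without a custom format. Here is a minimal sketch of that convention using only Python's standard logging module; the field names (ts, level, service, msg) and the `fields` key passed through `extra=` are our own assumptions, not Centaur's log schema.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,
            "msg": record.getMessage(),
        }
        # Structured fields passed as logger.info(..., extra={"fields": {...}})
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(service: str) -> logging.Logger:
    handler = logging.StreamHandler(sys.stdout)  # stdout, not files: the collector reads it
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(service)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate, non-JSON lines from the root logger
    return logger


log = make_logger("api")
log.info("turn executed", extra={"fields": {"thread": "C123/1700000000.000100", "duration_ms": 842}})
```

Because every line is self-describing, a tool that queries the log store can filter by service or thread without knowing anything about the code that emitted the line.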

Company-specific overlays

Extensibility is a first-class citizen in Centaur: each company (including Tempo and Paradigm) uses different tools, stores data in different sources, and has company-specific knowledge that can be distilled into skills.

The framework supports “overlaying”: mounting a Docker image on top of the core Centaur services and giving the API and sandboxes access to tools, skills, and workflows built specifically for you.

Shared Skills

Skills are Markdown files (.agents/skills/*/SKILL.md) that teach the agent how to perform a specific task, e.g. a recruiting pipeline, a compliance check, or a QA workflow. Add a skill file, and every agent session inherits that knowledge. Skills are already well established in many teams, but instead of passing skill files around, you add your skill to Centaur’s .agents/skills directory and your whole team gets access to it.
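Because skills follow a fixed directory layout, finding them is a single glob. The sketch below is an illustration under that assumption, not Centaur's loader; `discover_skills` is a hypothetical helper that maps each skill's directory name to the text of its SKILL.md.

```python
from pathlib import Path


def discover_skills(repo_root: str) -> dict[str, str]:
    """Map each skill's directory name to the text of its SKILL.md."""
    skills = {}
    # One directory per skill, following the .agents/skills/*/SKILL.md layout
    for path in sorted(Path(repo_root).glob(".agents/skills/*/SKILL.md")):
        skills[path.parent.name] = path.read_text(encoding="utf-8")
    return skills
```

Scanning at the start of each session, rather than once at boot, is one simple way to make a newly added skill available to the next conversation without a restart.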

Extensible Tools

Tools are the simplest extension point. A tool is a Python class in a directory with a client.py and a pyproject.toml. The API auto-discovers it on startup, generates REST endpoints at /tools/{name}/{method}, and hot-reloads on file changes. Here’s an example of a tool:

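# tools/my-tool/client.py: this is the entire file
import httpx

class MyToolClient:
    def search(self, query: str, limit: int = 10) -> dict:
        """Search for something."""
        # No API key here: the firewall injects credentials in flight
        resp = httpx.get(
            "https://api.example.com/search",
            params={"q": query, "limit": limit},  # httpx URL-encodes the query string
        )
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
        return resp.json()

def _client():
    return MyToolClient()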

Drop that file in tools/my-tool/ alongside its pyproject.toml, and within seconds every agent conversation in your organization can call it. The pyproject.toml declares which API hosts and secret keys the tool needs, so that the firewall can handle credential injection.
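Auto-discovery can be pictured as a scan-and-introspect pass over tools/. The sketch below assumes each tool directory contains a client.py exposing a `_client()` factory, as in the example above, and maps every public method to a /tools/{name}/{method} path. `load_tool_routes` is a hypothetical name, and this is not Centaur's actual loader.

```python
import importlib.util
import inspect
from pathlib import Path
from typing import Callable


def load_tool_routes(tools_dir: str) -> dict[str, Callable]:
    """Build a route table such as {"/tools/my-tool/search": <bound method>}."""
    routes: dict[str, Callable] = {}
    for client_file in sorted(Path(tools_dir).glob("*/client.py")):
        name = client_file.parent.name  # directory name, e.g. "my-tool"
        # Load client.py by path; a hyphenated directory can't be imported normally
        spec = importlib.util.spec_from_file_location(f"centaur_tool_{name}", client_file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        client = module._client()  # assumed factory, as in the example above
        # Every public method on the client becomes an endpoint
        for method_name, method in inspect.getmembers(client, inspect.ismethod):
            if not method_name.startswith("_"):
                routes[f"/tools/{name}/{method_name}"] = method
    return routes
```

A real server would register each entry as an HTTP handler and re-run the scan when files change to get hot reloading. Treating underscore-prefixed methods as private keeps helpers like `_client` off the public surface.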

Extensible Workflows

A workflow is a single Python file that exports a name and a handler function. Drop it in workflows/, and it can run on a cron schedule, be triggered via the API, or be composed with other workflows.

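# workflows/daily_digest.py: drop this file in and it's live
from datetime import timedelta

WORKFLOW_NAME = "daily_digest"

def fetch_metrics() -> dict:
    return {"signups": 0}  # hypothetical placeholder for a real metrics query

async def handler(inp, ctx):
    # Each completed step is checkpointed, so "fetch" won't re-run after a crash
    data = await ctx.step("fetch", lambda: fetch_metrics())
    # Durable sleep: the workflow suspends instead of holding a process for 24 hours
    await ctx.sleep("wait", timedelta(hours=24))
    await ctx.run_agent("summarize", text=f"Summarize: {data}")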

The workflow engine checkpoints every step to Postgres. If the process crashes mid-workflow, it resumes exactly where it left off: no duplicate work, no lost state. Sleeping for 24 hours between steps costs nothing; the workflow suspends and the engine wakes it up when it's time. Observant readers will recognize this as the durable workflow pattern, which is growing in popularity. Our design was inspired by Absurd’s Postgres-driven architecture.
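The checkpoint-and-replay idea fits in a few dozen lines. This is not Centaur's engine: it is a minimal illustration that uses SQLite from Python's standard library as a stand-in for Postgres, with hypothetical `DurableContext` and `Suspended` names chosen to mirror the `ctx.step` and `ctx.sleep` calls in the workflow example above. A step with an existing checkpoint returns its stored result instead of re-running, and a sleep records its wake-up time once, then raises `Suspended` until that time has passed.

```python
import inspect
import json
import sqlite3
import time
from datetime import timedelta


class Suspended(Exception):
    """Raised to park a run until its sleep deadline has passed."""


class DurableContext:
    """Hypothetical checkpointing context; SQLite stands in for Postgres."""

    def __init__(self, db: sqlite3.Connection, run_id: str):
        self.db, self.run_id = db, run_id
        db.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints "
            "(run_id TEXT, step TEXT, result TEXT, PRIMARY KEY (run_id, step))"
        )

    def _row(self, name: str):
        return self.db.execute(
            "SELECT result FROM checkpoints WHERE run_id = ? AND step = ?",
            (self.run_id, name),
        ).fetchone()

    def _save(self, name: str, value) -> None:
        # Results must be JSON-serializable to be checkpointed
        self.db.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (self.run_id, name, json.dumps(value)),
        )
        self.db.commit()

    async def step(self, name: str, fn):
        row = self._row(name)
        if row is not None:
            return json.loads(row[0])  # completed on an earlier attempt: replay it
        result = fn()
        if inspect.isawaitable(result):
            result = await result
        self._save(name, result)
        return result

    async def sleep(self, name: str, duration: timedelta) -> None:
        row = self._row(name)
        if row is None:
            # Record the wake-up time once, so retries don't extend the sleep
            wake_at = time.time() + duration.total_seconds()
            self._save(name, wake_at)
        else:
            wake_at = json.loads(row[0])
        if time.time() < wake_at:
            raise Suspended(wake_at)  # a scheduler would retry the run after wake_at
```

Re-running the handler from the top is safe because completed steps replay from the database and only the first unfinished step does real work. In this sketch, a step that crashes before its checkpoint is written runs again on resume, so steps with external side effects should be idempotent.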

How secure is Centaur?

Most agent frameworks handle secrets the same way you would on your laptop: dump API keys into environment variables and hope the agent doesn't leak them. This works for personal use. It doesn't work when you're handing an agent credentials to your company's Slack, GitHub, cloud infrastructure, and financial systems.

Centaur takes a different approach: The agent never holds your secrets. Not in environment variables, not on disk, not in memory. Instead, credentials exist only inside an isolated secrets manager, and a network-level firewall injects them into outbound requests in flight, after the request leaves the sandbox and before it hits the external API.

Here's what that looks like concretely:

A tool declares in its pyproject.toml that it needs, say, SLACK_BOT_TOKEN and talks to api.slack.com.
On sandbox startup, iron-proxy builds a mapping from api.slack.com to SLACK_BOT_TOKEN.
When the agent calls Slack, the request passes through the firewall. The firewall sees the target host, looks up the correct secret from your secrets manager, and injects it into the request header.
The agent sees a successful response, but never sees the token.

This means a compromised agent or a prompt injection attack cannot exfiltrate your credentials. It can make authenticated requests through the proxy (it has to; that's how it works), but it cannot extract the raw key values, send them to a different host, or smuggle them out in a response.
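The host-and-tool matching is the crux of this design, so here is a minimal sketch of it. It is a hypothetical illustration rather than iron-proxy's code: `ToolDeclaration`, `CredentialInjector`, and `outbound_headers` are invented names, we assume the proxy can tell which tool a request came from (the Firewall section describes matching on source tool), and the `Authorization: Bearer` header format is an assumption that varies by service.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class ToolDeclaration:
    """One (tool, host, secret) triple, as a tool might declare it in pyproject.toml."""
    tool: str
    host: str
    secret_name: str


class CredentialInjector:
    """Hypothetical sketch of in-flight credential injection at the firewall."""

    def __init__(self, decls: list[ToolDeclaration], fetch_secret: Callable[[str], str]):
        # Key on (source tool, target host): a secret only ever goes to its declared host
        self.mapping = {(d.tool, d.host): d.secret_name for d in decls}
        # Lookup into the secrets manager; runs in the proxy, never in the sandbox
        self.fetch_secret = fetch_secret

    def outbound_headers(self, tool: str, host: str, headers: Mapping[str, str]) -> dict[str, str]:
        out = dict(headers)
        secret_name = self.mapping.get((tool, host))
        if secret_name is not None:
            # Bearer is an assumption; real services differ in header format
            out["Authorization"] = f"Bearer {self.fetch_secret(secret_name)}"
        # Undeclared tool/host pairs get no credentials (blocking rules omitted here)
        return out
```

Because the lookup is keyed on both the calling tool and the destination host, a prompt-injected request to an attacker's server matches no entry and carries no credentials, and the Slack token is never attached to a request bound for Linear.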

This boundary is enforced through network policies, so that no sandbox container can access:

The secrets manager directly.
The web without first going through the firewall.

This means that all secrets are protected, and every request on the way out and back in goes through the firewall. In addition, every outbound request from every sandbox is logged by the firewall, and response bodies from LLM APIs are scanned for leaked secret values and redacted in real time. This level of observability helps us detect leaks and malfunctions quickly and fix them even faster, so Centaur can act as its own AI SRE.
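At its simplest, redacting leaked secrets is a search for known secret values. The sketch below assumes the scanner can read the current secret values (in Centaur these live in the secrets manager, outside the sandbox); `make_redactor` is a hypothetical helper, not Centaur's implementation.

```python
import re
from typing import Callable, Iterable


def make_redactor(secret_values: Iterable[str], placeholder: str = "[REDACTED]") -> Callable[[str], str]:
    """Return a function that masks every known secret value found in a text."""
    # Longest values first, so a secret that contains another is masked whole
    values = sorted({v for v in secret_values if v}, key=len, reverse=True)
    if not values:
        return lambda text: text
    # re.escape keeps characters such as "+" or "." in tokens from acting as regex syntax
    pattern = re.compile("|".join(re.escape(v) for v in values))
    return lambda text: pattern.sub(placeholder, text)
```

For streamed responses, a real scanner would also need to hold back the tail of each chunk (up to the length of the longest secret minus one), since a secret can straddle a chunk boundary.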

What’s next for Centaur?

Centaur's architecture deliberately separates a small, auditable core from a wide-open extension surface:

Kernel: The core (the API, the firewall, the secrets manager)
Userspace: Tools, workflows, and skills.

This separation is what let us recently introduce self-improvement via nightly reflection: The agent reviews its own performance, identifies gaps, and ships fixes to its own skills and tools without touching the kernel. We can let the system evolve itself because the blast radius is structurally bounded.

Today's release is the kernel we've been running in production at Paradigm and Tempo since January. Next up is making the userspace more powerful. We think workflows will evolve into full application containers, i.e. long-running services that leverage the rest of the system for tool access, secrets, and observability, but run their own logic.

You can extend Centaur’s userspace with your own tools and workflows without having to fork the repository. See our docs here.

Centaur is open source under Apache 2.0. It has transformed how we work, and we hope it does the same for you. Get started: https://centaur.run.

If you're a cracked AI engineer and want to help maintain and evolve Centaur in the open or work on Paradigm's proprietary AI tooling, reach out to georgios@paradigm.xyz.